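Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economic Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences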

Chinese Journal of Management Science ›› 2022, Vol. 30 ›› Issue (12): 211-221. doi: 10.16381/j.cnki.issn1003-207x.2020.0933

• Articles

Metacost Based Semi-supervised Heterogeneous Ensemble Model for Customer Credit Scoring

YAN Lan1, LI Si-han2, XIAO Yi3, KOU Yu-xuan4, LIU Dun-hu5, XIAO Jin2   

  1. Intensive Language Training Center, Sichuan University, Chengdu 610064, China; 2. Business School, Sichuan University, Chengdu 610064, China; 3. School of Information Management, Central China Normal University, Wuhan 430079, China; 4. School of Business, Macau University of Science and Technology, Macau 999078, China; 5. School of Management, Chengdu University of Information Technology, Chengdu 610225, China
  • Received: 2020-05-20 Revised: 2020-07-03 Published: 2023-01-10
  • Contact: XIAO Jin E-mail: xjxiaojin@126.com

Abstract: As the credit business becomes more widespread, effective risk avoidance is one of the main ways for the financial industry to maintain stable profits, and credit risk is among the most common and important types of risk it faces. Accurate customer credit scoring is therefore essential. However, the class distribution of the customer data used to build credit-scoring models is often highly imbalanced: customers with good credit far outnumber customers with bad credit. In addition, only the few customers who successfully obtained loans can be labeled according to their subsequent behavior, while the many customers who applied for loans but were rejected cannot be labeled. These characteristics make it difficult to build sound and accurate customer credit-scoring models, and existing research does not address them well. To fill this gap, meta cost-sensitive learning, semi-supervised learning, and heterogeneous ensemble learning are combined, and a Metacost-based semi-supervised heterogeneous ensemble model (Meta-Semi-HE) is proposed for customer credit scoring. The model has three stages: 1) Metacost is used to modify the initial labeled training set to obtain Lm; 2) N heterogeneous classifiers hi (i=1,…,N) are trained on Lm by AdaBoost, the concomitant ensemble Hi is used to selectively label samples from the unlabeled data set, the newly labeled samples are added to Lm, and the N heterogeneous classifiers are retrained on the new Lm; this step is repeated to improve the performance of the member classifiers until the termination condition is satisfied; 3) the final trained classifiers are used to classify the samples of the test set.
The empirical analysis is conducted on six customer credit-scoring datasets. The results show that Meta-Semi-HE achieves better customer credit-scoring performance than the five other models compared on the evaluation criteria of AUC, F-measure, Type I accuracy, and Type II accuracy. The study offers banks a new approach to customer credit-scoring modeling, which can help them avoid risk more effectively and supports the healthy and stable development of the credit business in the financial industry.
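
The three stages described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation, and nearly every concrete choice in it is an assumption made for illustration: the cost matrix `COST`, the four simple learners that stand in for the AdaBoost-trained heterogeneous classifiers (no boosting is performed), the reading of the concomitant ensemble Hi as "all members except hi", the unanimity rule for accepting a pseudo-label, the tie-breaking rule in the final vote, and the hypothetical `toy_data` helper.

```python
import random
from collections import Counter

# Assumed cost matrix COST[true][predicted]; class 1 = bad credit (minority).
COST = {0: {0: 0.0, 1: 1.0}, 1: {0: 5.0, 1: 0.0}}

def dist2(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

class Centroid:
    """Nearest-centroid classifier."""
    def fit(self, X, y):
        self.c = {}
        for k in set(y):
            rows = [x for x, t in zip(X, y) if t == k]
            self.c[k] = [sum(col) / len(rows) for col in zip(*rows)]
        return self
    def predict(self, x):
        return min(self.c, key=lambda k: dist2(x, self.c[k]))

class KNN:
    """k-nearest-neighbour classifier (majority vote)."""
    def __init__(self, k=3):
        self.k = k
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self
    def predict(self, x):
        near = sorted(zip(self.X, self.y), key=lambda p: dist2(x, p[0]))[:self.k]
        return Counter(t for _, t in near).most_common(1)[0][0]

class Stump:
    """One-feature threshold rule with minimum training error."""
    def fit(self, X, y):
        best = None
        for j in range(len(X[0])):
            for thr in sorted({x[j] for x in X}):
                for hi in (0, 1):
                    err = sum((hi if x[j] > thr else 1 - hi) != t for x, t in zip(X, y))
                    if best is None or err < best[0]:
                        best = (err, j, thr, hi)
        _, self.j, self.thr, self.hi = best
        return self
    def predict(self, x):
        return self.hi if x[self.j] > self.thr else 1 - self.hi

# Hypothetical stand-ins for the N heterogeneous base learners (no AdaBoost here).
MAKERS = [Centroid, KNN, lambda: KNN(1), Stump]

def metacost_relabel(X, y, n_bags=10, seed=0):
    """Stage 1: relabel each sample with its minimum-expected-cost class."""
    rng = random.Random(seed)
    bags = []
    for _ in range(n_bags):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap resample
        bags.append(KNN().fit([X[i] for i in idx], [y[i] for i in idx]))
    relabeled = []
    for x in X:
        votes = [m.predict(x) for m in bags]
        p = {k: votes.count(k) / n_bags for k in (0, 1)}  # vote share as P(k | x)
        relabeled.append(min((0, 1), key=lambda c: sum(p[k] * COST[k][c] for k in (0, 1))))
    return relabeled

def meta_semi_he(X_l, y_l, X_u, max_rounds=5):
    """Stages 1-2: build L_m, then grow it with pseudo-labeled samples."""
    L_X, L_y, pool = list(X_l), metacost_relabel(X_l, y_l), list(X_u)
    for _ in range(max_rounds):
        clfs = [make().fit(L_X, L_y) for make in MAKERS]
        rest = []
        for x in pool:
            votes = [c.predict(x) for c in clfs]
            # Assumed rule: H_i = all members except h_i; accept x if some H_i is unanimous.
            groups = [votes[:i] + votes[i + 1:] for i in range(len(votes))]
            labels = {g[0] for g in groups if len(set(g)) == 1}
            if labels:
                L_X.append(x)
                L_y.append(labels.pop())
            else:
                rest.append(x)
        if len(rest) == len(pool):  # termination: no sample was added
            break
        pool = rest
    return [make().fit(L_X, L_y) for make in MAKERS]

def predict(clfs, x):
    """Stage 3: majority vote; ties go to class 1 (an assumption)."""
    votes = [c.predict(x) for c in clfs]
    return 1 if 2 * sum(votes) >= len(votes) else 0

def toy_data(n_good, n_bad, seed):
    """Hypothetical 2-D data: good customers near (0, 0), bad near (4, 4)."""
    rng = random.Random(seed)
    X = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(n_good)]
    X += [(rng.gauss(4, 0.5), rng.gauss(4, 0.5)) for _ in range(n_bad)]
    return X, [0] * n_good + [1] * n_bad
```

As a usage example, one might call `meta_semi_he(*toy_data(40, 8, seed=1), toy_data(50, 10, seed=2)[0])` to obtain the fitted members and then score a new customer with `predict`. With four members, the unanimity rule amounts to requiring that at least three of them agree, which keeps ambiguous unlabeled samples out of Lm; a real implementation would likely use confidence estimates instead.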

Key words: customer credit scoring; imbalanced class distribution; cost-sensitive learning; semi-supervised; heterogeneous ensemble
