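Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economic Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences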

Chinese Journal of Management Science, 2022, Vol. 30, Issue (12): 211-221. doi: 10.16381/j.cnki.issn1003-207x.2020.0933

• Articles

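Metacost Based Semi-supervised Heterogeneous Ensemble Model for Customer Credit Scoring

YAN Lan1, LI Si-han2, XIAO Yi3, KOU Yu-xuan4, LIU Dun-hu5, XIAO Jin2

  1. Intensive Language Training Center, Sichuan University, Chengdu 610064, Sichuan, China; 2. Business School, Sichuan University, Chengdu 610064, Sichuan, China; 3. School of Information Management, Central China Normal University, Wuhan 430079, Hubei, China; 4. School of Business, Macau University of Science and Technology, Macau Special Administrative Region 999078, China; 5. School of Management, Chengdu University of Information Technology, Chengdu 610225, Sichuan, China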
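  • Received: 2020-05-20 Revised: 2020-07-03 Published: 2023-01-10
  • Corresponding author: XIAO Jin (born 1983), male, Han ethnicity, from Guang'an, Sichuan; research professor at the Business School, Sichuan University; PhD in Management; postdoctoral fellow at the Academy of Mathematics and Systems Science, Chinese Academy of Sciences. Research interests: intelligent decision-making, business intelligence, customer relationship management. E-mail: xjxiaojin@126.com
  • Funding:
    General Program of the National Natural Science Foundation of China (72171160, 71974020); Sichuan Province Outstanding Youth Fund (2020JDJQ0021); Sichuan Province "Tianfu Ten-Thousand Talents Program" (0082204151153); Sichuan Province Soft Science Research Program (2020JDR0120); Sichuan University National Leading Talent Cultivation Fund (sksyl2021-03)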

Metacost Based Semi-supervised Heterogeneous Ensemble Model for Customer Credit Scoring

YAN Lan1, LI Si-han2, XIAO Yi3, KOU Yu-xuan4, LIU Dun-hu5, XIAO Jin2   

  1. Intensive Language Training Center, Sichuan University, Chengdu 610064, China; 2. Business School, Sichuan University, Chengdu 610064, China; 3. School of Information Management, Central China Normal University, Wuhan 430079, China; 4. School of Business, Macau University of Science and Technology, Macau 999078, China; 5. School of Management, Chengdu University of Information Technology, Chengdu 610225, China
  • Received: 2020-05-20 Revised: 2020-07-03 Published: 2023-01-10
  • Contact: XIAO Jin E-mail: xjxiaojin@126.com

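Abstract: To address the problems that arise in real-world credit scoring, this study combines meta cost-sensitive learning, semi-supervised learning, and heterogeneous ensemble techniques and proposes a Metacost based semi-supervised heterogeneous ensemble model (Meta-Semi-HE) for customer credit scoring. The model has three main stages: 1) the Metacost method is used to modify the initial labeled training set, yielding L_m; 2) N heterogeneous classifiers h_i (i = 1, …, N) are trained on L_m with AdaBoost, the concomitant ensemble H_i is used to selectively label samples from the unlabeled data set, the newly labeled samples are added to L_m, and the N heterogeneous classifiers are retrained on the updated L_m; this step is repeated, steadily improving classifier performance, until the termination condition is met; 3) the final N heterogeneous classifiers classify the samples of the test set. An empirical analysis on six customer credit-scoring datasets shows that the proposed model achieves better customer credit-scoring performance than three existing semi-supervised ensemble models and two supervised ensemble models.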

Keywords: customer credit scoring; imbalanced class distribution; cost-sensitive learning; semi-supervised; heterogeneous ensemble
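Stage 1 of the model relabels the labeled training set with Metacost. The following is a minimal sketch of that relabeling idea, not the authors' implementation: it trains a toy nearest-centroid learner on bootstrap resamples, estimates class probabilities from the votes, and assigns each sample the class with the lowest expected cost. The function names, the toy learner, the number of resamples, and the cost values are all assumptions made for illustration.

```python
import random

def fit_centroids(X, y):
    """Toy base learner (assumed for illustration): one mean vector per class."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for k, v in enumerate(xi):
            acc[k] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(centroids, x):
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

def metacost_relabel(X, y, cost, n_resamples=25, seed=0):
    """Relabel every sample with its minimum-expected-cost class.

    cost[i][j] is the cost of predicting class i when the true class is j.
    """
    rng = random.Random(seed)
    classes = sorted(set(y))
    n = len(X)
    votes = [{c: 0 for c in classes} for _ in range(n)]
    for _ in range(n_resamples):
        # Bootstrap resample of the labeled set, then one model per resample.
        idx = [rng.randrange(n) for _ in range(n)]
        model = fit_centroids([X[i] for i in idx], [y[i] for i in idx])
        for i, xi in enumerate(X):
            votes[i][predict(model, xi)] += 1
    relabeled = []
    for v in votes:
        total = sum(v.values())
        probs = {c: v[c] / total for c in classes}  # vote share as P(class | x)
        relabeled.append(min(
            classes, key=lambda i: sum(probs[j] * cost[i][j] for j in classes)))
    return relabeled

# Hypothetical data: class 0 = good credit, class 1 = bad credit.
X = [[0.0], [0.1], [0.2], [0.3], [5.0], [5.1], [5.2], [5.3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
# Assumed costs: accepting a bad customer (row 0, column 1) is five times
# as costly as rejecting a good one (row 1, column 0).
cost = [[0, 5], [1, 0]]
L_m_labels = metacost_relabel(X, y, cost)
print(L_m_labels)
```

With an asymmetric cost matrix of this kind, samples whose estimated probability of being bad is modest can still be relabeled as bad, which is how the relabeled set L_m shifts attention toward the minority class.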

Abstract: As credit business becomes more widespread, effective risk avoidance is one of the main ways for the financial industry to maintain stable profits, and credit risk is among the most common and important types of risk it faces. Accurate customer credit scoring is therefore very important. However, the class distribution of the customer data used to build credit-scoring models is often highly imbalanced: customers with good credit far outnumber customers with bad credit. In addition, only the few customers who have successfully obtained loans can be labeled according to their later behavior, whereas the many customers who applied for loans but did not obtain them cannot be labeled. These characteristics make it very challenging to build sound and accurate customer credit-scoring models, and existing research does not address them well. To fill this gap, meta cost-sensitive learning, semi-supervised learning, and heterogeneous ensemble learning are combined, and a Metacost based semi-supervised heterogeneous ensemble model (Meta-Semi-HE) is proposed for customer credit scoring. The model has three stages: 1) Metacost is used to modify the initial labeled training set to obtain L_m; 2) N heterogeneous classifiers h_i (i = 1, …, N) are trained on L_m by AdaBoost, the concomitant ensemble H_i is used to selectively label samples from the unlabeled data set and add them to L_m, and the N heterogeneous classifiers are retrained on the new L_m; this step is repeated to improve the performance of the member classifiers until the termination condition is satisfied; 3) the final trained classifiers are used to classify the samples of the test set. An empirical analysis is conducted on six customer credit-scoring datasets, and the results show that Meta-Semi-HE achieves better customer credit-scoring performance than the other five models on the evaluation criteria of AUC, f, Type I accuracy, and Type II accuracy. The model offers banks a new approach to customer credit-scoring modeling, helping them avoid risk more effectively and promoting the healthy and stable development of credit business in the financial industry.

Key words: customer credit scoring; imbalanced class distribution; cost-sensitive learning; semi-supervised; heterogeneous ensemble
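Stage 2 of the model can be sketched as a self-labeling loop. The sketch below is an illustration under stated assumptions, not the paper's procedure: the concomitant ensemble H_i is taken to be all member classifiers except h_i, an unlabeled sample is accepted only when H_i agrees on its label at or above a hypothetical `threshold`, the three toy learners stand in for the heterogeneous classifiers, and the AdaBoost training of each member is omitted. All function names are invented for the example, and labels are assumed to be 0 (good) and 1 (bad).

```python
def fit_centroid(X, y):
    """Nearest-centroid rule: predict the class with the closest mean vector."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    means = {c: [sum(col) / len(rows) for col in zip(*rows)]
             for c, rows in groups.items()}
    return lambda x: min(
        means, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])))

def fit_1nn(X, y):
    """1-nearest-neighbour rule over the stored training samples."""
    data = list(zip(X, y))
    return lambda x: min(
        data, key=lambda d: sum((a - b) ** 2 for a, b in zip(x, d[0])))[1]

def fit_stump(X, y):
    """Single-feature threshold rule with the lowest training error (labels 0/1)."""
    best = None
    for k in range(len(X[0])):
        for t in sorted({x[k] for x in X}):
            for low, high in ((0, 1), (1, 0)):
                err = sum((low if x[k] <= t else high) != label
                          for x, label in zip(X, y))
                if best is None or err < best[0]:
                    best = (err, k, t, low, high)
    _, k, t, low, high = best
    return lambda x: low if x[k] <= t else high

def majority(models, x):
    """Return (label, agreement fraction); ties go to the smaller label."""
    preds = [h(x) for h in models]
    label = max(sorted(set(preds)), key=preds.count)
    return label, preds.count(label) / len(preds)

def self_label(L_X, L_y, U, learners, threshold=1.0, max_rounds=10):
    """Grow the labeled set with samples the concomitant ensembles agree on."""
    L_X, L_y, U = list(L_X), list(L_y), list(U)
    models = [fit(L_X, L_y) for fit in learners]
    for _ in range(max_rounds):
        if not U:
            break
        newly = {}
        for i in range(len(models)):
            companions = models[:i] + models[i + 1:]  # H_i: everyone but h_i
            for u, x in enumerate(U):
                label, agreement = majority(companions, x)
                if agreement >= threshold:
                    newly.setdefault(u, label)  # first confident label wins
        if not newly:
            break  # nothing confident enough: stop early
        for u, label in newly.items():
            L_X.append(U[u])
            L_y.append(label)
        U = [x for u, x in enumerate(U) if u not in newly]
        models = [fit(L_X, L_y) for fit in learners]  # retrain on the larger set
    return models, L_X, L_y

# Hypothetical labeled set (0 = good, 1 = bad) and unlabeled applicants.
L_X = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
L_y = [0, 0, 1, 1]
U = [[0.1, 0.3], [4.8, 5.2], [0.3, 0.0]]
models, grown_X, grown_y = self_label(L_X, L_y, U, [fit_centroid, fit_1nn, fit_stump])
print(len(grown_X), majority(models, [0.1, 0.1])[0])
```

Stage 3 then corresponds to calling `majority(models, x)` on each test sample. Excluding h_i from the ensemble that labels data for it is meant to keep a single member from reinforcing its own mistakes.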
