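Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economic Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences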

Chinese Journal of Management Science ›› 2020, Vol. 28 ›› Issue (9): 45-53. doi: 10.16381/j.cnki.issn1003-207x.2018.1491

• Articles •

Construction of an Interpretable Credit Risk Assessment Model Based on an Improved Pedagogical Method

DONG Lu-an, YE Xin

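  1. School of Economics and Management, Dalian University of Technology, Dalian 116024, Liaoning, China
  • Received: 2018-10-18 Revised: 2019-03-20 Online: 2020-09-20 Published: 2020-09-25
  • Corresponding author: YE Xin (1977-), male (Han ethnicity), from Liaoyang, Liaoning; Professor, Ph.D., School of Economics and Management, Dalian University of Technology; research interests: complex information system modeling and optimization, big data, etc. E-mail: yexin@dlut.edu.cn
  • Supported by:
    National Natural Science Foundation of China (71533001, 71731003)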

Interpretable Credit Risk Assessment Modeling Based on Improved Pedagogical Method

DONG Lu-an, YE Xin   

  1. School of Economics and Management, Dalian University of Technology, Dalian 116024, China
  • Received: 2018-10-18 Revised: 2019-03-20 Online: 2020-09-20 Published: 2020-09-25

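Abstract (translated from Chinese): Credit risk assessment is an important part of risk prevention and control in financial institutions. In recent years, credit risk assessment models based on machine learning have attracted growing attention for their accurate predictions, but such models are hard to interpret, so investors cannot fully trust their results. To address this problem, this paper proposes an improved pedagogical method that uses a machine learning model to guide the generation of a decision-tree credit risk assessment model balancing accuracy and interpretability, in order to support investors' decisions. To improve the decision tree's ability to learn the correct function of the machine learning model, a pseudo data set generation method based on Weight Synthetic Minority Over-sampling Technique (Weight-SMOTE) is proposed, which raises the proportion of pseudo samples labeled by the high-confidence part of that function. To achieve an effective trade-off among the accuracy and interpretability of the generated tree and its consistency with the machine learning model, a new decision tree pruning method is proposed for the tree generation process. In addition, to address the limitations of the fidelity indicator, a true-fidelity indicator is proposed to measure how closely the decision tree approximates the correct function of the machine learning model. Finally, the improved pedagogical method is validated on three real credit risk assessment data sets; the experimental results show that it can generate accurate and interpretable credit risk assessment models that meet investors' decision preferences and practical needs.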

Key words: credit risk assessment, machine learning model, interpretability, pedagogical method
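
The abstract does not spell out how Weight-SMOTE builds the pseudo data set, so the following is only a minimal sketch of the general idea under stated assumptions: seed samples are drawn with probability proportional to a per-sample weight (assumed here to be the black-box model's confidence in its own label), and each pseudo sample is a standard SMOTE interpolation between the seed and one of its nearest neighbours. The function name `weighted_smote` and its parameters are hypothetical and are not taken from the paper.

```python
import math
import random


def weighted_smote(samples, weights, n_new, k=3, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    samples: list of equal-length numeric feature lists (at least two).
    weights: one non-negative weight per sample, not all zero. ASSUMPTION:
             the weight stands for how much the black-box label of that
             sample is trusted; the paper's actual weighting may differ.
    """
    rng = random.Random(seed)
    indices = range(len(samples))
    synthetic = []
    for _ in range(n_new):
        # Weighted seed choice: trusted samples seed more pseudo samples.
        i = rng.choices(indices, weights=weights, k=1)[0]
        base = samples[i]
        # k nearest neighbours of the seed (Euclidean), excluding the seed.
        others = [j for j in indices if j != i]
        others.sort(key=lambda j: math.dist(base, samples[j]))
        neighbour = samples[rng.choice(others[:k])]
        # Classic SMOTE step: a random point on the segment seed -> neighbour.
        u = rng.random()
        synthetic.append([b + u * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
confidence = [0.9, 0.8, 0.9, 0.1]  # hypothetical black-box confidences
pseudo = weighted_smote(points, confidence, n_new=5)
```

In a pedagogical setup the pseudo samples would then be labeled by the trained machine learning model and used, together with the original data, to fit the decision tree.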

Abstract: Credit risk assessment is an important task in risk prevention and control at financial institutions. In recent years, credit risk assessment models based on machine learning have received increasing attention because of their better predictive performance. However, machine learning models lack interpretability, which makes it impossible for decision makers to fully trust the models and their predictions. To solve this problem, an improved pedagogical method is proposed. The method uses a machine learning model to guide the construction of a decision-tree-based credit risk assessment model that can assist investors in decision-making. To improve how closely the decision tree approximates the correct function of the machine learning model, a pseudo data set sampling method based on Weight Synthetic Minority Over-sampling Technique (Weight-SMOTE) is proposed; it increases the proportion of pseudo samples in the pseudo data set that are labeled by the trusted function of the machine learning model. To achieve an effective trade-off among accuracy, interpretability, and the consistency of the generated decision tree with the machine learning model, a new decision tree pruning method is proposed. In addition, to address the limitations of the fidelity evaluation indicator, a new indicator, true-fidelity, is proposed to measure how closely the decision tree approximates the correct function of the machine learning model. Three real credit risk assessment data sets are used in the experiments. The results show that the improved pedagogical method can construct an interpretable credit risk assessment model that meets the different decision-making preferences of decision makers.

Key words: credit risk assessment, machine learning model, interpretability, pedagogical method
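
To make the two evaluation indicators concrete, here is a small sketch. Fidelity is the usual agreement rate between the extracted tree and the black-box model. The abstract does not give the formula for true-fidelity, so the version below is an assumption: agreement is measured only on samples the black-box model classifies correctly, which is one plausible reading of "the correct function of the machine learning model". The function names are illustrative.

```python
def fidelity(y_blackbox, y_tree):
    """Share of samples on which the tree reproduces the black-box label."""
    agree = sum(b == t for b, t in zip(y_blackbox, y_tree))
    return agree / len(y_blackbox)


def true_fidelity(y_true, y_blackbox, y_tree):
    """ASSUMED definition, not taken from the paper: tree/black-box
    agreement restricted to samples where the black box is correct."""
    kept = [(b, t) for y, b, t in zip(y_true, y_blackbox, y_tree) if b == y]
    if not kept:
        return 0.0  # black box is never correct, nothing to imitate
    return sum(b == t for b, t in kept) / len(kept)


# Toy labels: 1 = default, 0 = no default (hypothetical data).
y_true = [1, 0, 1, 1, 0, 0]
y_blackbox = [1, 0, 0, 1, 0, 1]
y_tree = [1, 0, 0, 1, 1, 0]

print(fidelity(y_blackbox, y_tree))               # 4 of 6 labels match
print(true_fidelity(y_true, y_blackbox, y_tree))  # 3 of 4 correct ones match
```

In this toy case the tree copies one of the black box's mistakes, which raises plain fidelity without making the tree any more accurate; restricting the comparison to correctly classified samples removes that effect.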

CLC number: