Interpretable Credit Risk Assessment Modeling Based on Improved Pedagogical Method

Chinese Journal of Management Science, 2020, Vol. 28, Issue (9): 45-53. doi: 10.16381/j.cnki.issn1003-207x.2018.1491

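DONG Lu-an, YE Xin

School of Economics and Management, Dalian University of Technology, Dalian 116024, Liaoning, China

Received: 2018-10-18 Revised: 2019-03-20 Online: 2020-09-20 Published: 2020-09-25
Corresponding author: YE Xin (1977-), male (Han), born in Liaoyang, Liaoning; Professor, Ph.D., School of Economics and Management, Dalian University of Technology; research interests: modeling and optimization of complex information systems, big data, etc. E-mail: yexin@dlut.edu.cn
Funding: National Natural Science Foundation of China (71533001, 71731003)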

Abstract

Credit risk assessment is a key part of risk prevention and control in financial institutions. In recent years, credit risk assessment models based on machine learning have attracted increasing attention because of their predictive accuracy. However, machine learning models lack interpretability, so investors cannot fully trust their predictions. To address this problem, an improved pedagogical method is proposed, in which a machine learning model guides the construction of a decision tree for credit risk assessment that balances accuracy and interpretability and can assist investors in decision-making. To improve the decision tree's ability to learn the correct function of the machine learning model, a pseudo data set generation method based on the Weight Synthetic Minority Over-sampling Technique (Weight-SMOTE) is proposed, which increases the proportion of pseudo samples in the pseudo data set that are labeled by the trusted function of the machine learning model. To achieve an effective trade-off among the accuracy and interpretability of the generated decision tree and its consistency with the machine learning model, a new decision tree pruning method is proposed. In addition, to overcome the limitations of the fidelity evaluation indicator, a new indicator, true-fidelity, is proposed to measure how closely the decision tree approximates the correct function of the machine learning model. Finally, the improved pedagogical method is validated on three real credit risk assessment data sets. The experimental results show that it can generate accurate and interpretable credit risk assessment models that meet investors' decision-making preferences and practical needs.

Key words: credit risk assessment, machine learning model, interpretability, pedagogical method
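
The abstract names two ideas that a small example may make more concrete: weighted SMOTE-style generation of pseudo samples, and the difference between fidelity and true-fidelity. The sketch below is only an illustration under stated assumptions and is not the authors' implementation. The function names, the weighting scheme (a trust weight per training sample, with 0 for samples the black-box model misclassifies), and the definition of true-fidelity as agreement restricted to samples the black-box model predicts correctly are all assumptions, since the abstract does not give the exact formulas.

```python
import math
import random


def weighted_smote(samples, weights, n_new, k=2, seed=0):
    """Generate n_new pseudo samples by SMOTE-style interpolation.

    Base samples are drawn in proportion to `weights` (hypothetical trust
    scores), so regions where the black-box model is trusted are sampled
    more densely. In a pedagogical method the pseudo samples would then be
    labeled by the black-box model and used to train the decision tree.
    """
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    pseudo = []
    for _ in range(n_new):
        i = rng.choices(indices, weights=weights, k=1)[0]
        # k nearest neighbours of the base sample, excluding the sample itself
        neighbours = sorted(
            (j for j in indices if j != i),
            key=lambda j: math.dist(samples[i], samples[j]),
        )[:k]
        j = rng.choice(neighbours)
        gap = rng.random()  # position on the segment between the two samples
        pseudo.append([a + gap * (b - a) for a, b in zip(samples[i], samples[j])])
    return pseudo


def fidelity(tree_pred, model_pred):
    """Share of samples on which the tree agrees with the black-box model."""
    agree = sum(t == m for t, m in zip(tree_pred, model_pred))
    return agree / len(model_pred)


def true_fidelity(tree_pred, model_pred, y_true):
    """Agreement counted only where the black-box model is correct (assumed definition)."""
    kept = [(t, m) for t, m, y in zip(tree_pred, model_pred, y_true) if m == y]
    if not kept:
        raise ValueError("the black-box model has no correct predictions")
    return sum(t == m for t, m in kept) / len(kept)


# Toy data (hypothetical): four applicants with two features each.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]
w = [1.0, 1.0, 1.0, 0.0]  # weight 0: never used as a base sample
pseudo_X = weighted_smote(X, w, n_new=5)

tree_pred = [1, 0, 1, 1]
model_pred = [1, 0, 0, 1]
y_true = [1, 0, 1, 0]
print(fidelity(tree_pred, model_pred))               # 0.75
print(true_fidelity(tree_pred, model_pred, y_true))  # 1.0
```

In this toy case, plain fidelity penalizes the tree on the third sample, where the tree is right and the black-box model is wrong, while the assumed true-fidelity ignores that sample and reports full agreement with the model's correct predictions.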

CLC number: TP18

DONG Lu-an, YE Xin. Interpretable Credit Risk Assessment Modeling Based on Improved Pedagogical Method[J]. Chinese Journal of Management Science, 2020, 28(9): 45-53.

