A Loan Credit Risk Model Incorporating Text Prior Information

doi:10.16381/j.cnki.issn1003-207x.2019.1554

Chinese Journal of Management Science, 2021, Vol. 29, Issue (5): 34-44.

WANG Xiao-yan¹, ZHANG Zhong-yan¹, MA Shuang-ge²

1. College of Finance and Statistics, Hunan University, Changsha 410079, China;
2. School of Public Health, Yale University, CT 06511, USA

Received: 2019-10-09 Revised: 2020-01-22 Online: 2021-05-20 Published: 2021-05-26
Corresponding author: WANG Xiao-yan (born 1987), female (Han ethnicity), native of Loudi, Hunan; College of Finance and Statistics, Hunan University; associate professor, Ph.D.; research interests: data mining and high-dimensional data analysis. E-mail: xywang@hnu.edu.cn
Funding:
National Natural Science Foundation of China (71601076, 71701201); Ministry of Education Humanities and Social Sciences Youth Project (16YJCZH104); Hunan Provincial Social Science Achievement Review Committee Project (XSP21YBZ003); Natural Science Foundation of Jiangsu Province Youth Project (BK20170268)

Abstract

Loans are an important way to relieve funding shortages and a core business of financial institutions, so managing loan credit risk is essential to the survival and development of those institutions. A key step in controlling credit risk is identifying the factors that significantly affect default. Existing studies may contain much valuable information that can benefit such an analysis. To incorporate this information, this study constructs a loan credit risk evaluation model named PIPL. The model first collects text prior information from the existing literature via text mining, obtaining prior word frequencies for the credit risk indexes that indicate their importance. A penalized variable selection method then converts the prior frequencies into a prior response, turning qualitative information into a quantitative counterpart. Finally, the loss function of the proposed model is constructed by weighting the prior response and the original observations. Risk indexes are selected with an Elastic Net penalty, and the parameters are estimated with iteratively reweighted least squares and a coordinate descent algorithm.
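The abstract does not state the loss function itself. As a rough sketch only, and assuming the model follows the prior lasso of Jiang, He and Zhang (2016) with the lasso penalty replaced by an Elastic Net penalty, the objective could be written as below. All notation here is assumed for illustration: $y_i$ is the observed default indicator, $x_i$ the vector of risk indexes, $\hat{y}_i^{p}$ the prior response, $\eta \ge 0$ the weight given to the prior information, and $\lambda$, $\alpha$ the Elastic Net tuning parameters.

```latex
\[
\hat{\beta} = \arg\min_{\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}
    \Bigl[\ell\bigl(y_i,\, x_i^{\top}\beta\bigr)
          + \eta\,\ell\bigl(\hat{y}_i^{p},\, x_i^{\top}\beta\bigr)\Bigr]
  + \lambda\Bigl[\alpha\,\lVert\beta\rVert_1
          + \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^{2}\Bigr],
\qquad
\ell(y, u) = y\,u - \log\bigl(1 + e^{u}\bigr).
\]
```

Under this assumed form, setting $\eta = 0$ recovers ordinary Elastic Net logistic regression, so a small $\eta$ corresponds to down-weighting unreliable prior information.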
A simulation study is conducted to verify the validity of the proposed PIPL model. In particular, several types of prior information of differing quality are considered, which tests both how well the model exploits good information and how robust it is to bad information. The results show that the PIPL model can adaptively recognize the quality of the prior information. When the information is of high quality, PIPL raises the weight of the prior information in the model and thereby improves index selection and classification. When the information is unreliable, PIPL adaptively reduces the weight of the prior response and remains robust in classification.
In the empirical analysis, 123 articles on credit risk are mined from CNKI. Taking P2P lending data from Lending Club as an example, the analysis shows that the PIPL model improves classification accuracy through the prior information and exhibits good robustness. Both the simulation and the empirical study support the soundness of the new model, which may be of practical use in financial risk management.
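The pipeline described in the abstract (prior frequencies, then a prior response, then a weighted Elastic Net fit) can be illustrated with a minimal pure-Python sketch. Several parts are assumptions rather than the authors' method: the prior frequencies enter the first stage as per-coefficient penalty factors, the optimizer is plain proximal gradient descent rather than the iteratively reweighted least squares and coordinate descent named in the abstract, and the data are a simulated toy set, not the Lending Club data. The names `fit_enet_logistic`, `pen_factor`, `eta` and `freq` are hypothetical.

```python
import math
import random


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Proximal map of t * |z|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def predict_proba(X, b0, beta):
    """Predicted default probability for each row of X."""
    return [sigmoid(b0 + sum(b * v for b, v in zip(beta, row))) for row in X]


def fit_enet_logistic(X, y, y_prior=None, eta=0.0, lam=0.05, alpha=0.5,
                      pen_factor=None, step=0.5, n_iter=300):
    """Proximal-gradient fit of an elastic-net logistic model.

    The loss adds eta times the logistic loss on the prior response
    y_prior to the loss on the observed labels y. pen_factor[j] rescales
    the penalty on coefficient j. The intercept is never penalised.
    """
    n, p = len(X), len(X[0])
    y_prior = y if y_prior is None else y_prior
    pen_factor = [1.0] * p if pen_factor is None else pen_factor
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        mu = predict_proba(X, b0, beta)
        # d(loss)/d(linear predictor): observed part plus eta * prior part.
        r = [(m - yi) + eta * (m - yp) for m, yi, yp in zip(mu, y, y_prior)]
        b0 -= step * sum(r) / n
        for j in range(p):
            grad = sum(ri * row[j] for ri, row in zip(r, X)) / n
            ridge = lam * (1.0 - alpha) * pen_factor[j] * beta[j]
            # Gradient step on the smooth part, then the L1 proximal step.
            z = beta[j] - step * (grad + ridge)
            beta[j] = soft_threshold(z, step * lam * alpha * pen_factor[j])
    return b0, beta


# Toy data (simulated): only the first of three indicators drives default.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(100)]
y = [1 if random.random() < sigmoid(2.0 * row[0]) else 0 for row in X]

# Hypothetical prior word frequencies for the three indicators.
freq = [30, 2, 1]
# Assumption: indicators mentioned less often are penalised more heavily.
pen_factor = [(max(freq) + 1) / (f + 1) for f in freq]

# Stage 1: turn the frequencies into a prior response (fitted probabilities).
b0_p, beta_p = fit_enet_logistic(X, y, pen_factor=pen_factor)
y_prior = predict_proba(X, b0_p, beta_p)

# Stage 2: fit on the observed labels plus the eta-weighted prior response.
b0, beta = fit_enet_logistic(X, y, y_prior=y_prior, eta=0.5)
print([round(b, 3) for b in beta])
```

In this sketch `eta` is fixed by hand. One plausible way to obtain the adaptive behaviour described above would be to choose `eta` by cross-validation, so that a misleading prior response ends up with a weight near zero; the abstract does not say how the weight is actually chosen.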

Key words: text information, Logistic regression, Elastic Net, loan credit risk

CLC number: F830.5

WANG Xiao-yan, ZHANG Zhong-yan, MA Shuang-ge. A Loan Credit Risk Model Incorporating Text Prior Information[J]. Chinese Journal of Management Science, 2021, 29(5): 34-44.

