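Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economic Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences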

Chinese Journal of Management Science, 2021, Vol. 29, Issue (5): 34-44. doi: 10.16381/j.cnki.issn1003-207x.2019.1554

• Articles •

A Loan Credit Risk Assessment Model Based on Text Prior Information

WANG Xiao-yan1, ZHANG Zhong-yan1, MA Shuang-ge2

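  1. College of Finance and Statistics, Hunan University, Changsha, Hunan 410079, China;
    2. School of Public Health, Yale University, Connecticut 06511, USA
  • Received: 2019-10-09 Revised: 2020-01-22 Online: 2021-05-20 Published: 2021-05-26
  • Corresponding author: WANG Xiao-yan (b. 1987), female, Han ethnicity, from Loudi, Hunan; associate professor with a doctoral degree, College of Finance and Statistics, Hunan University; research interests: data mining and high-dimensional data analysis. E-mail: xywang@hnu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (71601076, 71701201); Ministry of Education Humanities and Social Sciences Youth Project (16YJCZH104); Hunan Provincial Social Science Achievement Review Committee Project (XSP21YBZ003); Natural Science Foundation of Jiangsu Province, Youth Program (BK20170268)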

A Loan Credit Risk Model Incorporating Text Prior Information

WANG Xiao-yan1, ZHANG Zhong-yan1, MA Shuang-ge2   

  1. College of Finance and Statistics, Hunan University, Changsha 410079, China;
    2. School of Public Health, Yale University, CT 06511, USA
  • Received: 2019-10-09 Revised: 2020-01-22 Online: 2021-05-20 Published: 2021-05-26

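Abstract: Drawing on the information contained in existing credit risk research, this paper constructs a new loan credit risk assessment model, PIPL. The model first uses text mining to collect textual information from existing studies, obtaining prior word frequencies for the credit risk indicators that reflect each indicator's importance. It then uses a penalized variable selection method to quantify the prior word frequencies as a prior response variable. Finally, it builds the model from the prior response and the original data and selects risk indicators with the elastic net. Simulations show that PIPL automatically recognizes the quality of the prior information: when the quality is high, it gives the prior information a larger weight, which improves indicator selection and classification; when the quality is low, it automatically reduces the weight of the prior response in the model and still classifies robustly. The empirical analysis mines 123 papers from CNKI for textual information and, using P2P online lending data as an example, finds that PIPL improves classification accuracy through the prior information and shows good robustness.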

Key words: text information, Logistic regression, elastic net, loan credit risk
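
The first step in the abstract, turning a body of literature into prior frequencies for the risk indicators, can be illustrated with a small sketch. The abstract does not specify the text-mining pipeline, so everything here is an assumption: the function name `prior_frequencies` is hypothetical, matching is plain case-insensitive substring search, and "frequency" is taken to be the share of documents that mention a term.

```python
from collections import Counter


def prior_frequencies(documents, indicator_terms):
    """Share of documents that mention each indicator term (assumed definition)."""
    counts = Counter()
    for text in documents:
        lowered = text.lower()
        for term in indicator_terms:
            # Count each term at most once per document (document frequency).
            if term.lower() in lowered:
                counts[term] += 1
    n_docs = len(documents)
    if n_docs == 0:
        # No literature collected: every indicator gets zero prior frequency.
        return {term: 0.0 for term in indicator_terms}
    return {term: counts[term] / n_docs for term in indicator_terms}


if __name__ == "__main__":
    # Toy stand-ins for abstracts of credit risk papers (illustrative only).
    docs = [
        "Income and loan amount drive default.",
        "Loan amount matters most for default risk.",
        "Age is largely irrelevant.",
    ]
    print(prior_frequencies(docs, ["income", "loan amount", "education"]))
```

A real pipeline would likely need word segmentation, synonym merging, and a mapping from terms to dataset columns; the sketch only shows how literature mentions can become a numeric importance score per indicator.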

Abstract: Loans are not only an important way to relieve funding shortages but also a core business of financial institutions, so managing loan credit risk is essential to the survival and development of those institutions. A key step in controlling credit risk is identifying the factors that significantly affect default. Existing studies contain much valuable information that can inform this task. To incorporate that information, this study constructs a loan credit risk evaluation model named PIPL. The model first mines textual prior information from the existing literature with text mining techniques, obtaining a prior frequency for each credit risk index that indicates its importance. A penalized variable selection method then converts the prior frequency into a prior response, turning qualitative information into a quantitative counterpart. Finally, the loss function of the proposed model is constructed by weighting the prior response and the original observations. Risk indexes are selected with the elastic net, and the parameters are estimated with iteratively reweighted least squares and a coordinate descent algorithm.
A simulation study verifies the validity of the proposed PIPL model. In particular, the simulation includes several types of prior information of differing quality, which tests both how well the model exploits good information and how robust it is to bad information. The results show that PIPL adapts to the quality of the prior information. When the information is of high quality, PIPL raises the weight of the prior information in the model and thereby improves index selection and classification. When the information is unreliable, PIPL adaptively reduces the weight of the prior response and retains a degree of robustness in classification.
In the empirical analysis, 123 papers on credit risk are mined from CNKI. Taking P2P data from Lending Club as an example, the analysis shows that PIPL improves classification accuracy and is satisfactorily robust. Both the simulation and the empirical study support the soundness of the new model, which may be of practical use in financial risk management.

Key words: text information, Logistic regression, Elastic Net, loan credit risk
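
The estimation step named in the abstract (a loss that weights the prior response against the observations, an elastic net penalty, iteratively reweighted least squares, and coordinate descent) might look roughly like the sketch below. The abstract does not give the loss function, so the form used here is an assumption: the prior response `y_prior` (taken as given, with values in [0, 1]) enters the logistic log-likelihood with weight `eta`, which is equivalent to fitting against the blended response `(y + eta * y_prior) / (1 + eta)`. The function name and all parameter names are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def soft_threshold(v, t):
    """Soft-thresholding operator used by the L1 part of the elastic net."""
    return np.sign(v) * max(abs(v) - t, 0.0)


def fit_weighted_enet_logistic(X, y, y_prior, eta=1.0, lam=0.05, alpha=0.5,
                               n_irls=25, n_cd=50, tol=1e-6):
    """Elastic-net logistic regression on a blend of observed and prior responses.

    Assumed objective: mean logistic loss against y_mix, plus
    lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2).
    """
    n, p = X.shape
    y_mix = (y + eta * y_prior) / (1.0 + eta)  # eta = 0 ignores the prior
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_irls):
        beta_old = beta.copy()
        # IRLS step: quadratic approximation of the logistic loss.
        lin = b0 + X @ beta
        prob = 1.0 / (1.0 + np.exp(-np.clip(lin, -30.0, 30.0)))
        w = np.clip(prob * (1.0 - prob), 1e-5, None)  # working weights
        z = lin + (y_mix - prob) / w                  # working response
        # Coordinate descent on the penalized weighted least squares problem.
        for _ in range(n_cd):
            b0 = np.sum(w * (z - X @ beta)) / np.sum(w)
            r = z - b0 - X @ beta
            max_change = 0.0
            for j in range(p):
                xj = X[:, j]
                rho = np.mean(w * xj * (r + xj * beta[j]))
                denom = np.mean(w * xj ** 2) + lam * (1.0 - alpha)
                new_bj = soft_threshold(rho, lam * alpha) / denom
                if new_bj != beta[j]:
                    r -= xj * (new_bj - beta[j])
                    max_change = max(max_change, abs(new_bj - beta[j]))
                    beta[j] = new_bj
            if max_change < tol:
                break
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return b0, beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    true_beta = np.array([2.0, -2.0, 0.0, 0.0, 0.0])
    p_true = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
    y = (rng.random(200) < p_true).astype(float)
    # A "high quality" prior response for illustration: the true probabilities.
    b0, beta = fit_weighted_enet_logistic(X, y, p_true, eta=1.0, lam=0.02)
    print(b0, beta)
```

Under this assumed form, `eta` plays the role the abstract assigns to the weight of the prior response: a larger value pulls the fit toward the prior, and `eta = 0` reduces to ordinary elastic-net logistic regression. How PIPL actually chooses that weight is not stated in the abstract; selecting `eta`, `lam`, and `alpha` by cross-validation would be one plausible option.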
