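Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economic Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences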

Chinese Journal of Management Science ›› 2017, Vol. 25 ›› Issue (11): 12-21. doi: 10.16381/j.cnki.issn1003-207x.2017.11.002

• Articles

The Default Prediction Combined with Soft Information in Online Peer-to-Peer Lending

JIANG Cui-qing, WANG Rui-ya, DING Yong

  1. School of Management, Hefei University of Technology, Hefei, Anhui 230009, China
  • Received: 2016-07-06 Revised: 2017-04-20 Online: 2017-11-20 Published: 2018-01-31
  • Corresponding author: WANG Rui-ya (b. 1992), female, Han ethnicity, from Hefei, Anhui; master's student at the School of Management, Hefei University of Technology; research interests: big data analytics and credit evaluation. E-mail: wrylr@163.com
  • Funding:

    National Natural Science Foundation of China (71731005, 71571059)

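Summary: In P2P online lending, predicting a loan's default probability is the key to evaluating user credit and a central concern for lending platforms and investors. Because P2P platforms obtain only limited financial information about their users, credit evaluation and default prediction for P2P loans face new challenges. Drawing on the information characteristics of P2P platforms, this paper proposes a default prediction method for online loans that incorporates soft information. First, a topic model is used to extract and quantify relevant variables from textual soft information, and the effect of the different soft-information variables on loan default is analyzed. Second, a two-stage variable selection method is designed to screen combinations of soft and hard information. Finally, the random forest algorithm is used to build a default prediction model that incorporates soft information, and an empirical analysis is carried out on real data from a P2P platform. The results show that incorporating valuable soft information into a default prediction model for P2P loans can improve prediction accuracy.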

Keywords: P2P lending, default prediction, soft information, topic model, variable selection, random forest

Abstract: P2P lending is a new type of lending formed at the intersection of the Internet and traditional finance. It provides a more convenient loan platform and has developed rapidly in China. However, platform collapses are becoming more frequent as P2P loans face serious default risk and bad-debt losses. Credit evaluation is an important basis for managing loan default risk and supporting lending decisions. Compared with traditional loans, the financial data on borrowers collected by a P2P platform, known as hard information, is limited. However, a large amount of soft information is generated during the loan application, such as the loan description text, which also carries information about loans and borrowers. Therefore, a default prediction method that incorporates soft information is proposed for P2P lending. First, the soft information is categorized according to the characteristics of P2P lending, and the LDA topic model is used to quantify valuable factors in the soft-information text. Second, regression analysis and comparative experiments are performed to test the effect of soft information on the P2P default probability. Moreover, a two-stage method is designed to select effective variable sets for default modeling, and the default prediction model is constructed with the random forest (RF) method. Finally, based on data from a Chinese P2P platform, eloan.com, an experimental study is conducted to verify the effectiveness of the proposed methods. The results show that soft information can improve the recognition rate of loan defaults and can serve as a basis for P2P credit evaluation.
The feature combination selection method proposed in this paper and the credit evaluation model based on random forest achieve good classification accuracy. The proposed method also clearly improves prediction performance compared with the platform's own rating method, which offers a useful reference for credit evaluation in P2P online lending.

Key words: P2P lending, default prediction, soft information, topic model, variable selection, RF model

CLC number:
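
The first step named in the abstract, quantifying loan-description text with an LDA topic model so that it can sit alongside hard financial variables, can be illustrated with a minimal sketch. The abstract does not say how the authors fit LDA, how many topics they use, or which hard variables they include, so everything here is an assumption: the function `lda_topic_proportions`, the collapsed Gibbs sampler, the hyperparameters `alpha` and `beta`, and the toy `descriptions` and `hard_features` are all hypothetical and chosen only to keep the example self-contained.

```python
import random

def lda_topic_proportions(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (an assumed inference method) and
    return one row of topic proportions per tokenized document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]             # topic counts per document
    topic_word = [[0] * n_words for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                           # token counts per topic
    assignments = []
    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            k = rng.randrange(n_topics)
            row.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(row)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed topic proportions; each row sums to 1.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]

# Hypothetical, already tokenized loan descriptions (soft information).
descriptions = [
    "need loan to buy seeds and fertilizer for the farm".split(),
    "expand farm and buy more seeds this season".split(),
    "loan to restock the shop and pay rent".split(),
    "shop rent is due and stock is low".split(),
]
# Hypothetical hard information per loan: borrower age and loan amount.
hard_features = [[35, 50000.0], [42, 80000.0], [29, 30000.0], [51, 20000.0]]

theta = lda_topic_proportions(descriptions, n_topics=2)
# Soft variables (topic proportions) appended to the hard variables.
combined = [hard + soft for hard, soft in zip(hard_features, theta)]
```

Each row of `combined` is the kind of mixed hard-plus-soft feature vector that a variable selection step and a classifier such as a random forest could then consume; those later stages are left out here because the abstract does not specify them in enough detail to reproduce.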