A Semi-Supervised Co-Training Model for Customer Credit Scoring

Xiao Jin, Xue Shutian, Huang Jing, Xie Ling, Gu Xin

doi:10.16381/j.cnki.issn1003-207x.2016.06.015

Chinese Journal of Management Science, 2016, Vol. 24, Issue 6: 124-131


Article



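1. Business School, Sichuan University, Chengdu, Sichuan 610064, China;
2. School of Public Administration, Sichuan University, Chengdu, Sichuan 610064, China;
3. Soft Science Institute, Sichuan University, Chengdu, Sichuan 610064, China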

Received: 2015-02-13

Revised: 2015-05-12

Published online: 2016-07-05

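Funding: National Natural Science Foundation of China (71471124, 71571126); Sichuan Provincial Youth Fund (2015RZ0056); Sichuan Provincial Social Science Planning Project (SC14C019); Sichuan University Outstanding Young Scholars Fund (2013SCU04A08); Sichuan University Fund for Young Academic Talent in Philosophy and Social Sciences (skqx201607); Innovation Team Project of the Sichuan Provincial Department of Education (13TD0040)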




Abstract

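In many real-world credit scoring problems, labeling samples requires a great deal of human, financial, and material resources, so often only a small number of labeled samples is available for training a classification model, while the large number of unlabeled customer samples in the database is discarded. To address this problem, this study introduces semi-supervised learning and combines it with the random subspace method (Random Subspace, RSS) from multiple classifier ensemble techniques, building RSSCI, an RSS-based semi-supervised co-training model for class-imbalanced settings. The model consists of three phases: 1) train a number of base classifiers with the RSS method; 2) selectively label a subset of the most suitable samples from the large unlabeled dataset and add them to the original training set; 3) train the classification model on the final training set and classify the test-set samples. Empirical analysis on three customer credit scoring datasets shows that the credit scoring performance of RSSCI is superior not only to commonly used supervised ensemble credit scoring models but also to several existing semi-supervised co-training credit scoring models.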

Keywords: credit scoring; class imbalance; semi-supervised; co-training; RSS
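Phase 1 can be illustrated with a minimal, standard-library Python sketch of the random subspace method, in which each base classifier is trained on a randomly drawn subset of the features. The abstract does not name the base learner, so a nearest-centroid classifier is used here as a hypothetical stand-in. The function names (`train_nearest_centroid`, `train_rss_ensemble`, `top_k_by_accuracy`) and the use of validation accuracy to rank members when picking the best three are likewise assumptions for illustration, not the authors' implementation.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Classifier = Callable[[Vector], int]


def train_nearest_centroid(X: List[Vector], y: List[int]) -> Classifier:
    """Hypothetical base learner: predict the class whose mean vector is closest."""
    centroids: Dict[int, List[float]] = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> int:
        # Squared Euclidean distance from x to each class centroid.
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict


def train_rss_ensemble(
    X: List[Vector], y: List[int], n_classifiers: int, subspace_size: int, seed: int = 0
) -> List[Tuple[List[int], Classifier]]:
    """Random subspace method: each member is trained on its own random feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble: List[Tuple[List[int], Classifier]] = []
    for _ in range(n_classifiers):
        features = sorted(rng.sample(range(n_features), subspace_size))
        base = train_nearest_centroid([[x[i] for i in features] for x in X], y)
        # Default arguments freeze this member's feature subset and model in the closure.
        ensemble.append((features, lambda x, f=features, b=base: b([x[i] for i in f])))
    return ensemble


def top_k_by_accuracy(
    ensemble: List[Tuple[List[int], Classifier]], X_val: List[Vector], y_val: List[int], k: int = 3
) -> List[Tuple[List[int], Classifier]]:
    """Keep the k members with the highest validation accuracy (assumed ranking criterion)."""

    def accuracy(clf: Classifier) -> float:
        return sum(clf(x) == t for x, t in zip(X_val, y_val)) / len(y_val)

    return sorted(ensemble, key=lambda m: accuracy(m[1]), reverse=True)[:k]


if __name__ == "__main__":
    # Toy data only; a real run would rank members on held-out validation data.
    X = [[0, 0, 0, 0], [0, 1, 0, 1], [5, 5, 5, 5], [5, 4, 5, 4]]
    y = [0, 0, 1, 1]
    ensemble = train_rss_ensemble(X, y, n_classifiers=10, subspace_size=2)
    for features, _ in top_k_by_accuracy(ensemble, X, y, k=3):
        print("selected subspace:", features)
```

Because every member sees a different feature subset, the members can disagree on some unlabeled samples, and that disagreement is what a co-training step can use to decide which samples are safe to label.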

How to cite this article

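Xiao Jin, Xue Shutian, Huang Jing, Xie Ling, Gu Xin. A semi-supervised co-training model for customer credit scoring[J]. Chinese Journal of Management Science, 2016, 24(6): 124-131. DOI: 10.16381/j.cnki.issn1003-207x.2016.06.015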

Abstract

Customer credit scoring is one of the most important issues in customer relationship management (CRM). In many real credit scoring problems, labeling samples costs a great deal of manpower and financial and material resources, so only a few labeled samples can be used to train the classification models, while many unlabeled customer samples are abandoned. Furthermore, because current customer credit scoring problems are characterized by class imbalance, it is difficult for a single classification model to classify the whole sample space accurately. To solve these two problems, semi-supervised learning is introduced and combined with the random subspace (RSS) method from multiple classifier ensembles, and an RSS-based semi-supervised co-training model for class imbalance, RSSCI, is proposed. The model includes the following three phases: 1) obtain many base classifiers by RSS; 2) label some of the most appropriate samples in U, the set containing a large number of unlabeled samples. First, the three base classifiers with the best performance are selected to classify the samples in U; the samples for which all three predict the same class are put into the candidate set, and the label confidence of each candidate is calculated. Because the training data are class-imbalanced, the candidate set is divided into positive and negative subsets, and the samples with the highest confidence are selected from the two subsets according to the ratio of the two classes in the original training set and added to the original training set; 3) train the classification model on the final training set and classify the test set.
Empirical analysis is conducted on three credit scoring datasets (German, Australia, and UK-thomas, all of which are class-imbalanced; German and Australia come from the public UCI repository). The results show that RSSCI outperforms commonly used supervised ensemble credit scoring models as well as some existing semi-supervised co-training credit scoring models, demonstrating the advantage of RSSCI's selective mechanism for labeling samples. CRM involves many customer classification problems similar to credit scoring, such as customer churn prediction and customer targeting, so the proposed model can also be applied to these problems and is expected to achieve satisfactory classification performance.

Keywords: credit scoring; class imbalance; semi-supervised; co-training; RSS
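The selective labeling in phase 2 of the English abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the three best base classifiers are passed in as callables; the abstract does not say how label confidence is computed, so it is a caller-supplied function; the positive class is assumed to be encoded as `1`; and the positive quota is obtained by rounding `n_add` times the positive-class share of the original training set (the rounding rule is also an assumption). The name `select_for_labeling` is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Classifier = Callable[[Vector], int]


def select_for_labeling(
    top3: List[Classifier],
    U: List[Vector],
    y_train: List[int],
    n_add: int,
    confidence: Callable[[Vector, int], float],  # hypothetical: not specified in the abstract
    positive: int = 1,  # assumed encoding of the positive class
) -> Tuple[List[Tuple[Vector, int]], List[Vector]]:
    """Return (newly labeled samples, remaining unlabeled pool)."""
    # Candidate set: unlabeled samples on which all three classifiers agree.
    candidates = []
    for idx, u in enumerate(U):
        preds = {clf(u) for clf in top3}
        if len(preds) == 1:
            label = preds.pop()
            candidates.append((confidence(u, label), idx, label))

    # Split candidates by predicted class; sort each subset by descending confidence.
    pos = sorted((c for c in candidates if c[2] == positive), key=lambda c: -c[0])
    neg = sorted((c for c in candidates if c[2] != positive), key=lambda c: -c[0])

    # Quotas follow the class ratio of the original training set.
    n_pos = round(n_add * sum(t == positive for t in y_train) / len(y_train))
    chosen = pos[:n_pos] + neg[: n_add - n_pos]

    chosen_idx = {idx for _, idx, _ in chosen}
    newly_labeled = [(U[idx], label) for _, idx, label in chosen]
    remaining = [u for i, u in enumerate(U) if i not in chosen_idx]
    return newly_labeled, remaining
```

Phase 3 would then train the final model on the original training set plus the returned samples. If a subset holds fewer candidates than its quota, this sketch simply takes all of them rather than borrowing from the other class, so the imbalance ratio of the original training set is never overshot on the minority side.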

