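Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economic Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences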

Chinese Journal of Management Science, 2016, Vol. 24, Issue (6): 124-131. doi: 10.16381/j.cnki.issn1003-207x.2016.06.015

• Articles •

A Semi-Supervised Co-Training Model for Customer Credit Scoring

XIAO Jin1, XUE Shu-tian1, HUANG Jiing2, XIE Ling1, GU Xin1,3   

  1. Business School, Sichuan University, Chengdu 610064, China;
    2. School of Public Administration of Sichuan University, Chengdu 610064, China;
    3. Soft Science Institute of Sichuan University, Chengdu 610064, China
  • Received:2015-02-13 Revised:2015-05-12 Online:2016-06-20 Published:2016-07-05

Abstract: Customer credit scoring is one of the most important problems in customer relationship management (CRM). In many real credit scoring tasks, labeling samples costs considerable manpower, money, and material resources, so only a few samples with class labels are available to train classification models, and the many customer samples without class labels are discarded. Furthermore, because customer credit scoring data are typically class-imbalanced, a single classification model struggles to classify the whole sample space accurately. To address both problems, this study combines semi-supervised learning with the random subspace (RSS) method for multiple classifier ensembles and proposes an RSS-based semi-supervised co-training model for class imbalance, RSSCI. The model comprises three phases: 1) Train many base classifiers by RSS; 2) Selectively label the most suitable samples in U, the set that contains a large number of samples without class labels. First, the 3 base classifiers with the best performance are selected to classify the samples in U; the samples for which the predicted classes agree are put into a candidate set, and the label confidence of each candidate is calculated. To account for the class imbalance of the training data, the candidate set is then divided into positive and negative subsets, and the samples with the highest confidence are selected from the two subsets according to the ratio of the two classes in the original training set and added to the original training set; 3) Train the classification model on the final training set and classify the test set.
Empirical analysis is conducted on three credit scoring datasets (German, Australia, and UK-thomas, all of which have imbalanced class distributions; German and Australia come from the UCI public database). The results show that the RSSCI model outperforms commonly used supervised ensemble credit scoring models and some existing semi-supervised co-training credit scoring models, demonstrating the advantage of RSSCI's selective mechanism for labeling samples. CRM involves many other customer classification problems, such as customer churn prediction and customer targeting, that are similar to customer credit scoring. The proposed model can therefore also be applied to these problems and is expected to achieve satisfactory classification performance.
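
The selective labeling step in phase 2 can be illustrated with a minimal sketch. The abstract does not say how label confidence is computed, so the code below assumes it is the mean predicted probability of the agreed class across the three classifiers, with a 0.5 decision threshold. The function name `select_confident_samples` and the parameters `pos_ratio` and `n_add` are hypothetical and introduced only for illustration; this is not the authors' implementation.

```python
from typing import List, Tuple


def select_confident_samples(
    probas: List[List[float]],
    pos_ratio: float,
    n_add: int,
) -> List[Tuple[int, int]]:
    """Pick unlabeled samples to pseudo-label, keeping the class ratio.

    probas[k][i] is classifier k's predicted probability that unlabeled
    sample i is positive. pos_ratio is the share of positives in the
    original labeled training set. n_add is the total number of samples
    to add (hypothetical parameter). Returns (index, label) pairs.
    """
    n_samples = len(probas[0])
    positives = []  # (confidence, index) for candidates labeled 1
    negatives = []  # (confidence, index) for candidates labeled 0

    for i in range(n_samples):
        # Assumed decision rule: threshold each probability at 0.5.
        votes = [1 if p[i] >= 0.5 else 0 for p in probas]
        if len(set(votes)) != 1:
            continue  # classifiers disagree, so not a candidate
        label = votes[0]
        mean_pos = sum(p[i] for p in probas) / len(probas)
        # Assumed confidence: mean probability of the agreed class.
        confidence = mean_pos if label == 1 else 1.0 - mean_pos
        (positives if label == 1 else negatives).append((confidence, i))

    # Highest confidence first; ties broken by sample index.
    positives.sort(key=lambda t: (-t[0], t[1]))
    negatives.sort(key=lambda t: (-t[0], t[1]))

    # Split the quota according to the original class ratio.
    n_pos = round(n_add * pos_ratio)
    n_neg = n_add - n_pos
    chosen = [(i, 1) for _, i in positives[:n_pos]]
    chosen += [(i, 0) for _, i in negatives[:n_neg]]
    return chosen


# Toy probabilities from three classifiers for six unlabeled samples.
example_probas = [
    [0.90, 0.80, 0.20, 0.10, 0.60, 0.30],
    [0.80, 0.70, 0.10, 0.20, 0.40, 0.20],
    [0.95, 0.60, 0.30, 0.05, 0.70, 0.10],
]
selected = select_confident_samples(example_probas, pos_ratio=0.5, n_add=2)
```

In this toy input the classifiers disagree on sample 4, so it never enters the candidate set. The selected pairs would then be appended to the labeled training set before the final model is trained in phase 3.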

Key words: credit scoring, class imbalance, semi-supervised, co-training, RSS
