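Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economic Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences
Article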

A Cost-sensitive Semi-supervised Ensemble Model for Customer Targeting

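  • 1. Business School, Sichuan University, Chengdu 610064, China;
    2. Management Faculty, Chengdu University of Information Technology, Chengdu 610225, China;
    3. School of Public Administration, Sichuan University, Chengdu 610064, China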

Received: 2017-03-04

  Revised: 2017-12-20

  Published online: 2019-01-23

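Funding

Major Special Project of the National Social Science Fund of China (18VZL006); National Natural Science Foundation of China (71471124, 71273036); Sichuan University Distinguished Young Scholars Fund (sksyl201709); Sichuan University Young Academic Talent Fund for Philosophy and Social Sciences (skqx201607)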

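Abstract (Chinese version)

In real-world customer targeting modeling, usually only a small number of samples with class labels can be obtained, while class labels cannot be obtained for the large remainder. Most existing studies follow the supervised modeling paradigm and build models only on the small labeled set, which makes satisfactory results hard to achieve. To address this problem, this paper introduces semi-supervised learning (SSL), combines it with cost-sensitive learning (CSL) and with the random subspace (RSS) method from multiple classifier ensembles, and proposes a cost-sensitive semi-supervised ensemble model (CSSE) for customer targeting. The model uses a cost-sensitive support vector machine (SVM) to handle the imbalanced class distribution of customer targeting data, and it can use both labeled and unlabeled customer samples for modeling. Further, the model uses the RSS method to train a series of base classifiers and obtains the final classification result by combining them. An empirical analysis on the customer targeting dataset of an insurance company shows that the CSSE model achieves better customer targeting performance than two supervised ensemble models, two single semi-supervised models, and two semi-supervised ensemble models.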
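The abstract names the three ingredients of CSSE (a cost-sensitive SVM, the use of unlabeled samples, and a random subspace ensemble) but not exactly how they are wired together. The pure-Python sketch below shows one possible arrangement under assumptions made for illustration, not the authors' algorithm: each base learner is a linear SVM whose hinge loss is weighted by a class-specific cost and which is trained by Pegasos-style stochastic subgradient descent (the paper's SVM variant may differ); each base learner sees a random subset of `k` features; and unlabeled customers enter through a simple self-training loop in which the ensemble's most confident predictions become pseudo-labels. All function names, the 5:1 cost ratio and the toy data are hypothetical.

```python
import random

# Hypothetical sketch of a cost-sensitive semi-supervised RSS ensemble.


def train_cs_svm(X, y, cost_pos, cost_neg, lam=0.01, epochs=30, seed=0):
    """Linear SVM with class-weighted hinge loss, Pegasos-style SGD.

    y is in {-1, +1}; a missed responder (y=+1) costs cost_pos, a
    misclassified non-responder costs cost_neg. A constant 1.0 feature
    is appended so that w also carries the bias.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = t * sum(a * b for a, b in zip(w, x))
            w = [(1.0 - eta * lam) * a for a in w]    # step on the L2 regulariser
            if margin < 1.0:                           # hinge loss is active
                c = cost_pos if t == 1 else cost_neg   # class-specific cost
                w = [a + eta * c * t * b for a, b in zip(w, x)]
    return w


def decision(w, x):
    """Signed SVM score for one (subspace-projected) sample."""
    return sum(a * b for a, b in zip(w, list(x) + [1.0]))


def fit_rss_ensemble(X, y, n_models, k, cost_pos, cost_neg, seed=0):
    """Random subspace method: each base SVM is trained on k random features."""
    rng = random.Random(seed)
    models = []
    for m in range(n_models):
        feats = sorted(rng.sample(range(len(X[0])), k))
        Xs = [[x[j] for j in feats] for x in X]
        models.append((feats, train_cs_svm(Xs, y, cost_pos, cost_neg, seed=seed + m)))
    return models


def ensemble_score(models, x):
    """Average the base SVM scores, each computed on its own feature subset."""
    return sum(decision(w, [x[j] for j in feats]) for feats, w in models) / len(models)


def predict(models, x):
    return 1 if ensemble_score(models, x) >= 0 else -1


def csse_sketch(X_lab, y_lab, X_unl, n_models=5, k=2, cost_pos=5.0,
                cost_neg=1.0, rounds=3, per_round=2, seed=0):
    """Self-training loop (an assumption) around the RSS ensemble.

    Each round, the unlabeled samples with the largest |ensemble score|
    receive pseudo-labels, join the training pool, and the ensemble is refit.
    """
    X_pool, y_pool = [list(x) for x in X_lab], list(y_lab)
    remaining = [list(x) for x in X_unl]
    models = fit_rss_ensemble(X_pool, y_pool, n_models, k, cost_pos, cost_neg, seed)
    for r in range(rounds):
        if not remaining:
            break
        remaining.sort(key=lambda x: abs(ensemble_score(models, x)), reverse=True)
        chosen, remaining = remaining[:per_round], remaining[per_round:]
        for x in chosen:
            X_pool.append(x)
            y_pool.append(predict(models, x))
        models = fit_rss_ensemble(X_pool, y_pool, n_models, k,
                                  cost_pos, cost_neg, seed + r + 1)
    return models


# Toy data: 2 responders vs 6 non-responders (imbalanced) plus unlabeled customers.
X_lab = [[2.0, 1.5, 2.5], [1.5, 2.5, 2.0],
         [-2.0, -1.5, -2.5], [-1.5, -2.0, -2.0], [-2.5, -2.0, -1.5],
         [-2.0, -2.5, -2.0], [-1.5, -1.5, -2.5], [-2.5, -1.5, -1.5]]
y_lab = [1, 1, -1, -1, -1, -1, -1, -1]
X_unl = [[1.8, 2.2, 1.6], [-1.7, -2.2, -1.9], [2.4, 1.7, 1.9], [-2.1, -1.6, -2.3]]

models = csse_sketch(X_lab, y_lab, X_unl)
print(predict(models, [2.0, 2.0, 2.0]), predict(models, [-2.0, -2.0, -2.0]))  # expected: 1 -1
```

Raising `cost_pos` relative to `cost_neg` makes a missed responder costlier than a wasted contact, which shifts the decision boundary toward the non-responder class so that more customers are flagged; in practice the ratio would be tuned, for example to the inverse class frequencies.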

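Cite this article

Xiao Jin, Liu Xiaoxiao, Xie Ling, Liu Dunhu, Huang Jing. A cost-sensitive semi-supervised ensemble model for customer targeting[J]. Chinese Journal of Management Science, 2018, 26(11): 186-196. DOI: 10.16381/j.cnki.issn1003-207x.2018.11.019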

Abstract

With the advent of the big data era, enterprises hold more and more customer data, and their marketing concept has shifted from "product-centric" to "customer-centric". Enterprises therefore pay more attention to customer relationship management (CRM) than before. To avoid the disadvantages of conventional marketing, such as low efficiency and high cost, many enterprises have started to use database marketing to improve the effectiveness and pertinence of their marketing activities. As one of the most important issues in database marketing, customer targeting modeling identifies, among potential customers, the target customers who are most likely to respond to a marketing campaign, thus helping the enterprise work out its marketing strategies. It uses various kinds of customer information, including identity information, consumer preferences and historical purchase records, to build the customer targeting model, which then predicts which customers are more likely to respond. In essence, customer targeting modeling is a classification problem. In real customer targeting modeling, a small number of labeled samples and a large number of unlabeled ones are typically available. Most existing studies have followed the supervised learning paradigm, building models with the labeled samples only, which makes satisfactory results hard to achieve. To solve this problem, semi-supervised learning (SSL) is introduced and combined with cost-sensitive learning (CSL) and with random subspace (RSS), one of the multiple classifier ensemble methods, yielding the proposed cost-sensitive semi-supervised ensemble model (CSSE). The model uses a cost-sensitive SVM to handle the imbalanced class distribution in customer targeting modeling, and it can build a model with both labeled and unlabeled samples.
Further, RSS is adopted to train a series of base classifiers, and the final classification result is obtained by combining them. The experiment is carried out on the customer targeting database of a car insurance company from the CoIL 2000 prediction competition, and the results show that the CSSE model achieves better customer targeting performance than two supervised ensemble models, two single semi-supervised models, and two semi-supervised ensemble models. Besides the frequently used AUC value, the hit rate, Lorenz curve and lift chart are also used to evaluate customer targeting performance more intuitively. This points to a direction for further research: more targeted and more reasonable evaluation indicators should be used to improve the practicality of models in this field. In CRM there are many other classification problems similar to customer targeting, such as customer churn prediction and customer credit scoring, so the proposed model can also be applied to these fields, where it may likewise achieve satisfactory classification performance.
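The abstract evaluates models with hit rate, Lorenz curve and lift chart alongside AUC. The sketch below computes them under one common set of definitions from the direct-marketing literature, which may differ in detail from the paper's: customers are ranked by model score, the hit rate is the share of responders among the top fraction contacted, lift is that hit rate divided by the overall response rate, and the Lorenz curve is read here as the cumulative share of all responders captured against the share of customers contacted. Labels are 1 for responders and 0 otherwise; the function names and toy scores are illustrative.

```python
def rank_labels(scores, labels):
    """Return labels ordered from the highest to the lowest model score."""
    return [t for _, t in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]


def hit_rate(scores, labels, top_frac):
    """Share of actual responders among the top `top_frac` of customers."""
    ranked = rank_labels(scores, labels)
    n_top = max(1, int(round(top_frac * len(ranked))))
    return sum(ranked[:n_top]) / n_top


def lift(scores, labels, top_frac):
    """Hit rate in the contacted group divided by the overall response rate."""
    return hit_rate(scores, labels, top_frac) / (sum(labels) / len(labels))


def lorenz_curve(scores, labels):
    """Points (fraction of customers contacted, fraction of all responders captured)."""
    ranked = rank_labels(scores, labels)
    total, captured = sum(ranked), 0
    points = [(0.0, 0.0)]
    for i, t in enumerate(ranked, start=1):
        captured += t
        points.append((i / len(ranked), captured / total))
    return points


def auc(scores, labels):
    """P(random responder outscores random non-responder); ties count 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy scores for 10 customers, 3 of whom responded.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(hit_rate(scores, labels, 0.2))  # 1.0
print(lift(scores, labels, 0.4))      # about 2.5
print(auc(scores, labels))            # 20/21, about 0.952
```

Plotting the `lorenz_curve` points gives the Lorenz curve, and evaluating `lift` over a range of `top_frac` values gives the data for a lift chart.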
