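Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economic Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences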

Chinese Journal of Management Science ›› 2018, Vol. 26 ›› Issue (11): 186-196. doi: 10.16381/j.cnki.issn1003-207x.2018.11.019

• Articles •

Research on a Cost-sensitive Semi-supervised Ensemble Model for Customer Targeting

XIAO Jin1, LIU Xiao-xiao1, XIE Ling1, LIU Dun-hu2, HUANG Jing3

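  1. Business School, Sichuan University, Chengdu, Sichuan 610064, China;
    2. Management Faculty, Chengdu University of Information Technology, Chengdu, Sichuan 610225, China;
    3. School of Public Administration, Sichuan University, Chengdu, Sichuan 610064, China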
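  • Received: 2017-03-04 Revised: 2017-12-20 Online: 2018-11-20 Published: 2019-01-23
  • Corresponding author: HUANG Jing (born 1978), female (Han ethnicity), native of Dazhu, Sichuan; associate professor and postdoctoral fellow, School of Public Administration, Sichuan University; research interest: quantitative research in public administration. E-mail: totojh@scu.edu.cn
  • Funding:

    Major Special Project of the National Social Science Fund of China (18VZL006); National Natural Science Foundation of China (71471124, 71273036); Sichuan University Outstanding Young Scholars Fund (sksyl201709); Sichuan University Young Academic Talent Fund in Philosophy and Social Sciences (skqx201607)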

A Cost-sensitive Semi-supervised Ensemble Model for Customer Targeting

XIAO Jin1, LIU Xiao-xiao1, XIE Ling1, LIU Dun-hu2, HUANG Jing3   

  1. Business School, Sichuan University, Chengdu 610064, China;
    2. Management Faculty, Chengdu University of Information Technology, Chengdu 610225, China;
    3. School of Public Administration, Sichuan University, Chengdu 610064, China
  • Received:2017-03-04 Revised:2017-12-20 Online:2018-11-20 Published:2019-01-23

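Abstract: In real-world customer targeting modeling, class labels are often available for only a small number of samples, while the large majority of samples remain unlabeled. Most existing studies adopt a supervised modeling paradigm and build models only on the small labeled sample set, which rarely yields satisfactory results. To address this problem, this paper introduces semi-supervised learning (SSL) and combines it with cost-sensitive learning (CSL) and the random subspace (RSS) method from multiple classifier ensembles, proposing a cost-sensitive semi-supervised ensemble model (CSSE) for customer targeting. The model uses a cost-sensitive support vector machine (SVM) to handle the imbalanced class distribution of sample data in customer targeting modeling, and it can build the model from both labeled and unlabeled customer samples. Furthermore, the model uses the RSS method to train a series of base classifiers and obtains the final classification result through ensembling. An empirical analysis on a customer targeting dataset from an insurance company shows that the CSSE model achieves better customer targeting performance than two supervised ensemble models, two single semi-supervised models, and two semi-supervised ensemble models.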
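The abstract does not state the exact objective of the cost-sensitive SVM. As an illustration only, one widely used class-weighted soft-margin formulation is shown below; the symbols $C_{+}$, $C_{-}$, $\xi_i$ and $\phi$ are assumed notation, not taken from the paper. Choosing $C_{+} > C_{-}$ penalizes errors on the rare responder class (labeled $+1$) more heavily than errors on non-responders.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad
  & \tfrac{1}{2}\lVert w\rVert^{2}
    + C_{+}\sum_{i:\,y_i=+1}\xi_i
    + C_{-}\sum_{i:\,y_i=-1}\xi_i \\
\text{s.t.}\quad
  & y_i\bigl(w^{\top}\phi(x_i)+b\bigr)\ \ge\ 1-\xi_i,
    \qquad \xi_i \ge 0,
    \qquad i=1,\dots,n .
\end{aligned}
```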

Key words: customer targeting, cost-sensitive, semi-supervised learning, RSS ensemble method, semi-supervised ensemble

Abstract: With the advent of the big data era, enterprises hold more and more customer data, and their marketing concept has shifted from "product-centric" to "customer-centric". Enterprises pay more attention to customer relationship management (CRM) than before. To avoid the disadvantages of conventional marketing, such as low efficiency and high cost, many enterprises have started to use database marketing to improve the effectiveness and pertinence of their marketing activities. As one of the most important issues in database marketing, customer targeting modeling identifies, among potential customers, the target customers who are most likely to respond to marketing efforts, thus helping the enterprise work out its marketing strategies. It uses various kinds of customer information, including identity information, consumer preferences and historical purchase records, to build the customer targeting model, and then predicts which customers are more likely to respond. Customer targeting modeling is therefore a classification problem. In real customer targeting modeling, typically only a small number of labeled samples are available, alongside a large number of unlabeled ones. Most existing studies use the supervised learning paradigm, which builds the model on the labeled samples alone, and it is difficult to achieve satisfactory results this way. To solve this problem, semi-supervised learning (SSL) is introduced and combined with cost-sensitive learning (CSL) and random subspace (RSS), one of the multiple classifier ensemble methods, and a cost-sensitive semi-supervised ensemble model (CSSE) is proposed. The model uses a cost-sensitive SVM to handle the imbalanced class distribution in customer targeting modeling, and it can build a model with both labeled and unlabeled samples. Further, RSS is adopted to train a series of base classifiers, and the final classification results are obtained by integrating them. The experiment is carried out on a customer targeting database of a car insurance company from the CoIL2000 prediction competition, and the results show that the CSSE model has better customer targeting performance than two supervised ensemble models, two single semi-supervised models, and two semi-supervised ensemble models. Apart from the frequently used AUC value, hit rate, the Lorenz curve and the lift chart are also used to evaluate customer targeting performance more intuitively. This suggests a direction for further research: more targeted and more reasonable evaluation indicators should be used to improve the practicality of models in this research field. In CRM, many other classification problems are similar to customer targeting modeling, such as customer churn prediction and customer credit scoring. Therefore, the proposed model can also be applied to these fields and can achieve satisfactory classification performance.
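The evaluation measures named in the abstract can be illustrated with a small, dependency-free sketch. The definitions here are assumptions, since the abstract does not define them: hit rate is taken as the share of responders among the top-ranked customers, the captured share of all responders is one point on a Lorenz-style cumulative curve, and lift is the hit rate divided by the overall response rate. The function names and toy data are hypothetical.

```python
from typing import Sequence, Tuple


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random responder outranks a random non-responder."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one responder and one non-responder")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


def top_depth_metrics(
    y_true: Sequence[int], scores: Sequence[float], depth: float
) -> Tuple[float, float, float]:
    """Return (hit_rate, captured_share, lift) for the top `depth` fraction by score.

    Assumed definitions (not taken from the paper):
      hit_rate       = responders in the selection / size of the selection
      captured_share = responders in the selection / all responders
      lift           = hit_rate / overall response rate
    """
    if not 0.0 < depth <= 1.0:
        raise ValueError("depth must be in (0, 1]")
    n = len(y_true)
    total = sum(y_true)
    if total == 0:
        raise ValueError("need at least one responder")
    k = max(1, int(round(n * depth)))  # number of customers to target
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits = sum(y_true[i] for i in order[:k])
    hit_rate = hits / k
    captured_share = hits / total
    lift = hit_rate / (total / n)
    return hit_rate, captured_share, lift


# Toy data: 1 = responded, 0 = did not respond; scores come from any classifier.
y = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]
s = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print("AUC:", auc(y, s))
print("top 20% (hit rate, captured share, lift):", top_depth_metrics(y, s, 0.2))
```

Evaluating `top_depth_metrics` over a grid of depths gives the points needed to draw a lift chart or a cumulative (Lorenz-style) curve.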

Key words: customer targeting, cost-sensitive, semi-supervised learning, RSS ensemble method, semi-supervised ensemble

CLC number: