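Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economical Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences
Article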

Research on a Customer Targeting Model Based on Improved GMDH

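  • 1. Business School, Sichuan University, Chengdu 610064, Sichuan, China;
    2. School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China;
    3. Research Center on Fictitious Economy & Data Science, Chinese Academy of Sciences, Beijing 100190, China;
    4. Management Faculty, Chengdu University of Information Technology, Chengdu 610225, Sichuan, China;
    5. Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China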
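Xiao Jin (1983-), male (Han ethnicity), born in Guang'an, Sichuan; associate professor at the Business School, Sichuan University; PhD in management; postdoctoral researcher at the Academy of Mathematics and Systems Science, Chinese Academy of Sciences. Research interests: big data analytics, business intelligence, and customer relationship management.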

Received: 2014-10-30

  Revised: 2015-01-08

  Published online: 2015-10-24

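Funding

Supported by the National Natural Science Foundation of China (71471124, 71101100, 71273036); the Sichuan Province Social Science Planning Project (SC14C019); the Sichuan University Excellent Young Scholars Fund (2013SCU04A08); the Sichuan Province Youth Fund Project (2015RZ0056); and the Basic Research Project of the Science and Technology Department of Sichuan Province (2015JY0022)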

Customer Targeting Model Based on Improved GMDH

  • 1. Business School, Sichuan University, Chengdu 610064, China;
    2. School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China;
    3. Research Center on Fictitious Economy & Data Science, Chinese Academy of Sciences, Beijing 100190, China;
    4. Management Faculty, Chengdu University of Information Technology, Chengdu 610225, China;
    5. Academy of Mathematics and System Sciences, Chinese Academy of Sciences, Beijing 100190, China

Received date: 2014-10-30

  Revised date: 2015-01-08

  Online published: 2015-10-24

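Summary

In recent years, customer targeting modeling has become a research hotspot in customer relationship management. To address the highly imbalanced class distribution of the training samples used for customer targeting modeling, this paper first proposes a hybrid sampling method. It then introduces the group method of data handling (GMDH) neural network into customer feature selection and proposes a new feature selection algorithm, Log-GMDH, which improves the traditional GMDH network model in two respects: the choice of transfer function and the construction of a new external criterion. Finally, the proposed hybrid sampling, Log-GMDH, and the Logistic regression classification algorithm are combined to build the customer targeting model LogGMDH-Logistic. An empirical analysis on the customer targeting dataset of a car insurance company from the CoIL2000 prediction competition shows that the LogGMDH-Logistic model not only outperforms some existing customer targeting models but is also highly interpretable.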

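Cite this article

Xiao Jin, Tang Jing, Liu Dunhu, Xie Ling, Wang Shouyang. Customer Targeting Model Based on Improved GMDH[J]. Chinese Journal of Management Science, 2015, 23(10): 162-169. DOI: 10.16381/j.cnki.issn1003-207x.2015.10.019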

Abstract

In recent years, database marketing has become a hot topic in customer relationship management (CRM), and customer targeting modeling is one of its most important problems. Customer targeting modeling is essentially a binary classification problem: customers are divided into those who respond to the company's marketing activities and those who do not. This study combines group method of data handling (GMDH) neural networks, a resampling technique, and the Logistic regression classification algorithm to construct the customer targeting model LogGMDH-Logistic. The model consists of three phases: (1) To address the highly imbalanced class distribution of the training set used for customer targeting modeling, a new resampling method (hybrid sampling) is proposed to balance that distribution; (2) To select key features from the large number of characteristics describing the customers, the GMDH neural network is introduced and a new feature selection algorithm, Log-GMDH, is presented, which improves the traditional GMDH neural network model in both the choice of transfer function and the construction of a new external criterion. For the transfer function, it replaces the linear transfer function of the traditional GMDH neural network with the non-linear Logistic regression function; for the external criterion, it replaces the regularization criterion of the traditional GMDH neural network with the hit rate, which suits customer targeting modeling; (3) The training set is mapped onto the selected feature subset, the Logistic regression classification algorithm is trained on it, and the response probability of potential customers is predicted. 
The experiment is carried out on a customer targeting dataset of a car insurance company from the CoIL2000 prediction competition, and the results show that the LogGMDH-Logistic model is superior to some existing customer targeting models in both performance and interpretability. CRM involves many customer classification problems, such as customer churn prediction and customer credit scoring, that are similar to customer targeting modeling. The model proposed in this study can therefore also be applied to these problems, and is expected to achieve satisfactory classification performance.
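
Two ingredients named in the abstract, hybrid sampling and the hit rate, can be illustrated with a minimal Python sketch. The abstract gives neither procedure in detail, so everything below is an assumption made for illustration: `hybrid_sample` and `hit_rate` are hypothetical names, the balancing rule (undersample the majority class and oversample the minority class until both reach the midpoint size) is one common way to combine the two resampling directions, and the hit rate is taken to be the share of actual responders among the top-scored fraction of customers.

```python
import random


def hybrid_sample(rows, labels, seed=0):
    """Balance a binary training set (labels are 0/1).

    Assumed rule, not taken from the paper: undersample the majority
    class without replacement and oversample the minority class with
    replacement until both classes reach the midpoint size.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    target = (len(minority) + len(majority)) // 2
    kept_majority = rng.sample(majority, target)
    extra = rng.choices(minority, k=target - len(minority))
    idx = kept_majority + minority + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


def hit_rate(scores, labels, top_fraction=0.2):
    """Share of true responders among the top-scored fraction of customers."""
    k = max(1, int(len(scores) * top_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k


# Toy data: 5 responders among 100 customers.
labels = [1] * 5 + [0] * 95
rows = list(range(100))
balanced_rows, balanced_labels = hybrid_sample(rows, labels)
```

In a Log-GMDH-style search, a score such as `hit_rate` would be evaluated on held-out data to rank candidate feature combinations; that use is described in the abstract only at a high level, so the sketch stops at the two helper functions.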
