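Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economic Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences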

Chinese Journal of Management Science ›› 2015, Vol. 23 ›› Issue (2): 154-161. doi: 10.16381/j.cnki.issn1003-207x.2015.02.019


Research on Deep Knowledge Discovery from Association Rules Based on Domain Knowledge and Clustering

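ZHANG Ling-ling1,4,5, ZHOU Quan-liang2, TANG Guang-wen2, LI Xing-sen3, SHI Yong1,4,5   

  1. School of Management, University of Chinese Academy of Sciences, Beijing 100190;
    2. Ying Da Tai He Property Insurance Co., Ltd., Beijing 100005;
    3. Ningbo Institute of Technology, Zhejiang University, Ningbo, Zhejiang 315100;
    4. Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing 100190;
    5. Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100190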
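  • Received: 2013-07-27 Revised: 2014-11-13 Online: 2015-02-20 Published: 2015-02-28
  • About the author: ZHANG Ling-ling (born 1974), female (Han ethnicity), from Henan, is an associate professor at the School of Management, University of Chinese Academy of Sciences. Research interests: management information systems, knowledge management, project management.
  • Funding:

    Supported by the National Natural Science Foundation of China (71471169, 71071151)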

Research on Algorithm of Post-processing Association Rules Based on Clustering and Domain Knowledge

ZHANG Ling-ling1,4,5, ZHOU Quan-liang2, TANG Guang-wen2, LI Xing-sen3, SHI Yong1,4,5   

  1. School of Management, University of Chinese Academy of Sciences, Beijing 100190, China;
    2. Ying Da Tai He Property Insurance Co., Ltd., Beijing 100005, China;
    3. Ningbo Institute of Technology, Zhejiang University, Ningbo 315100, China;
    4. Research Center on Fictitious Economy and Data Science, CAS, Beijing 100190, China;
    5. Key Laboratory of Big Data Mining and Knowledge Management, CAS, Beijing 100190, China
  • Received: 2013-07-27 Revised: 2014-11-13 Online: 2015-02-20 Published: 2015-02-28

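Abstract: Traditional association rule mining algorithms produce large numbers of redundant rules. To address this, the paper proposes a second round of mining over the mined association rules. An algorithm is designed to cluster the mined rules, and the novelty of the clustered rules is then evaluated against existing domain knowledge. Rules with high novelty and high value can be stored in the domain knowledge base for use in decision-making or in further rounds of mining. The algorithm reduces the number of rules and improves their novelty and precision, which makes it valuable for business applications. Finally, an experimental analysis on open data from UCI verifies the effectiveness of the algorithm.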

Keywords: association rules, clustering, domain knowledge, deep knowledge discovery

Abstract: To address the large number of redundant rules produced by traditional association rule mining algorithms, this paper proposes a second round of mining over the mined rules. An algorithm is designed to cluster the association rules, and the novelty of the clustered rules is then assessed against existing domain knowledge. Association rules with higher novelty and higher value can be stored in the domain knowledge base and used for decision-making or for further mining. The proposed algorithm reduces the number of rules and improves their novelty and precision, which gives it high value for business applications. Finally, experiments on open data from UCI verify the effectiveness of the algorithm.

Key words: association rule, clustering, domain knowledge, post-processing
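
The pipeline outlined in the abstract (cluster the mined rules, then score them for novelty against domain knowledge) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the abstract does not give the similarity measure, the clustering procedure, or the novelty formula. The sketch assumes Jaccard similarity over the items of a rule, a greedy single-pass clustering, and novelty defined as one minus the highest similarity to any known rule. All names (`Rule`, `cluster_rules`, `novelty`, `novel_representatives`) and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """An association rule: antecedent -> consequent (hypothetical structure)."""
    antecedent: frozenset
    consequent: frozenset

    @property
    def items(self) -> frozenset:
        return self.antecedent | self.consequent


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two item sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_rules(rules, threshold=0.5):
    """Greedy single-pass clustering (an assumption, not the paper's method).

    A rule joins the first cluster whose representative (its first rule)
    is at least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []
    for rule in rules:
        for cluster in clusters:
            if jaccard(rule.items, cluster[0].items) >= threshold:
                cluster.append(rule)
                break
        else:
            clusters.append([rule])
    return clusters


def novelty(rule, known_rules):
    """Assumed novelty: 1 minus the highest similarity to any known rule."""
    if not known_rules:
        return 1.0
    return 1.0 - max(jaccard(rule.items, k.items) for k in known_rules)


def novel_representatives(rules, known_rules, threshold=0.5, min_novelty=0.5):
    """Cluster the rules, then keep each cluster representative that is novel enough."""
    representatives = [c[0] for c in cluster_rules(rules, threshold)]
    return [r for r in representatives if novelty(r, known_rules) >= min_novelty]


# Toy data, invented for illustration only.
rules = [
    Rule(frozenset({"bread", "butter"}), frozenset({"milk"})),
    Rule(frozenset({"bread"}), frozenset({"milk", "butter"})),
    Rule(frozenset({"beer"}), frozenset({"diapers"})),
]
known = [Rule(frozenset({"bread"}), frozenset({"milk"}))]

clusters = cluster_rules(rules)
kept = novel_representatives(rules, known)
```

In this toy run the first two rules share the same items and fall into one cluster, and their representative is filtered out because it closely overlaps the known rule, so only the beer/diapers rule is kept. Rules kept this way could then be appended to the knowledge base, mirroring the feedback loop the abstract describes.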

CLC number: