Research on Deep Knowledge Discovery from Association Rules Based on Domain Knowledge and Clustering

Zhang Lingling, Zhou Quanliang, Tang Guangwen, Li Xingsen, Shi Yong

doi:10.16381/j.cnki.issn1003-207x.2015.02.019

Chinese Journal of Management Science, 2015, Vol. 23, Issue 2: 154-161



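1. School of Management, University of Chinese Academy of Sciences, Beijing 100190, China;
2. Ying Da Tai He Property Insurance Co., Ltd., Beijing 100005, China;
3. Ningbo Institute of Technology, Zhejiang University, Ningbo, Zhejiang 315100, China;
4. Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing 100190, China;
5. Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100190, China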

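Zhang Lingling (b. 1974), female (Han ethnicity), from Henan; associate professor at the School of Management, University of Chinese Academy of Sciences. Research interests: management information systems, knowledge management, and project management.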

Received: 2013-07-27

Revised: 2014-11-13

Published online: 2015-02-28

Funding

Supported by the National Natural Science Foundation of China (71471169, 71071151)


Research on Algorithm of Post-processing Association Rules Based on Clustering and Domain Knowledge


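Abstract

Traditional association rule mining algorithms produce large numbers of redundant rules. To address this, the paper proposes a second round of mining over the association rule results: an algorithm is designed to cluster the mined rules, and the clustered rules are then evaluated for novelty against existing domain knowledge. Rules with high novelty and high value can be stored in the domain knowledge base for use in decision-making or in later rounds of mining. The algorithm effectively reduces the number of rules and improves their novelty and precision, which makes it valuable for business applications. Finally, experiments on open data from UCI verify the effectiveness of the algorithm.

Keywords: association rules; clustering; domain knowledge; deep knowledge discovery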
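The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which distance measure or clustering procedure the authors use, so everything here is an assumption: the Jaccard distance over the items of each rule, the single-pass "leader" clustering, the `max_distance` threshold, and the toy market-basket rules are all hypothetical choices made for illustration only.

```python
from typing import FrozenSet, List, Tuple

# A rule is (antecedent, consequent); this representation is an assumption.
Rule = Tuple[FrozenSet[str], FrozenSet[str]]


def make_rule(antecedent, consequent) -> Rule:
    """Build a rule from two iterables of item names."""
    return frozenset(antecedent), frozenset(consequent)


def jaccard_distance(a: Rule, b: Rule) -> float:
    """1 - |A ∩ B| / |A ∪ B| over all items that appear in each rule."""
    items_a = a[0] | a[1]
    items_b = b[0] | b[1]
    union = items_a | items_b
    if not union:
        return 0.0
    return 1.0 - len(items_a & items_b) / len(union)


def cluster_rules(rules: List[Rule], max_distance: float = 0.5) -> List[List[Rule]]:
    """Single-pass leader clustering (an assumed stand-in, not the paper's algorithm).

    Each rule joins the first cluster whose leader (its first member) lies
    within max_distance; otherwise it starts a new cluster.
    """
    clusters: List[List[Rule]] = []
    for rule in rules:
        for cluster in clusters:
            if jaccard_distance(rule, cluster[0]) <= max_distance:
                cluster.append(rule)
                break
        else:
            clusters.append([rule])
    return clusters


# Hypothetical mined rules, not taken from the paper's experiments.
rules = [
    make_rule({"bread"}, {"butter"}),
    make_rule({"bread", "milk"}, {"butter"}),
    make_rule({"beer"}, {"diapers"}),
]
clusters = cluster_rules(rules)
for index, cluster in enumerate(clusters):
    print(index, [(sorted(a), sorted(c)) for a, c in cluster])
```

With these toy rules, the two bread-and-butter rules fall into one cluster and the beer rule forms its own, so a reviewer would inspect two groups instead of three separate rules. Leader clustering is order-dependent; it is used here only because it keeps the sketch short.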

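Cite this article

Zhang Lingling, Zhou Quanliang, Tang Guangwen, Li Xingsen, Shi Yong. Research on Deep Knowledge Discovery from Association Rules Based on Domain Knowledge and Clustering[J]. Chinese Journal of Management Science, 2015, 23(2): 154-161. DOI: 10.16381/j.cnki.issn1003-207x.2015.02.019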

Abstract

To address the large number of redundant rules produced by traditional association rule mining algorithms, this paper proposes a second round of mining over the mined rules. An algorithm is designed to cluster the association rules, and the novelty of the clustered rules is then assessed against existing domain knowledge. Rules that are more novel and more valuable can be stored in the domain knowledge base and used for decision-making or for further mining. The proposed algorithm reduces the number of rules and helps improve their novelty and precision, which is of high value for business applications. Finally, open data from UCI are used in experiments that verify the effectiveness of the algorithm.

Key words: association rule; clustering; domain knowledge; post-processing
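The novelty assessment against domain knowledge can be sketched in a similar spirit. The abstract does not define the novelty measure, so the one below is an assumption: novelty is taken as one minus the highest similarity between a candidate rule and any rule already in the knowledge base, with similarity being the mean Jaccard similarity of the antecedents and of the consequents. The names `novelty`, `absorb_novel_rules`, and `min_novelty`, and the example rules, are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# A rule is (antecedent, consequent); this representation is an assumption.
Rule = Tuple[FrozenSet[str], FrozenSet[str]]


def _jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two item sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity(rule: Rule, known: Rule) -> float:
    """Mean of antecedent similarity and consequent similarity (assumed measure)."""
    return 0.5 * (_jaccard(rule[0], known[0]) + _jaccard(rule[1], known[1]))


def novelty(rule: Rule, knowledge_base: List[Rule]) -> float:
    """1 minus the similarity to the closest known rule; 1.0 if nothing is known."""
    if not knowledge_base:
        return 1.0
    return 1.0 - max(similarity(rule, known) for known in knowledge_base)


def absorb_novel_rules(
    candidates: List[Rule], knowledge_base: List[Rule], min_novelty: float = 0.6
) -> List[Rule]:
    """Append sufficiently novel candidates to knowledge_base and return them.

    Rules accepted earlier in the loop count as known for later candidates.
    """
    accepted: List[Rule] = []
    for rule in candidates:
        if novelty(rule, knowledge_base) >= min_novelty:
            knowledge_base.append(rule)
            accepted.append(rule)
    return accepted


# Hypothetical domain knowledge and candidate rules, for illustration only.
knowledge_base: List[Rule] = [(frozenset({"bread"}), frozenset({"butter"}))]
candidates: List[Rule] = [
    (frozenset({"bread", "milk"}), frozenset({"butter"})),  # close to known rule
    (frozenset({"beer"}), frozenset({"diapers"})),          # unrelated to it
]
accepted = absorb_novel_rules(candidates, knowledge_base)
print(len(accepted), len(knowledge_base))
```

In this toy run the first candidate scores a novelty of 0.25 and is rejected, while the second scores 1.0 and is added to the knowledge base, which mirrors the feedback loop the abstract describes: novel rules become domain knowledge for later rounds of mining.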

