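Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economic Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences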

Chinese Journal of Management Science ›› 2004, Vol. ›› Issue (4): 144-148.

• Articles •

An Improved Algorithm of Decision Trees by Using the Convex Function

GAO Xue-dong, YIN A-dong, ZHANG Jian, GONG Yu, WU Sen

  1. School of Management, Beijing University of Science and Technology, Beijing 100083, China
  • Received: 2004-03-24 Published: 2004-08-28 Online: 2012-03-07

Abstract: This paper studies the computational efficiency of decision tree classification. Based on the characteristics of the information gain computation, we introduce the concept of an upward-convex (concave) function to speed up the computation of information gain during decision tree induction. Using the "consistency theorem" and "special consistency theorem" that we propose, we prove that the decision tree built with the modified information gain computation has the same classification accuracy as the tree built by the original ID3 algorithm. Experiments on large datasets show that, for datasets of the same size, the improved algorithm is more computationally efficient than the original one, and that this efficiency gain tends to grow as the dataset size increases.

Key words: decision trees, ID3 algorithm, upward-convex (concave) function, information entropy
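
For orientation, the baseline computation that the paper sets out to speed up can be sketched as follows. This is only the standard ID3 information gain, written under our own assumptions: the abstract does not state the form of the upward-convex function or the two theorems, so none of that is reproduced here. The function names (`entropy`, `information_gain`, `best_attribute`) and the toy data are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Each term needs a logarithm; this is the repeated cost in ID3.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the expected entropy after the split."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    expected = sum(
        len(part) / total * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - expected


def best_attribute(rows, labels, attributes):
    """ID3 split rule: pick the attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Toy data (illustrative only, not from the paper).
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

chosen = best_attribute(rows, labels, ["outlook", "windy"])
```

In this toy set, `outlook` separates the two classes completely while `windy` carries no information, so the split rule selects `outlook`. Only the ranking of attributes matters for the choice of split, which is presumably why a cheaper function that preserves that ranking could leave the resulting tree unchanged, as the abstract claims.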
