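Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economic Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences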

Chinese Journal of Management Science, 2013, Vol. 21, Issue (6): 38-46.

• Articles

Research on Feature Selection Methods of Data Classification

ZHAO Yu, HUANG Si-ming, CHEN Rui   

  1. Institute of Policy & Management, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2011-03-30  Revised: 2013-04-30  Online: 2013-12-29  Published: 2013-12-23

Abstract: Using a support vector machine (SVM) learner and semidefinite programming, a new ensemble method that embeds feature selection within the classification optimization is proposed, so that globally optimal feature selection and data classification are achieved simultaneously. First, the features are divided into several groups and a kernel matrix is computed for each feature subspace. Then a linear combination of these sub-feature kernel matrices is constructed as the kernel of the semidefinite SVM, and all the linear weight coefficients are obtained by solving the resulting global model. The classification rate is governed by the contribution and support that the weight coefficients assign to each feature group, and the coefficients can be chosen to favor one of three objectives: maximum classification rate, minimum number of features, or generalization ability. Finally, the classification rate and the number of selected features are reported under each of these three objectives. For verification, medical, botanical, text recognition, artificial, and credit datasets are used to compare the ensemble method with SFS, Relief-F, and SBS. The results indicate that, on some datasets, the ensemble method not only achieves better learning efficiency but also reduces the number of features more sharply than SFS, Relief-F, and SBS.
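
The grouping and kernel-combination steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the RBF kernel, the `gamma` value, the feature grouping, and the function names are all assumptions made for illustration. In particular, the weights below are hand-picked placeholders, whereas in the paper they are obtained by solving the semidefinite program, which is not reproduced here.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """RBF kernel matrix for the rows of X (kernel choice is an assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def group_kernels(X, groups, gamma=1.0):
    """One kernel matrix per feature group (sub-feature space)."""
    return [rbf_kernel(X[:, idx], gamma) for idx in groups]

def combine_kernels(kernels, weights):
    """Weighted linear combination of the sub-feature kernel matrices."""
    return sum(w * K for w, K in zip(weights, kernels))

# Toy data: 6 samples, 4 features, split into 3 hypothetical groups.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
groups = [[0, 1], [2], [3]]

# Placeholder weights; the paper derives them from the SDP solution.
weights = [0.7, 0.3, 0.0]

kernels = group_kernels(X, groups)
K = combine_kernels(kernels, weights)

# A group whose weight is zero contributes nothing to K, so its
# features are effectively dropped from the classifier.
selected = [g for g, w in zip(groups, weights) if w > 0]
```

With non-negative weights the combined matrix stays positive semidefinite, so it could be passed to any SVM solver that accepts a precomputed kernel. Reading the selected features off the nonzero weights is one plausible interpretation of how the weight coefficients drive feature selection.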

Key words: data mining, feature selection, data classification, kernel matrix, semidefinite programming
