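Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economic Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences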

Chinese Journal of Management Science, 2022, Vol. 30, Issue (12): 131-140. doi: 10.16381/j.cnki.issn1003-207x.2021.2694

• Articles •

Multi-objective Multi-learning Bacterial Foraging Optimization Algorithm for Mixed Data Clustering

NIU Ben1, 2, GUO Chen3, TANG Heng3

  1. College of Management, Shenzhen University, Shenzhen 518060, China; 2. Institute of Big Data Intelligent Management and Decision, Shenzhen University, Shenzhen 518060, China; 3. Faculty of Business Administration, University of Macau, Macau 999078, China
  • Received: 2021-08-10 Revised: 2022-02-17 Published: 2023-01-10
  • Corresponding author: GUO Chen (b. 1991), female, Han ethnicity, from Xiaogan, Hubei; doctoral candidate at the Faculty of Business Administration, University of Macau; research interests: swarm intelligence and cluster analysis. E-mail: chen.guo@connect.um.edu.mo
  • Funding:
    National Natural Science Foundation of China (71971143); Major Research Plan of the National Natural Science Foundation of China (91846301); Major Program of the National Natural Science Foundation of China (71790615); University of Macau (MYRG2018-00051-FBA); Natural Science Foundation of Guangdong Province (2020A1515010749); Key Research Foundation of Higher Education of the Guangdong Provincial Education Bureau (2019KZDXM030)

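Abstract: To address the clustering of mixed-attribute data, this paper proposes a multi-objective multi-learning bacterial foraging optimization algorithm. First, a multi-objective optimization framework is built on an improved bacterial foraging optimization algorithm. Then, a multi-learning strategy is proposed to improve the algorithm's performance. Specifically, for bacterial individuals, a ring-topology learning strategy is adopted among the bacteria, so that each bacterium can learn only from the best individual in its neighborhood; bacteria can also learn from the non-dominated individuals in the external archive. This learning strategy both maintains population diversity and accelerates convergence. For the non-dominated individuals in the external archive, their trend of change is recorded; when they stagnate, an elite learning strategy applies a small perturbation to them to increase the diversity of the non-dominated solutions. Finally, to handle mixed-attribute data clustering, a mixed-attribute conversion strategy with attribute weights is designed. To verify its performance, the proposed algorithm is compared with two multi-objective evolutionary algorithms and three classical clustering algorithms on six benchmark data sets. The experimental results show that the proposed algorithm has significant advantages in clustering numerical, categorical, and mixed-attribute data. In addition, a case study on credit card applicant data from the financial sector further confirms the feasibility of the proposed algorithm and indicates that it has application prospects in medicine, management, engineering, and other fields that involve mixed-attribute data sets.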

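Key words: mixed attribute data clustering; bacterial foraging optimization algorithm; multi-objective optimization; multi-learning strategy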
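The ring-topology learning strategy described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the neighborhood size, whether an individual belongs to its own neighborhood, or how "best" is judged under multiple objectives, so the function name `ring_neighborhood_best`, the `radius` parameter, the inclusion of the individual itself, and the use of a single scalar fitness (lower is better) are all assumptions made for illustration, not the authors' implementation.

```python
def ring_neighborhood_best(fitness, radius=1):
    """Return, for each individual, the index of the best individual in its
    ring neighborhood.

    Assumptions (not taken from the paper): fitness is a list of scalars to
    be minimized, and the neighborhood of individual i is i itself plus
    `radius` individuals on each side of a ring.
    """
    n = len(fitness)
    best = []
    for i in range(n):
        # Indices wrap around, so individual 0 is adjacent to individual n - 1.
        neighborhood = [(i + offset) % n for offset in range(-radius, radius + 1)]
        best.append(min(neighborhood, key=lambda j: fitness[j]))
    return best


fitness = [5.0, 1.0, 4.0, 3.0, 2.0]
print(ring_neighborhood_best(fitness))  # [1, 1, 1, 4, 4]
```

Because each individual sees only its local neighborhood, the globally best individual (index 1 here) attracts only its immediate neighbors, while individuals 3 and 4 follow a different local best. This is one way to read the abstract's claim that the strategy helps preserve population diversity.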

Abstract: As data become easier to generate and collect in medicine, management, finance, and other fields, large volumes of mixed-attribute data are being produced, and how to mine valuable information from such data has attracted researchers' attention. Clustering is a widely used data mining method that can extract information from mixed-attribute data sets. The mixed-type data clustering methods designed so far fall into two categories: general clustering algorithms and evolutionary computation-based clustering algorithms, the latter consisting mainly of single-objective and multi-objective optimization algorithms. These algorithms perform well in specific contexts. However, for automatic clustering, high-dimensional clustering, and multi-objective clustering problems, algorithms in the first category cannot produce satisfactory results, whereas those in the second category show great potential and have therefore been studied in depth. Two issues with evolutionary computation-based clustering algorithms need further consideration. On the one hand, these algorithms are built on K-prototype, which is well known to use the Hamming distance to measure the similarity of categorical attributes and therefore cannot reflect the true relations between data samples. On the other hand, these algorithms focus mainly on the genetic algorithm; other evolutionary computation-based algorithms, such as the bacterial foraging optimization algorithm, are worth studying for mixed-type data clustering problems.

Key words: mixed attribute data clustering; bacterial foraging optimization algorithm; multi-objective optimization; multi-learning strategy
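The limitation of Hamming distance noted in the abstract can be made concrete with a small sketch of a K-prototype-style dissimilarity: squared Euclidean distance on numeric attributes plus a weighted count of mismatched categorical attributes. The function names, the weight `gamma`, and the sample records below are illustrative assumptions; the sketch does not reproduce the attribute-weighted conversion strategy proposed in the paper.

```python
def hamming(x_cat, y_cat):
    """Count the categorical attributes on which two records differ."""
    return sum(a != b for a, b in zip(x_cat, y_cat))


def k_prototype_dissimilarity(x_num, y_num, x_cat, y_cat, gamma=1.0):
    """K-prototype-style dissimilarity for one pair of mixed-attribute records.

    Numeric part: squared Euclidean distance.
    Categorical part: Hamming (simple matching) distance, weighted by gamma.
    """
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, y_num))
    return numeric_part + gamma * hamming(x_cat, y_cat)


# Hypothetical records: two numeric attributes and two categorical attributes.
d = k_prototype_dissimilarity([1.0, 2.0], [2.0, 4.0],
                              ["red", "small"], ["blue", "small"], gamma=0.5)
print(d)  # 5.5

# Every mismatch costs exactly 1, however far apart the categories are.
print(hamming(["low"], ["medium"]), hamming(["low"], ["high"]))  # 1 1
```

The last line shows the issue raised in the abstract: under Hamming distance, "low" is as far from "medium" as it is from "high", so any graded relation between category values is lost.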

CLC number: