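Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economic Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences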

Chinese Journal of Management Science, 2023, Vol. 31, Issue (10): 234-244. doi: 10.16381/j.cnki.issn1003-207x.2020.1628


Prediction Method for Gastric Cancer Survivability Based on an Improved LightGBM Ensemble Model

Yi FENG1, Du-juan WANG1, Zhi-neng HU1, Shao-ze CUI2

  1. Business School, Sichuan University, Chengdu 610064, China
  2. School of Economics and Management, Dalian University of Technology, Dalian 116024, China
  • Received: 2020-08-23 Revised: 2021-06-03 Online: 2023-10-15 Published: 2023-11-03
  • Contact: Du-juan WANG E-mail: wangdujuan@dlut.edu.cn
  • Funding: Key Program of the National Natural Science Foundation of China (71533001)


Abstract:

Gastric cancer survivability prediction is an important task in gastric cancer prognosis. By mining the key features that affect the survivability of gastric cancer patients and predicting survivability accurately, it can provide decision support for physicians. Existing studies of gastric cancer survivability prediction suffer from insufficient accuracy and a lack of model interpretability. To address these problems, this paper proposes SGPL-LightGBM, an ensemble classification method based on an improved LightGBM (light gradient boosting machine), to predict the survivability of gastric cancer patients and to explain the prediction model post hoc. First, stability-based feature selection is used to determine the optimal feature subset, which reduces computational overhead while improving prediction accuracy. Next, an intelligent optimization algorithm, the genetic algorithm, is adopted to optimize the important hyperparameters of the SGPL-LightGBM model, further improving the performance of the survivability prediction model. Partial dependence plots (PDP) and LIME (local interpretable model-agnostic explanations) are then combined to explain how the important features affect the predicted response of SGPL-LightGBM. Finally, experiments on a real gastric cancer dataset show that the proposed SGPL-LightGBM ensemble classification method achieves higher accuracy and interpretability in gastric cancer survivability prediction and can provide effective decision support for physicians when formulating treatment plans.
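The abstract does not say which base selector the stability-based feature selection step uses. The sketch below assumes a simple one: on each random half-subsample of the records, features are ranked by absolute Pearson correlation with the binary survival label and the top `k` are recorded; features selected in at least a `threshold` fraction of the rounds form the final subset. The function names, the correlation ranking, and the defaults for `k`, `n_rounds`, `frac`, and `threshold` are illustrative assumptions, not the paper's settings, and the data are synthetic.

```python
import random
from collections import Counter


def abs_corr(xs, ys):
    """Absolute Pearson correlation between one feature column and the label."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # constant column or constant label: no linear signal
    return abs(cov) / (vx * vy) ** 0.5


def stability_selection(X, y, k, n_rounds=100, frac=0.5, threshold=0.6, seed=0):
    """Return (selected feature indices, selection frequency of every feature)."""
    rng = random.Random(seed)
    n_rows, n_feats = len(X), len(X[0])
    m = max(2, int(frac * n_rows))
    counts = Counter()
    for _ in range(n_rounds):
        rows = rng.sample(range(n_rows), m)  # subsample rows without replacement
        ys = [y[i] for i in rows]
        scores = [abs_corr([X[i][j] for i in rows], ys) for j in range(n_feats)]
        top_k = sorted(range(n_feats), key=lambda j: scores[j], reverse=True)[:k]
        counts.update(top_k)  # assumed base selector: top-k by |correlation|
    freq = [counts[j] / n_rounds for j in range(n_feats)]
    selected = [j for j in range(n_feats) if freq[j] >= threshold]
    return selected, freq


# Synthetic demo: features 0 and 1 drive the label, features 2 and 3 are noise.
data_rng = random.Random(1)
X, y = [], []
for _ in range(200):
    a, b = data_rng.gauss(0, 1), data_rng.gauss(0, 1)
    noise = [data_rng.gauss(0, 1), data_rng.gauss(0, 1)]
    y.append(1 if a + 0.8 * b + data_rng.gauss(0, 0.3) > 0 else 0)
    X.append([a, b] + noise)

selected, freq = stability_selection(X, y, k=2)
print(selected, [round(f, 2) for f in freq])
```

In the stability-selection literature the base selector is more often a sparse model such as L1-regularized regression; swapping it in only changes how `top_k` is computed, while the subsample-and-count loop stays the same.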

Key words: gastric cancer survivability prediction, feature selection, ensemble learning, intelligent optimization algorithm, interpretability
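The genetic-algorithm hyperparameter step can be sketched as a search over a discrete grid of LightGBM hyperparameters. `num_leaves`, `learning_rate`, `max_depth`, and `min_child_samples` are genuine LightGBM parameter names, but which hyperparameters the paper tunes, their candidate values, and the GA operators used here (binary tournament selection, uniform crossover, per-gene mutation, one elite) are assumptions. So that the sketch runs without LightGBM installed, `toy_fitness` is a hypothetical stand-in; in practice the fitness would be a cross-validated score, for example AUC, of a LightGBM classifier trained with the decoded parameters.

```python
import random

# Candidate values are illustrative; the keys are real LightGBM parameter names.
SPACE = {
    "num_leaves": [15, 31, 63, 127],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "max_depth": [-1, 4, 6, 8],
    "min_child_samples": [5, 10, 20, 40],
}
KEYS = list(SPACE)


def decode(genome):
    """Map a genome (one value index per hyperparameter) to a parameter dict."""
    return {k: SPACE[k][g] for k, g in zip(KEYS, genome)}


def genetic_search(fitness, pop_size=30, generations=40, p_mut=0.2, seed=0):
    """Maximise fitness(params) over SPACE; returns (best params, best score)."""
    rng = random.Random(seed)
    cache = {}  # fitness means model training in practice, so memoise it

    def score(genome):
        key = tuple(genome)
        if key not in cache:
            cache[key] = fitness(decode(genome))
        return cache[key]

    def random_genome():
        return [rng.randrange(len(SPACE[k])) for k in KEYS]

    def tournament(pop):
        a, b = rng.sample(pop, 2)  # binary tournament selection
        return a if score(a) >= score(b) else b

    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        children = [list(max(pop, key=score))]  # elitism: carry the best over
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            # Uniform crossover: each gene comes from either parent with p = 0.5.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            for i, k in enumerate(KEYS):  # per-gene random-reset mutation
                if rng.random() < p_mut:
                    child[i] = rng.randrange(len(SPACE[k]))
            children.append(child)
        pop = children
    best = max(pop, key=score)
    return decode(best), score(best)


# Hypothetical fitness: counts how many parameters match an arbitrary optimum.
TARGET = {"num_leaves": 31, "learning_rate": 0.05, "max_depth": 6, "min_child_samples": 20}


def toy_fitness(params):
    return sum(params[k] == v for k, v in TARGET.items())


best_params, best_score = genetic_search(toy_fitness)
print(best_params, best_score)
```

Encoding each hyperparameter as an index into a candidate list keeps crossover and mutation trivial; continuous ranges would need a real-valued encoding and a different mutation operator.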

CLC number:
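Partial dependence, one of the two explanation tools named in the abstract, has a simple model-agnostic definition: for a feature j and a value v, average the model's prediction over all records after setting feature j to v in every record. A minimal sketch, assuming `predict` is any callable that maps one feature row to a score such as a predicted survival probability (this interface and the toy model below are assumptions, not the paper's code):

```python
def partial_dependence(predict, X, feature, grid):
    """Return the partial dependence curve of `feature` over the values in `grid`.

    For each grid value v, every row's `feature` entry is replaced by v while the
    other columns keep their observed values, and the predictions are averaged.
    """
    curve = []
    for v in grid:
        total = 0.0
        for row in X:
            modified = list(row)  # copy so the caller's data is not mutated
            modified[feature] = v
            total += predict(modified)
        curve.append(total / len(X))
    return curve


# Toy check with a linear "model" f(x) = 2*x0 + x1: the curve for feature 0
# must be 2*v + mean(x1), and mean(x1) is 3 for these three rows.
def linear_model(row):
    return 2 * row[0] + row[1]


X = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
print(partial_dependence(linear_model, X, feature=0, grid=[0.0, 1.0, 2.0]))  # [3.0, 5.0, 7.0]
```

LIME, the other tool named in the abstract, works at a different scale: instead of averaging over the whole dataset, it explains a single prediction by fitting a locally weighted linear surrogate to perturbed copies of that one record.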