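Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economic Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences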

Chinese Journal of Management Science ›› 2023, Vol. 31 ›› Issue (5): 269-278. doi: 10.16381/j.cnki.issn1003-207x.2020.1780

• Articles

Hybrid Air Quality Early Warning System Based on XGBoost and ELM: A Case Study of Nanjing

GAO Xiao-hui, ZHOU Kun, LI Lian-shui

  1. School of Management Science and Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, Jiangsu, China
  • Received: 2020-09-14 Revised: 2021-01-07 Published: 2023-05-23
  • Corresponding author: ZHOU Kun (born 1994), male, Han ethnicity, from Yancheng, Jiangsu; PhD candidate, School of Management Science and Engineering, Nanjing University of Information Science and Technology; research interest: emergency management. E-mail: zk_wjz@163.com
  • Funding:
    National Natural Science Foundation of China (71673145)

Abstract: With air pollution episodes occurring frequently in recent years, establishing an effective air quality early warning system has become urgent. However, most existing studies overlook the importance of data preprocessing and air quality evaluation when designing such systems, which leads to insufficient mining of the information in the data and to biased predictions. This paper proposes a hybrid air quality early warning system consisting of three modules: data preprocessing, prediction, and air quality evaluation. Based on the characteristics of the original data, classical empirical mode decomposition (EMD) is first used to decompose the training set. The Lempel-Ziv complexity algorithm is then applied to classify the resulting components as high-frequency or low-frequency. Next, the data input matrix is obtained using the average mutual information (AMI) method. To improve the accuracy and stability of the prediction module, the high-frequency components are predicted with the extreme gradient boosting (XGBoost) algorithm, with multiple additional factors included as inputs, while the low-frequency components are predicted with an extreme learning machine (ELM). Finally, the air quality evaluation module calculates the primary pollutant for each day. Nanjing's air quality is taken as a case study for data collection, processing, and prediction. The results show that the proposed prediction method achieves higher accuracy and stronger stability than the other benchmark models, and that the evaluation module provides additional air quality information, forming a complete early warning system that offers a scientific basis for decision makers in controlling air pollution.

Key words: air quality; empirical mode decomposition (EMD); average mutual information (AMI); XGBoost; extreme learning machine (ELM)
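
The abstract's step of sorting decomposed components into high-frequency and low-frequency groups by Lempel-Ziv complexity can be illustrated with a minimal sketch. It is not the authors' implementation. Binarizing each component at its mean, normalizing the phrase count by n / log2(n), the 0.5 cut-off, and the names `binarize`, `lz76_complexity`, `normalized_complexity`, and `split_components` are all assumptions made for this example; the abstract does not give these details.

```python
import math
import random


def binarize(series):
    """Map a numeric series to a '0'/'1' string by thresholding at its mean (assumed rule)."""
    mean = sum(series) / len(series)
    return "".join("1" if x > mean else "0" for x in series)


def lz76_complexity(bits):
    """Count the phrases in the Lempel-Ziv (1976) exhaustive parsing of `bits`."""
    n = len(bits)
    i = 0
    count = 0
    while i < n:
        k = 1
        # Grow the phrase while it can still be copied from the text seen so far
        # (overlap with the phrase itself is allowed, as in the LZ76 definition).
        while i + k <= n and bits[i:i + k] in bits[:i + k - 1]:
            k += 1
        count += 1
        i += k
    return count


def normalized_complexity(series):
    """Phrase count scaled by log2(n) / n; expects a non-empty series."""
    n = len(series)
    return lz76_complexity(binarize(series)) * math.log2(n) / n


def split_components(components, threshold=0.5):
    """Route each named component to the high- or low-frequency group.

    `threshold` is a hypothetical cut-off chosen for illustration only.
    """
    high, low = {}, {}
    for name, values in components.items():
        target = high if normalized_complexity(values) > threshold else low
        target[name] = values
    return high, low


if __name__ == "__main__":
    rng = random.Random(0)
    components = {
        # Synthetic stand-ins for EMD components, not real air quality data.
        "fast": [rng.gauss(0.0, 1.0) for _ in range(256)],
        "slow": [math.sin(2 * math.pi * t / 256) for t in range(256)],
    }
    high, low = split_components(components)
    print("high-frequency:", list(high))
    print("low-frequency:", list(low))
```

In a pipeline like the one described, the high-frequency group would then go to the XGBoost predictor and the low-frequency group to the ELM predictor; the cut-off would need to be chosen from the data rather than fixed in advance.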

CLC Number: