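Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economic Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences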

Chinese Journal of Management Science ›› 2022, Vol. 30 ›› Issue (12): 63-76. doi: 10.16381/j.cnki.issn1003-207x.2021.2635


Forecasting China's Online Retail Sales Based on "Splitting-Filling-Decomposition-Ensemble"

ZENG Neng-min (曾能民)1, 2, ZHANG Ming (张明)1, YU Le-an (余乐安)1, 2, 3

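  1. School of Economics and Management, Harbin Engineering University, Harbin, Heilongjiang 150001; 2. Key Laboratory of Big Data and Business Intelligence Technology (Harbin Engineering University), Ministry of Industry and Information Technology, Harbin, Heilongjiang 150001; 3. Business School, Sichuan University, Chengdu, Sichuan 610065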
  • Received: 2021-07-27 Revised: 2022-06-24 Published: 2023-01-10
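  • Corresponding author: YU Le-an (余乐安, b. 1976), male (Han), from Changde, Hunan; Professor, Business School, Sichuan University; research interests: big data mining, business intelligence, economic forecasting, and financial management. E-mail: yulean@amss.ac.cn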
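  • Funding:
    National Natural Science Foundation of China (72001057, 72072046); China Postdoctoral Science Foundation (2021M7000989); Natural Science Foundation of Heilongjiang Province (LH2020G003); Heilongjiang Provincial Philosophy and Social Sciences Research Planning Project (20GLC205); Heilongjiang Postdoctoral Science Foundation (LBH-Z21011); Fundamental Research Funds for the Central Universities (3072022CF0902, 3072022WK0914)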

Forecasting Online Retail Sales of China Based on Splitting-filling-decomposition-ensemble Model

ZENG Neng-min1, 2, ZHANG Ming1, YU Le-an1, 2, 3   

  1. School of Economics and Management, Harbin Engineering University, Harbin 150001, China; 2. Key Laboratory of Big Data and Business Intelligence Technology (Harbin Engineering University), Ministry of Industry and Information Technology, Harbin 150001, China; 3. Business School, Sichuan University, Chengdu 610065, China
  • Received:2021-07-27 Revised:2022-06-24 Published:2023-01-10
  • Contact: YU Le-an (余乐安) E-mail: yulean@amss.ac.cn

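Summary: Accurate forecasts of online retail sales inform government retail policy and development planning, and they underpin the strategic decisions of e-commerce and logistics companies. China's online retail sales data are hard to forecast accurately because the sample is small, volatility is high, holiday effects are strong, and some values are missing. To address this problem, this paper proposes a "splitting-filling-decomposition-ensemble" forecasting framework. First, the dataset is split into physical-goods retail data and non-physical retail data. Second, spline interpolation is adapted to the different missing-data patterns of the two series, yielding one decomposition-filling method based on "spline interpolation with bisection adjustment" and another based on "piecewise linear function fitting with spline interpolation"; these are used to fill the missing values of the two series. Third, based on the different characteristics of the two series, a "multiplicative decomposition-ARIMA-moving average" method and an "STL decomposition-BP neural network-grey waveform" method are proposed to forecast them. Finally, the two sets of forecasts are combined to obtain the forecast of China's online retail sales. Empirical results show that the proposed framework captures the characteristics of China's online retail sales data well, achieves high forecasting accuracy, and outperforms traditional missing-value filling and forecasting methods. The framework enriches existing missing-value filling and forecasting methods and offers a solution for forecasting practice.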
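
As a rough illustration of the multiplicative decomposition (乘法分解) step named above, the following Python sketch applies the classical textbook procedure: a 2x12 centered moving average for the trend, then seasonal indices from the ratio of each observation to that trend. It is not the authors' implementation; the function names and the synthetic series are hypothetical, and the sketch assumes a complete monthly series that starts in January and covers at least 24 months. The ARIMA and moving-average stages of the paper's method are not shown.

```python
def centered_ma_12(y):
    """2x12 centered moving average; None where the 13-point window does not fit."""
    n = len(y)
    trend = [None] * n
    for t in range(6, n - 6):
        window = y[t - 6:t + 7]  # 13 consecutive months centered on t
        # Half weight on the two end months, full weight on the 11 in between.
        total = 0.5 * window[0] + sum(window[1:12]) + 0.5 * window[12]
        trend[t] = total / 12.0
    return trend


def seasonal_indices(y, trend):
    """Average observed/trend ratio per calendar month, normalized to mean 1."""
    ratios = [[] for _ in range(12)]
    for t, (obs, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            ratios[t % 12].append(obs / tr)  # assumes index 0 is January
    raw = [sum(r) / len(r) for r in ratios]
    mean_raw = sum(raw) / 12.0
    return [v / mean_raw for v in raw]


# Synthetic example (illustrative only, not the paper's data):
# a flat level of 100 multiplied by a fixed seasonal pattern with mean 1.
pattern = [0.7, 0.8, 0.9, 1.0, 1.0, 1.1, 1.0, 1.0, 1.0, 1.1, 1.3, 1.1]
sales = [100.0 * pattern[t % 12] for t in range(36)]

trend = centered_ma_12(sales)
indices = seasonal_indices(sales, trend)
```

On this synthetic series the estimated trend is flat and the indices match the pattern that generated the data, which makes the sketch easy to check. Real series with a sloping trend would be recovered only approximately.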

Keywords: online retail; forecasting; missing-value filling; decomposition-ensemble

Abstract: In recent years, China's online retail industry has developed rapidly. Accurate prediction of online retail sales is the basis on which the government formulates retail policies and the foundation on which e-commerce and logistics companies determine their operating strategies. However, no existing research has focused on forecasting macro-level online retail sales in a way that is driven by the characteristics of the data. Forecasting China's monthly online retail sales is challenging because the dataset is small, highly volatile, and strongly affected by holidays. In addition, the data have a distinctive missing-data pattern: the combined total for January and February is known, but the individual monthly values are missing, a consequence of the reporting rules of the National Bureau of Statistics. Motivated by these issues, a splitting-filling-decomposition-ensemble (SFDE) prediction framework is proposed. First, the dataset of China's total online retail sales is split into two parts: physical retail sales data and non-physical retail sales data. Second, in light of the incompleteness of the online physical retail sales data, a revised spline interpolation approach (a hybrid of spline interpolation and dichotomous adjustment) is proposed to fill the missing values. Meanwhile, because the non-physical retail data show different trends at different stages and increasing fluctuations, another revised spline interpolation approach (a hybrid of piecewise linear function fitting and spline interpolation) is proposed to fill the missing values of that series.
Third, based on the different characteristics of the physical and non-physical retail data, two hybrid ensemble forecasting approaches are proposed for the two series: the first integrates multiplicative decomposition, ARIMA, and moving average; the second integrates STL decomposition, a BP neural network, and gray waveform forecasting. Finally, the forecasts of the two series are combined to obtain the predicted value of China's total online retail sales. In the experiments, monthly data on China's online retail sales from 2015 to 2019 are used to verify model performance. The results show that the revised spline interpolation approaches, designed around the data characteristics, effectively solve the missing-data filling problem described above. In addition, combining the revised spline interpolation approaches with the hybrid ensemble forecasting approaches achieves significant performance improvements over single models. Furthermore, by combining the strengths of conventional and deep learning methods, the SFDE framework provides a robust modelling framework capable of capturing the nonlinear nature of the complex online retail sales series and thus producing more accurate forecasts. On the whole, the proposed hybrid framework enriches research on missing-value filling methods and has considerable scope for application in a wide range of areas to improve accuracy in complex time series forecasting.
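
The January-February gap described in the abstract (combined total known, monthly values missing) can be illustrated with a small Python sketch. The abstract does not spell out the dichotomous adjustment rule, so this sketch swaps in two simpler stand-ins and should be read as an assumption rather than the paper's method: a local cubic (Lagrange) interpolant through the two known months on each side of the gap replaces spline interpolation, and a proportional rescaling forces the two filled values to sum to the known total. The function names and the numbers are hypothetical.

```python
def lagrange_cubic(xs, ys, x):
    """Evaluate the cubic polynomial through four (x, y) points at x."""
    total = 0.0
    for i in range(4):
        term = ys[i]
        for j in range(4):
            if j != i:
                term *= (x - xs[j]) / (xs[i] - xs[j])
        total += term
    return total


def fill_jan_feb(series, jan_idx, combined_total):
    """Fill series[jan_idx] and series[jan_idx + 1], which are both None.

    A first guess comes from the two known months on each side of the gap;
    the pair is then rescaled so that it sums to combined_total.
    Assumes the first guesses are positive.
    """
    feb_idx = jan_idx + 1
    xs = [jan_idx - 2, jan_idx - 1, feb_idx + 1, feb_idx + 2]
    ys = [series[i] for i in xs]
    jan_guess = lagrange_cubic(xs, ys, jan_idx)
    feb_guess = lagrange_cubic(xs, ys, feb_idx)
    # Proportional adjustment (stand-in for the paper's dichotomous adjustment).
    scale = combined_total / (jan_guess + feb_guess)
    filled = list(series)
    filled[jan_idx] = jan_guess * scale
    filled[feb_idx] = feb_guess * scale
    return filled


# Hypothetical values: Nov, Dec, Jan (missing), Feb (missing), Mar, Apr.
monthly = [100.0, 120.0, None, None, 110.0, 115.0]
filled = fill_jan_feb(monthly, jan_idx=2, combined_total=190.0)
```

The rescaling keeps the January-to-February ratio suggested by the interpolant while honoring the published two-month total. Other adjustment rules, including a bisection search, would satisfy the same constraint.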

Key words: online retail; forecasting; missing-data filling; decomposition ensemble

CLC number: