Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economic Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences

Chinese Journal of Management Science ›› 2017, Vol. 25 ›› Issue (10): 100-108. doi: 10.16381/j.cnki.issn1003-207x.2017.10.011

• Papers •

Sparse Storage for Super-Large-Scale Linear Programming and Methods for Detecting and Handling Duplicate Rows in Presolving

WU Yu1,2, HUANG Si-ming1

  1. Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190, China;
    2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2016-06-30 Revised: 2017-02-14 Online: 2017-10-20 Published: 2017-12-15
  • Corresponding author: WU Yu (1986-), male (Han), born in Gansu; Ph.D. candidate, Chinese Academy of Sciences; research interests: large-scale linear programming and online algorithms. E-mail: wuyu14@mails.ucas.edu.cn

Sparse Storage for Super-large-scale Linear Programming and Methods for Identifying and Disposing of Duplicate Rows in its Presolving

WU Yu1,2, HUANG Si-ming1   

  1. Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190, China;
    2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received:2016-06-30 Revised:2017-02-14 Online:2017-10-20 Published:2017-12-15

Abstract: With the arrival of the big data era, linear programming problems are inevitably growing ever larger. For super-large-scale linear programming problems, how to store the data so as to save storage space and avoid wasting resources, while keeping queries, modifications, insertions, and deletions fast and convenient, is a problem in urgent need of a solution. This paper proposes a sparse data storage scheme based on the orthogonal (cross-linked) list, and verifies its advantages through a storage analysis of super-large-scale linear programming problems from the Netlib database. Furthermore, because of the large amount of redundant data, presolving is usually required before an algorithm is applied to a super-large-scale linear programming problem, and detecting and handling duplicate rows is a necessary and key step of presolving; this paper therefore proposes methods for detecting and handling duplicate rows. First, definitions of duplicate rows, which differ from the conventional notion, and of other related concepts are given. Then, building on the proposed storage scheme, a simple and easily implemented detection method is presented. Next, two basic principles for eliminating duplicate rows are distilled from the existing literature, and on this basis, by classifying the nonzero elements contained in duplicate rows, a handling method is derived through theoretical analysis that guarantees the sparsity of the constraint matrix does not decrease while the number of singleton columns increases. Finally, the concrete procedure of detection and handling is demonstrated and analyzed on a miniature example, and the effect of the methods on genuinely super-large-scale problems is verified on six real linear programming problems from the Netlib database.
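The cross-linked (orthogonal) list storage described above can be sketched as follows. This is a minimal textbook version, not the paper's improved variant: each nonzero is a node linked into both its row list and its column list, so row-wise and column-wise traversal, insertion, and deletion all avoid touching the zeros. The names `Node` and `OrthogonalList` are illustrative.

```python
class Node:
    """One nonzero entry of the constraint matrix."""
    __slots__ = ("row", "col", "value", "right", "down")

    def __init__(self, row, col, value):
        self.row, self.col, self.value = row, col, value
        self.right = None  # next nonzero in the same row
        self.down = None   # next nonzero in the same column

class OrthogonalList:
    """Sparse m x n matrix stored as a cross-linked (orthogonal) list."""

    def __init__(self, m, n):
        self.row_heads = [None] * m  # first nonzero of each row
        self.col_heads = [None] * n  # first nonzero of each column

    def insert(self, i, j, v):
        node = Node(i, j, v)
        # Splice into row i, keeping column indices ascending.
        prev, cur = None, self.row_heads[i]
        while cur is not None and cur.col < j:
            prev, cur = cur, cur.right
        node.right = cur
        if prev is None:
            self.row_heads[i] = node
        else:
            prev.right = node
        # Splice into column j, keeping row indices ascending.
        prev, cur = None, self.col_heads[j]
        while cur is not None and cur.row < i:
            prev, cur = cur, cur.down
        node.down = cur
        if prev is None:
            self.col_heads[j] = node
        else:
            prev.down = node

    def row(self, i):
        """Yield (col, value) pairs of row i without scanning zeros."""
        cur = self.row_heads[i]
        while cur is not None:
            yield cur.col, cur.value
            cur = cur.right

A = OrthogonalList(2, 3)
A.insert(0, 0, 1.0)
A.insert(0, 2, 3.0)
A.insert(1, 1, 2.0)
print(list(A.row(0)))  # → [(0, 1.0), (2, 3.0)]
```

Storage is proportional to the number of nonzeros plus one head pointer per row and per column, which is what makes the structure attractive for super-large-scale but sparse constraint matrices.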

Key words: linear programming, presolving, orthogonal list, sparse storage, duplicate rows

Abstract: With the arrival of the big data era, it is inevitable that linear programming problems keep growing in size. For super-large-scale linear programming problems, how to store the data, so as to save storage space, avoid wasting resources, and make inspecting, modifying, inserting, and deleting data convenient, is an urgent and important problem. In this paper, a data structure for sparse storage is proposed, based on an improved orthogonal list. Its performance in saving storage space is verified on several super-large-scale linear programming cases from the Netlib database. Furthermore, because of the large amount of redundant data, a presolving step is often required before an algorithm is applied to a linear programming problem, and identifying and disposing of duplicate rows is one of its key steps. In this paper, methods for identifying and disposing of duplicate rows are proposed. Firstly, the definition of duplicate rows and other related concepts are given; the definition differs from the usual one in that columns with only one nonzero element are not taken into account. Secondly, combined with the proposed storage structure, a simple classification-based method for identifying duplicate rows is proposed; it is easy to implement and needs only a single pass over the columns, from the first to the last. Thirdly, by summarizing the existing literature, two basic principles for eliminating duplicate rows are obtained: first, increase the number of singleton (one-element) columns as much as possible; second, reduce the number of nonzero elements as much as possible. Based on these two principles, the nonzero elements of duplicate rows are classified into different sets, and the number of nonzeros in each set is analyzed theoretically.
A method for disposing of duplicate rows is thereby obtained that not only preserves the sparsity of the data but also increases the number of singleton columns. In the last part, the concrete process of identifying and disposing of duplicate rows is first exemplified on a miniature linear programming example; then, by applying the proposed methods to several concrete linear programming cases selected from the Netlib database, their effectiveness is verified. The results show that for small-scale problems, or problems with few duplicate rows, the advantage may be negligible; however, for large-scale problems with dense duplicate rows, the larger the scale or the denser the duplicate rows, the more pronounced the benefit.
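The core idea behind duplicate-row identification can be illustrated with a simple hash-based grouping: once every row is normalized by its first nonzero, rows that are scalar multiples of one another collapse to the same key. This is a simplified sketch, not the paper's column-by-column classification; the function name, the dict-of-dicts row representation, and the rounding tolerance are all illustrative assumptions.

```python
from collections import defaultdict

def find_duplicate_rows(rows):
    """Group rows that are scalar multiples of each other.

    rows: list of sparse rows, each a dict {col_index: value}
          (singleton columns are assumed to be excluded already,
          per the paper's definition of duplicate rows).
    Returns a list of index groups, each group containing the
    indices of mutually proportional rows.
    """
    buckets = defaultdict(list)
    for i, r in enumerate(rows):
        cols = tuple(sorted(r))
        if not cols:
            continue  # empty row: nothing to compare
        pivot = r[cols[0]]
        # Normalizing by the first nonzero makes proportional rows
        # hash to the same key; rounding absorbs floating-point noise.
        key = (cols, tuple(round(r[c] / pivot, 12) for c in cols))
        buckets[key].append(i)
    return [g for g in buckets.values() if len(g) > 1]

rows = [{0: 1.0, 2: 2.0},   # row 0
        {1: 3.0},           # row 1 (no partner)
        {0: 2.0, 2: 4.0}]   # row 2 = 2 * row 0
print(find_duplicate_rows(rows))  # → [[0, 2]]
```

Each detected group can then be reduced by subtracting multiples of one representative row from the others, which is where the paper's two elimination principles (maximize new singleton columns, minimize remaining nonzeros) come into play.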

Key words: linear programming, presolving, orthogonal list, sparse storage, duplicate rows

CLC number: