Court Judgment Decision Support System Based on Text-mining and Machine Learning

doi:10.16381/j.cnki.issn1003-207x.2018.01.017

Chinese Journal of Management Science, 2018, Vol. 26, Issue (1): 170-178.


ZHU Qing (朱青)^1,2,4, WEI Ke-zhen (卫柯臻)^1,2,4, DING Lan-lin (丁兰琳)^5, LI Jian-qiang (黎建强)^1,2,3

1. International Business School of Shaanxi Normal University, Xi'an 710119, China;
2. Institute of Cross-Process Perception and Control of Shaanxi Normal University, Xi'an 710119, China;
3. Department of Management Sciences, City University of Hong Kong, Hong Kong, China;
4. Management School of Xi'an Jiaotong University, Xi'an 710049, China;
5. School of Economics and Finance of Xi'an Jiaotong University, Xi'an 710049, China

Received: 2016-07-18 Revised: 2017-01-02 Online: 2018-01-20 Published: 2018-03-19
Corresponding author: ZHU Qing (b. 1983), male, Han ethnicity, from Shaanxi; associate professor and master's supervisor at Shaanxi Normal University, postdoctoral researcher at Xi'an Jiaotong University; research area: management science and engineering. E-mail: zhuqing@snnu.edu.cn
Funding:
National Soft Science Research Program of the Ministry of Science and Technology (2012GXS2D027)

Abstract

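Summary: In many civil law countries, new types of legal relationships keep emerging, and the weakness of statute law, which cannot be enacted and amended in time, has gradually become apparent. At the same time, the number of disputes and lawsuits is growing sharply around the world, so many countries face the question of how to raise the trial efficiency of the judicial system while maintaining trial quality. Alongside institutional reform, building a decision support system can therefore effectively assist judicial decisions. Taking Chinese medical damage litigation documents as an example, this paper uses text mining and automatic classification to propose a court judgment decision support system (CJ-DSS) that predicts the outcome of a new litigation document from past cases: rejected or not rejected. Using real cases, the study finds that combined feature extraction does improve classifier performance, and that for three classifiers, support vector machine (SVM), artificial neural network (ANN) and K-nearest neighbor (KNN), the improvement from the combined document frequency and chi-square (DF-CHI) feature extraction method varies, with ANN improving most. In addition, after ensemble learning the system's classification performance is more stable and significantly better than that of any single classifier, with an F₁ value of 93.3%.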

Keywords: text mining, automatic classification, decision support system, CJ-DSS
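The abstract names a combined DF-CHI feature extraction method but does not say how the two criteria are combined. The sketch below is one plausible reading, stated as an assumption: terms are first filtered by document frequency (DF), and the survivors are ranked by their chi-square statistic against the class label. The function names, the `min_df` and `top_k` parameters, and the toy English tokens are all hypothetical and do not come from the paper.

```python
from collections import Counter

def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0  # term occurs in all or no documents, or one class is empty
    return n * (n11 * n00 - n10 * n01) ** 2 / denom

def df_chi_select(docs, labels, min_df=2, top_k=3):
    """ASSUMED combination: drop terms with DF < min_df, then keep the
    top_k remaining terms by chi-square score (ties broken alphabetically)."""
    n_docs = len(docs)
    n_pos = sum(labels)
    df, df_pos = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        for term in set(tokens):       # count each term once per document
            df[term] += 1
            if y == 1:
                df_pos[term] += 1
    scores = {}
    for term, d in df.items():
        if d < min_df:
            continue                   # DF filter
        n11 = df_pos[term]             # term present, class 1
        n10 = d - n11                  # term present, class 0
        n01 = n_pos - n11              # term absent, class 1
        n00 = n_docs - n_pos - n10     # term absent, class 0
        scores[term] = chi_square(n11, n10, n01, n00)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]

# Hypothetical tokenized documents; label 1 = rejected, 0 = not rejected.
docs = [
    ["hospital", "no", "fault", "claim"],
    ["no", "fault", "evidence", "claim"],
    ["hospital", "negligence", "damages", "claim"],
    ["negligence", "damages", "appraisal", "claim"],
]
labels = [1, 1, 0, 0]

print(df_chi_select(docs, labels, min_df=2, top_k=4))
```

In this toy data, a term that appears in every document ("claim") scores zero because it carries no class information, and rare terms are removed before scoring, which is the usual motivation for pairing a frequency filter with chi-square.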

Abstract: In many countries with a continental legal system, new kinds of legal relationships are constantly emerging, and the weakness of statute law, which cannot be formulated and amended in time, is gradually becoming obvious. As the number of lawsuits grows rapidly, many countries face the problem of how to improve the efficiency of the judicial system while guaranteeing the quality of trials. Therefore, alongside reform of the system, a decision support system can effectively assist judicial decisions.
In this paper, medical damage judgment documents from China are taken as an example, and a court judgment decision support system (CJ-DSS) is proposed based on text mining and automatic classification. The system predicts the trial result of a new lawsuit text from the verdicts of previous cases: rejected or not rejected. Different feature extraction methods (DF, chi-square and the combined DF-CHI method) are paired with different classifiers (SVM, ANN and KNN), and the pairings that meet the expected performance are selected as base learners. Following the idea of the Delphi method, ensemble (integrated) learning is then used to predict new cases. Ensemble learning here means constructing a new model that, after proper training, takes the predictions of the qualifying base learners as input and outputs the prediction with the maximum probability through linear or non-linear calculation.
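The abstract does not specify the form of the combining model, so the following is only a minimal sketch of one simple linear option: an accuracy-weighted vote. Each base learner receives a weight equal to the log-odds of its accuracy on held-out predictions, and the class with the largest weighted vote total is returned. The learner names, the held-out labels and the predictions are invented for illustration; this is a stand-in for the idea of training a model on base-learner outputs, not the paper's actual ensemble.

```python
import math

def fit_weights(base_preds, y_true, eps=1e-6):
    """Learn one weight per base learner from held-out predictions:
    the log-odds of that learner's accuracy (clipped to avoid log(0))."""
    weights = []
    for preds in base_preds:
        acc = sum(p == y for p, y in zip(preds, y_true)) / len(y_true)
        acc = min(max(acc, eps), 1 - eps)
        weights.append(math.log(acc / (1 - acc)))
    return weights

def ensemble_predict(weights, votes, classes=(0, 1)):
    """Return the class with the largest weighted vote total."""
    totals = {c: 0.0 for c in classes}
    for w, v in zip(weights, votes):
        totals[v] += w
    return max(classes, key=lambda c: totals[c])

# Hypothetical held-out labels (1 = rejected) and base-learner predictions.
y_holdout = [1, 0, 1, 1, 0, 0, 1, 0]
base_preds = [
    [1, 0, 1, 1, 0, 0, 1, 1],  # "svm": 7 of 8 correct
    [1, 0, 0, 0, 0, 0, 1, 1],  # "ann": 5 of 8 correct
    [0, 0, 1, 1, 1, 0, 1, 1],  # "knn": 5 of 8 correct
]

weights = fit_weights(base_preds, y_holdout)
# One strong learner votes 1, two weaker learners vote 0.
print(ensemble_predict(weights, [1, 0, 0]))
```

With these toy numbers the single more accurate learner outweighs the two weaker ones, which is where a trained combiner differs from a plain majority vote.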
At the same time, experiments on real cases show that the combined feature extraction method does improve classifier performance, and that the size of the improvement differs across the SVM, ANN and KNN classifiers, with ANN improving most. In addition, the system's classification performance became more stable after ensemble learning. The best performance, an F1 value of 93.3%, was significantly better than that of any single classifier.
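The Chinese abstract reports the 93.3% result as an F1 value. As a reminder of what that metric measures, here is a small worked sketch using the standard definition, the harmonic mean of precision and recall for the positive class. The labels and predictions are made up and are unrelated to the paper's data.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical outcomes: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

print(f1_score(y_true, y_pred))
```

Here precision and recall are both 3/4, so F1 is also 0.75; when the two differ, F1 is pulled toward the lower value.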
The data source for this paper is the "BeiDaFaBao" legal database. Using "medical malpractice" as the keyword, more than 300 court verdicts and mediation documents from 2013 were retrieved. Because mediation documents are short and explain the case only briefly, they were excluded from the study. The remaining documents were preprocessed and then used for training and testing.
In previous studies, the accuracy of text classification systems has depended heavily on training set size: the larger the training set, the better the performance. This paper offers a reference for building structured, high-performance systems from small training samples. Meanwhile, because labelling documents is costly, studying and modelling unlabeled text should be a focus of future research for data scientists.

Key words: text-mining, automatic text classification, decision support system, CJ-DSS

CLC number: TP18

ZHU Qing, WEI Ke-zhen, DING Lan-lin, LI Jian-qiang. Court Judgment Decision Support System Based on Text-mining and Machine Learning[J]. Chinese Journal of Management Science, 2018, 26(1): 170-178.

