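Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economic Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences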

Chinese Journal of Management Science, 2018, Vol. 26, Issue (1): 170-178. doi: 10.16381/j.cnki.issn1003-207x.2018.01.017

Design of a Court Judgment Decision Support System Based on Text Mining and Automatic Classification

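ZHU Qing1,2,4, WEI Ke-zhen1,2,4, DING Lan-lin5, LI Jian-qiang1,2,3

  1. International Business School, Shaanxi Normal University, Xi'an, Shaanxi 710119, China;
    2. Institute of Cross-Process Perception and Control, Shaanxi Normal University, Xi'an, Shaanxi 710119, China;
    3. Department of Management Sciences, City University of Hong Kong, Hong Kong, China;
    4. School of Management, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China;
    5. School of Economics and Finance, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China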
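  • Received: 2016-07-18 Revised: 2017-01-02 Published: 2018-03-19
  • Corresponding author: ZHU Qing (born 1983), male (Han nationality), from Shaanxi; associate professor and master's supervisor at Shaanxi Normal University and postdoctoral researcher at Xi'an Jiaotong University; research interests: management science and engineering. E-mail: zhuqing@snnu.edu.cn
  • Funding:

    National Soft Science Research Program project of the Ministry of Science and Technology (2012GXS2D027)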

Court Judgment Decision Support System Based on Text-mining and Machine Learning

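Abstract: In many civil law countries, the steady emergence of new types of legal relationships has increasingly exposed a weakness of statutory law: it cannot be enacted and amended quickly enough. At the same time, the number of lawsuits worldwide is rising sharply, so many countries face the problem of improving the efficiency of their judicial systems while maintaining the quality of trials. Therefore, alongside institutional reform, building a decision support system can effectively assist judicial decisions. Taking medical damage litigation documents in China as an example, this paper uses text mining and automatic classification to propose a court judgment decision support system (CJ-DSS), which predicts the outcome of a new lawsuit text from past cases: rejected or not rejected. Through case studies, the paper finds that combined feature extraction does improve classifier performance, and that the improvement delivered by the document frequency and chi-square (DF-CHI) combined feature extraction method differs across the three classifiers tested, support vector machine (SVM), artificial neural network (ANN) and K-nearest neighbor (KNN), with ANN improving the most. In addition, after ensemble learning the system's classification performance is more stable and significantly better than that of any single classifier, reaching an F1 score of 93.3%.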

Keywords: text mining, automatic classification, decision support system, CJ-DSS
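The abstract does not say exactly how DF and CHI are combined. A minimal sketch, assuming the common two-stage reading (discard terms whose document frequency is below a threshold, then rank the remaining terms by their chi-square statistic against the rejected / not-rejected label), could look like this. Documents are assumed to be already word-segmented into sets of terms; the function names, `min_df`, `k` and the toy English tokens are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence, Set


def document_frequency(docs: Sequence[Set[str]]) -> Counter:
    """DF: number of documents in which each term appears."""
    df: Counter = Counter()
    for terms in docs:
        df.update(terms)  # each document is a set, so a term counts once per document
    return df


def chi_square(term: str, docs: Sequence[Set[str]], labels: Sequence[int], cls: int) -> float:
    """CHI statistic for one term and one class, from the 2x2 term/class contingency table."""
    a = b = c = d = 0
    for terms, y in zip(docs, labels):
        present = term in terms
        if present and y == cls:
            a += 1  # term present, document in the class
        elif present:
            b += 1  # term present, document in another class
        elif y == cls:
            c += 1  # term absent, document in the class
        else:
            d += 1  # term absent, document in another class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def df_chi_select(docs: Sequence[Set[str]], labels: Sequence[int],
                  min_df: int = 2, k: int = 100) -> List[str]:
    """Hypothetical DF-CHI combination: DF filter first, then keep the top-k terms by CHI."""
    df = document_frequency(docs)
    candidates = [t for t, n in df.items() if n >= min_df]
    classes = set(labels)
    scores = {t: max(chi_square(t, docs, labels, c) for c in classes) for t in candidates}
    # Descending CHI score; ties broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    toy_docs = [
        {"misdiagnosis", "compensation"},
        {"misdiagnosis", "appraisal"},
        {"insufficient", "appraisal"},
        {"insufficient", "evidence"},
    ]
    toy_labels = [0, 0, 1, 1]  # 1 = rejected, 0 = not rejected
    print(df_chi_select(toy_docs, toy_labels, min_df=2, k=2))  # ['insufficient', 'misdiagnosis']
```

Applying the DF filter first restricts CHI scoring to terms that occur often enough for their contingency counts to be meaningful; rare terms can otherwise receive inflated chi-square scores.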

Abstract: In many countries with a continental (civil law) legal system, the constant emergence of new legal relationships has gradually exposed a defect of statutory law: it cannot be enacted or amended quickly enough. At the same time, the number of lawsuits is growing rapidly, and many countries face the problem of improving the efficiency of the judicial system without sacrificing the quality of trials. Therefore, in addition to institutional reform, a decision support system can effectively assist judicial decision-making.
In this paper, medical damage judgment documents in China are taken as an example, and a court judgment decision support system (CJ-DSS) based on text mining and automatic classification is proposed. The system predicts the outcome of a new lawsuit text (rejected or not rejected) from the verdicts of previous cases. Different feature extraction methods (DF, chi-square, and the DF-CHI combined feature extraction method) are paired with different classifiers (SVM, ANN and KNN), and the combinations that meet the expected performance are selected as base learners. Drawing on the idea of the Delphi method, integrated learning is then used to predict new cases: a new model is trained that takes as input the predictions of the base learners that met expectations and, through linear or non-linear computation, outputs the prediction with the maximum probability.
Tests on real cases show that the combined feature extraction method does improve classifier performance for all three classifiers (SVM, ANN and KNN), although the size of the improvement differs, with ANN improving the most. In addition, classification performance became more consistent after integrated learning, and the best configuration reached an F1 score of 93.3%, significantly better than any single classifier.
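The 93.3% figure is an F1 score, the harmonic mean of precision and recall, F1 = 2PR / (P + R). A small helper makes the definition concrete; treating "rejected" (label 1) as the positive class is an assumption, since the abstract does not say which class or averaging scheme was used. The function name `precision_recall_f1` is hypothetical.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 1) -> Tuple[float, float, float]:
    """Precision, recall and F1 for one positive class (assumed here: 1 = rejected)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # correctly predicted rejections
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # predicted rejected, actually not
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed rejections
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Two true positives, one false positive, one false negative: P = R = F1 = 2/3.
    print(precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```

F1 is a common choice for court-outcome data like this because plain accuracy can look high when one outcome dominates the sample.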
The data come from the "BeiDaFaBao" legal database. Using "medical malpractice" as the search keyword, more than 300 court judgments and mediation documents from 2013 were retrieved. Because mediation documents are short and give only brief accounts of the case, they were excluded from the study. The remaining documents were preprocessed and then used for training and testing.
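The data-preparation step can be sketched as below, assuming each retrieved document has already been tagged with its type and outcome. The `CourtDocument` record, its field names, the 80/20 hold-out split and the seed are hypothetical; the abstract does not state how documents were split, and Chinese word segmentation, which would precede feature extraction, is not shown.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CourtDocument:
    doc_type: str  # hypothetical field, e.g. "judgment" or "mediation"
    text: str
    rejected: int  # hypothetical label: 1 = claim rejected, 0 = not rejected


def build_dataset(docs: List[CourtDocument], test_ratio: float = 0.2,
                  seed: int = 42) -> Tuple[List[CourtDocument], List[CourtDocument]]:
    """Drop mediation documents, then make a reproducible random train/test split."""
    # Mediation documents are excluded because their case descriptions are too brief.
    judgments = [d for d in docs if d.doc_type != "mediation"]
    shuffled = judgments[:]
    random.Random(seed).shuffle(shuffled)  # local RNG, so global random state is untouched
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)
```

With a sample of only a few hundred documents, a stratified split or cross-validation would give steadier estimates than a single random hold-out; the plain shuffle here is only the simplest option.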
In previous studies, the accuracy of text classification systems has depended heavily on the size of the training set: the larger the training set, the better the performance. This paper therefore offers a reference for building structured, high-performance systems from small training samples. Meanwhile, because labelling documents is costly, research on and model construction for unlabelled text should be a focus of future work for data scientists.

Key words: text-mining, automatic text classification, decision support system, CJ-DSS
