In many countries with a continental legal system, new legal relationships arise constantly, and the weakness of statute law, which cannot be formulated and amended in a timely manner, has gradually become obvious. As the number of lawsuits grows rapidly, many countries face the problem of how to improve the efficiency of the judicial system while guaranteeing the quality of trials. Therefore, in addition to institutional reform, a decision support system can effectively improve judicial decision-making.
This paper takes medical damage judgment documents in China as an example and proposes a court judgment decision support system (CJ-DSS) based on text mining and automatic classification. The system predicts the trial result of a new lawsuit text as one of two verdicts, rejected or not rejected, according to the verdicts of previous cases. Different feature extraction methods (DF, chi-square, and a combined DF-CHI method) are paired with different classifiers (SVM, ANN and KNN), and the combinations that meet the expected performance are selected as base learning machines. Following the idea of the Delphi method, integrated learning is then used to predict new cases. Integrated learning here means constructing a new model that, after proper training, takes the predictions of the qualified base learning machines as input and outputs the prediction with the maximum probability through linear or non-linear calculation.
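The integrated learning step described above can be illustrated with a minimal sketch. It assumes a simple linear combiner whose weights are proportional to each base learner's accuracy on held-out cases; the abstract does not state which combiner the authors used, and the learner names, labels and predictions below are hypothetical toy values, not data from the study.

```python
# Minimal sketch of a linear combiner over base-learner predictions.
# Assumption: weights are proportional to held-out accuracy; the paper's
# actual combiner (linear or non-linear) is not specified in the abstract.

LABELS = ("rejected", "not_rejected")


def fit_weights(base_predictions, true_labels):
    """Weight each base learner by its accuracy on held-out cases."""
    accuracy = {}
    for name, preds in base_predictions.items():
        correct = sum(p == t for p, t in zip(preds, true_labels))
        accuracy[name] = correct / len(true_labels)
    total = sum(accuracy.values())
    if total == 0:
        # No learner was ever right: fall back to equal weights.
        return {name: 1.0 / len(accuracy) for name in accuracy}
    return {name: acc / total for name, acc in accuracy.items()}


def combine(weights, votes):
    """Sum the weights behind each label and return the best label and all scores."""
    scores = {label: 0.0 for label in LABELS}
    for name, label in votes.items():
        scores[label] += weights[name]
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical held-out verdicts and base-learner predictions.
held_out_truth = ["rejected", "rejected", "not_rejected", "not_rejected", "rejected"]
held_out_preds = {
    "chi_svm": ["rejected", "rejected", "not_rejected", "not_rejected", "rejected"],
    "df_ann": ["rejected", "not_rejected", "not_rejected", "not_rejected", "rejected"],
    "dfchi_knn": ["not_rejected", "rejected", "not_rejected", "rejected", "rejected"],
}
weights = fit_weights(held_out_preds, held_out_truth)

# Votes of the three base learners on one new, unseen case.
new_case_votes = {
    "chi_svm": "rejected",
    "df_ann": "not_rejected",
    "dfchi_knn": "not_rejected",
}
label, scores = combine(weights, new_case_votes)
print(label, scores)
```

In this toy run the two weaker learners together outweigh the strongest one, so the combined verdict follows them. A trained meta-classifier in place of the fixed weights would be closer to the non-linear variant mentioned in the text.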
Experiments on real cases show that the combined feature extraction method does improve classifier performance, especially for the SVM, ANN and KNN classifiers. In addition, the system's classification performance became more consistent after integrated learning. The best performance reached 93.3%, a significant improvement in system accuracy.
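As a rough illustration of combined feature extraction, the sketch below first drops terms with a low document frequency (DF) and then ranks the survivors by the standard chi-square statistic for a two-class problem. This two-stage combination is an assumption made for the example; the abstract does not say how DF and chi-square are combined in the DF-CHI method. The tokens, labels, and the `min_df` and `top_k` parameters are hypothetical.

```python
# Sketch of a DF filter followed by chi-square ranking (assumed combination).

def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = {}
    for tokens in docs:
        for term in set(tokens):
            df[term] = df.get(term, 0) + 1
    return df


def chi_square(docs, labels, term, positive):
    """Chi-square statistic of a term with respect to the positive class."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        if has_term and label == positive:
            a += 1  # positive class, term present
        elif has_term:
            b += 1  # other class, term present
        elif label == positive:
            c += 1  # positive class, term absent
        else:
            d += 1  # other class, term absent
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def df_chi_select(docs, labels, positive, min_df, top_k):
    """Keep terms with DF >= min_df, then return the top_k by chi-square."""
    df = document_frequency(docs)
    candidates = [term for term, count in df.items() if count >= min_df]
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(
        candidates,
        key=lambda term: (-chi_square(docs, labels, term, positive), term),
    )
    return ranked[:top_k]


# Hypothetical tokenized documents and verdict labels.
docs = [
    ["hospital", "negligence", "compensation"],
    ["hospital", "negligence", "appraisal"],
    ["hospital", "no_fault", "appraisal"],
    ["hospital", "no_fault", "expired"],
]
labels = ["not_rejected", "not_rejected", "rejected", "rejected"]

selected = df_chi_select(docs, labels, positive="rejected", min_df=2, top_k=2)
print(selected)
```

Here "hospital" appears in every document and so carries no class information, while the DF filter removes the terms that occur only once; the terms that separate the two verdicts perfectly are the ones kept.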
The data source for this paper is the "BeiDaFaBao" legal database. Using "medical malpractice" as the keyword, more than 300 court verdict and mediation documents from 2013 were retrieved. Because mediation documents are short and explain the case only briefly, they were excluded from the study. The remaining documents were preprocessed and then used for training and testing.
Previous studies have found that the accuracy of a text classification system is strongly influenced by the size of the training set: the larger the training set, the better the performance. This paper offers a reference for building structured, high-performance systems from small training samples in the future. Meanwhile, because labelling documents is costly, the study of unlabeled text and model construction for it should be a focus of future research for data scientists.
ZHU Qing, WEI Ke-zhen, DING Lan-lin, LI Jian-qiang. Court Judgment Decision Support System Based on Text-mining and Machine Learning[J]. Chinese Journal of Management Science, 2018, 26(1): 170-178.
DOI: 10.16381/j.cnki.issn1003-207x.2018.01.017