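Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economic Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences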

Chinese Journal of Management Science ›› 2024, Vol. 32 ›› Issue (3): 1-8. doi: 10.16381/j.cnki.issn1003-207x.2021.2434

Credit Scoring Based on Semi-supervised Support Vector Machine

Song Chen1, Xiuyun Yu2, Yongqin Qiu2, Kuangnan Fang2

  1. Micro-Finance College, Taizhou University, Taizhou, Zhejiang 318000, China
  2. School of Economics, Xiamen University, Xiamen, Fujian 361005, China
  • Received: 2021-11-23 Revised: 2022-10-17 Online: 2024-03-25 Published: 2024-03-25
  • Contact: Kuangnan Fang E-mail: xmufkn@xmu.edu.cn
  • Supported by:
    National Natural Science Foundation of China, General Program (72071169); Ministry of Education Humanities and Social Sciences Research Youth Fund Project (20YJC910004); Fundamental Research Funds for the Central Universities (20720231060)

Abstract:

Labeled samples in credit scoring are difficult and costly to obtain. To address this problem, this paper proposes a new credit scoring model based on the semi-supervised support vector machine. By introducing separate parameters for the unlabeled samples, the model does not require the missing-at-random assumption and is therefore broadly applicable. A semi-supervised term added to the loss function encourages similarity between the coefficients of the labeled and unlabeled samples, so that information from the unlabeled samples is effectively incorporated and estimation improves. In addition, Group LASSO is used for variable selection; it exploits group structure information to screen important variables. Numerical simulations and a real-data example on credit card default prediction demonstrate the feasibility of the proposed method and its strong performance in variable selection, coefficient estimation, and classification prediction.
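The abstract does not state the objective function. As a reading aid only, the three ingredients it names (a loss with separate parameters for unlabeled samples, a term pulling the two coefficient vectors together, and a Group LASSO penalty) could be combined roughly as below. Every symbol here is an assumption made for illustration, not the paper's notation: $\beta$ and $\gamma$ are the labeled and unlabeled coefficient vectors, $n_l$ and $n_u$ the sample sizes, $[u]_+ = \max(0, u)$ the hinge function, $\lambda_1, \lambda_2$ tuning parameters, and $p_g$ the size of variable group $g$. The symmetric hinge on unlabeled samples is the standard choice in semi-supervised SVMs and is likewise assumed, not taken from the paper.

```latex
\min_{\beta_0,\,\beta,\,\gamma_0,\,\gamma}\;
  \frac{1}{n_l}\sum_{i=1}^{n_l}\bigl[\,1 - y_i(\beta_0 + x_i^{\top}\beta)\,\bigr]_+
  \;+\;
  \frac{1}{n_u}\sum_{j=1}^{n_u}\bigl[\,1 - \lvert \gamma_0 + x_j^{\top}\gamma \rvert\,\bigr]_+
  \;+\;
  \lambda_1 \lVert \beta - \gamma \rVert_2^2
  \;+\;
  \lambda_2 \sum_{g=1}^{G} \sqrt{p_g}\,
    \bigl( \lVert \beta_{(g)} \rVert_2 + \lVert \gamma_{(g)} \rVert_2 \bigr)
```

In a form like this, a large $\lambda_1$ forces $\gamma \approx \beta$, so the labeled and unlabeled samples effectively share one classifier, while $\lambda_1 = 0$ decouples the two parts. The group penalty shrinks all coefficients of a group to zero together, which is what allows selection at the group level.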

Key words: semi-supervised classification, support vector machines, variable selection, credit scoring

CLC number: