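Supervised by: Chinese Academy of Sciences
Sponsored by: Chinese Society of Optimization, Overall Planning and Economic Mathematics
   Institutes of Science and Development, Chinese Academy of Sciences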

   

Financial statement fraud identification model for listed companies based on multimodal graph representation deep learning

Gang Wang

  1. , 230009,
  • Received: 2024-09-30 Revised: 2025-09-10 Accepted: 2025-09-11
  • Contact: Gang Wang

Abstract: Financial statement fraud has become a growing concern in recent years. Reports from the Association of Certified Fraud Examiners (ACFE) underscore that, although such fraud occurs relatively rarely, it causes significant economic losses. This highlights the urgent need for an accurate and efficient model for financial statement fraud detection (FSFD). With the growing availability of unstructured data and high-dimensional features, researchers are increasingly turning to multimodal deep learning methods for FSFD. However, existing studies often simply extract and fuse the financial and textual modalities without fully exploiting the complex feature interactions within and between them. To address this gap, this study proposes DLM_MGR, a deep learning method based on multimodal graph representation that explores feature interactions within and between modalities to enhance FSFD. DLM_MGR comprises three key steps. First, financial features at different levels are extracted using Stacked Autoencoders (SAE), and the feature interactions within the financial modality are captured through graph neural networks. This step uncovers the interactions among financial features at different levels, deepening the understanding of a company's financial condition. Second, local semantic features of annual reports are extracted using Word2vec, and the feature interactions within the textual modality are captured through graph neural networks. This step mines the interactions among local semantic features, improving comprehension of the annual report text. Finally, a novel gating mechanism based on element-wise average pooling and max pooling layers is designed to capture modality-shared and modality-specific features between the financial and textual modalities. This step explores the feature interactions between the two modalities and prevents information imbalance between them, thereby improving the effectiveness of FSFD. To verify the effectiveness of DLM_MGR, experiments were conducted on data from A-share listed companies spanning 2015 to 2022. Eight evaluation metrics were used to assess the model's performance from multiple dimensions: Accuracy, Precision, Recall, Area Under the ROC Curve (AUC), F1-score, F2-score, Type I Error Rate, and Type II Error Rate. The results show that the proposed method outperformed the benchmark methods, achieving the best results on most metrics across different modalities. In summary, DLM_MGR significantly improves the accuracy and efficiency of FSFD by exploiting the feature interactions within and between the financial and textual modalities. The experimental results affirm its high recognition accuracy and computational efficiency, making it feasible for practical applications. This research has both theoretical and practical value. Theoretically, it enriches the literature on multimodal deep learning for FSFD and provides an analytical framework, built on shared and specific features, that other research paradigms can draw on. Practically, it offers a more effective method for detecting financial statement fraud, enabling timely and accurate risk warnings and thereby promoting the healthy development of capital markets. Future research could incorporate external company information, such as social media data, and investigate deep clustering methods, such as deep embedded clustering (DEC), to further enhance the effectiveness of FSFD.
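The eight evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. The function names, the 0.5 decision threshold, and the toy labels and scores are hypothetical. The sketch also assumes the statistical convention that Type I error is the false positive rate (a non-fraudulent company flagged as fraudulent) and Type II error is the false negative rate (a fraudulent company missed). Some FSFD studies use the opposite labeling, and the abstract does not say which one applies here.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # fraudulent, predicted fraudulent
    fp: int  # non-fraudulent, predicted fraudulent
    tn: int  # non-fraudulent, predicted non-fraudulent
    fn: int  # fraudulent, predicted non-fraudulent


def confusion(y_true, y_pred):
    """Count outcomes, treating label 1 as fraudulent and 0 as non-fraudulent."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def ratio(num, den):
    """Safe division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def f_beta(precision, recall, beta):
    """F-beta score: beta=1 gives F1; beta=2 (F2) weights recall more heavily."""
    return ratio((1 + beta**2) * precision * recall, beta**2 * precision + recall)


def auc(y_true, scores):
    """AUC as the share of (fraud, non-fraud) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return ratio(wins, len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Compute the eight metrics from true labels and predicted fraud scores."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    c = confusion(y_true, y_pred)
    precision = ratio(c.tp, c.tp + c.fp)
    recall = ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": ratio(c.tp + c.tn, len(y_true)),
        "precision": precision,
        "recall": recall,
        "auc": auc(y_true, scores),
        "f1": f_beta(precision, recall, 1),
        "f2": f_beta(precision, recall, 2),
        # Assumed convention: Type I = false positive rate, Type II = false negative rate.
        "type_i_error": ratio(c.fp, c.fp + c.tn),
        "type_ii_error": ratio(c.fn, c.fn + c.tp),
    }


if __name__ == "__main__":
    # Hypothetical toy data: three fraudulent and five non-fraudulent companies.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
    for name, value in evaluate(y_true, scores).items():
        print(f"{name}: {value:.3f}")
```

Because fraud cases are rare, accuracy alone can look strong even when most fraud is missed. Reporting recall, F2-score, and the Type II error rate alongside it makes missed fraud visible.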

Key words: financial statement fraud, multimodal deep learning, graph neural network