Predicting accounting fraud using imbalanced ensemble learning classifiers – evidence from China

Abstract

The current research aims to launch effective accounting fraud detection models using imbalanced ensemble learning algorithms for China A-Share listed firms. Based on a sample of 33,544 Chinese firm-year instances from 1998 to 2017, this research respectively established one logistic regression and four ensemble learning classifiers (AdaBoost, XGBoost, CUSBoost, and RUSBoost) by 12 financial ratios and 28 raw financial data. Additionally, we divided the sample into the train and test observations to evaluate the classifiers' out-of-sample performance. In detail, we applied two metrics, namely, Area under the ROC (receiver operating characteristic) curve (AUC) and Area under the Precision-Recall curve (AUPR), to evaluate classifiers' discriminability. In the supplement test, this study put forward an algebraic fused model on the basis of the four ensemble learning classifiers and introduced the sliding window technique. The empirical results showed that the ensemble learning classifiers can detect accounting fraud for the imbalanced China A-listed firms far more effectively than the logistic regression model. Moreover, imbalanced ensemble learning classifiers (CUSBoost and RUSBoost) effectively performed better than the common ensemble learning models (AdaBoost and XGBoost) in average. The algebraic fused model in the supplement test also obtained the highest average AUC and AUPR among all the employed algorithms. Our results offer firm support for the potential role of Machine Learning (ML)-based Artificial Intelligence (AI) approaches in reliably predicting accounting fraud with high accuracy. Similarly, for the Chinese settings, our ML-based AI offers utmost advantage in forecasting accounting fraud. Finally, this paper fills the research gap on the applications of imbalanced ensemble learning in accounting fraud detection for Chinese listed firms.