Abstract: We develop a state-of-the-art fraud prediction model using a machine learning approach. We demonstrate the value of combining domain knowledge and machine learning methods in model building. We select our model inputs based on existing accounting theories, but we differ from prior accounting research by using raw accounting numbers rather than financial ratios. We employ one of the most powerful machine learning methods, ensemble learning, rather than the commonly used method of logistic regression. To assess the performance of fraud prediction models, we introduce a performance evaluation metric that is commonly used in ranking problems and is more appropriate for the fraud prediction task. Starting with an identical set of theory-motivated raw accounting numbers, we show that our new fraud prediction model outperforms two benchmark models by a large margin: Dechow et al.’s logistic regression model based on financial ratios and Cecchini et al.’s support-vector-machine model with a financial kernel that maps raw accounting numbers into a broader set of ratios.
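[Abstract]: Financial fraud by listed companies is a worldwide problem. This paper develops a new financial fraud prediction model based on machine learning methods. We demonstrate the value of combining domain knowledge with machine learning methods in model building. Although we select model inputs based on existing accounting theories, unlike prior research we use raw accounting numbers rather than the widely used financial ratios. We also employ ensemble learning, one of the most powerful machine learning methods. To evaluate the performance of prediction models, we introduce a ranking-based performance evaluation metric that is better suited to the fraud prediction task. Empirical results show that, starting from the same set of theory-motivated raw accounting numbers, the proposed fraud prediction model outperforms two benchmark models: a logistic regression model based on financial ratios and a support-vector-machine model with a financial kernel that maps raw accounting numbers into ratios.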
Keywords: Fraud Prediction; Machine Learning; Ensemble Learning
[Keywords]: Fraud Prediction; Machine Learning; Ensemble Learning
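This paper was published online in the Journal of Accounting Research in 2019. The journal is classified by the school as a Category A award journal. Authors are listed in alphabetical order by surname.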