Voluntary Carbon Reporting Prediction: A Machine Learning Approach

In this paper we examine the impact of the introduction of the National Greenhouse and Energy Reporting scheme on corporate carbon reporting, and then identify factors that influence the level of voluntary carbon reporting. A review of the literature shows that a large number of potential factors have been used to explain voluntary reporting practices; however, the analytical and empirical methods widely used in the literature rest on restrictive statistical assumptions and confine analysis to a small number of explanatory factors. To address this limitation in prior research, we apply advanced machine learning methods, such as gradient boosting machines and random forests, to identify predictive variables through analytical means. We compare the performance of these machine learning methods with that of traditional methods such as logistic regression. We find that the machine learning methods significantly outperform logistic regression and provide fundamentally different interpretations of the role and influence of the predictive variables on voluntary carbon reporting. While most variables are not statistically significant in the logit results, a number of key proxies for financial performance, corporate governance, and corporate social responsibility have out-of-sample predictive power for the level of voluntary carbon reporting in the machine learning analysis.
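The comparison described above can be illustrated with a minimal sketch. It is not the paper's actual pipeline: it assumes scikit-learn, a binary reporting outcome, and a synthetic data set standing in for the firm-level variables. The feature counts, hyperparameters, train/test split, and the choice of AUC as the out-of-sample metric are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for firm-level data (assumption): 20 candidate
# explanatory variables, 5 of them informative, binary reporting outcome.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)

# Hold out 30% of observations so performance is measured out of sample.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Logistic regression baseline plus the two tree ensembles named in the text.
models = {
    "logit": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

auc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Score the held-out set using the predicted probability of class 1.
    auc[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: out-of-sample AUC = {auc[name]:.3f}")

# Impurity-based importances from the fitted gradient boosting model,
# used here to rank the candidate predictors (indices of the top five).
importances = models["gradient_boosting"].feature_importances_
top = np.argsort(importances)[::-1][:5]
print("top predictors by importance:", top.tolist())
```

On synthetic data the resulting ranking and scores carry no substantive meaning. The point is only the workflow: fit each model on a training sample, score it on held-out observations, and read variable importance from the ensemble instead of relying on coefficient significance tests.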