Subject Code & Title: ACCT3015 Developing A Machine Learning Pipeline Assessment
Assignment Type: Individual
INSTRUCTIONS:
1. Consider the corporate bankruptcy prediction problem and the list of variables in Appendix A. These variables are currently used to predict a firm's probability of corporate failure from annual financial and market data. Recommend additional variables that are likely to be associated with corporate bankruptcy and hence likely to improve the predictive power of the machine learning models. You should outline new variables that can be created from the existing variables (such as financial ratios), as well as external data you can collect from different Big Data sources. Justify why you believe they would be appropriate explanatory variables.
2. Recommend and discuss the exploratory data analysis methods you would use to understand and visualize the data. Explain and motivate your choices.
3. Provide a summary analysis of the predictive performance of the CART model outlined in Appendix B, and an analysis of which variables contributed most to the model's predictive power. Interpret the CART decision tree.
4. Compare the results of the gradient boosting model outlined in Appendix C with those of the CART model. Describe and explain any differences in model performance. Identify the role and impact of the predictor variables in the CART and gradient boosting models.
5. Outline how the model(s) can be embedded into a business or accounting process to facilitate or inform decision-making. You may choose any appropriate business or accounting process.
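As a minimal sketch of the first-pass exploratory checks that question 2 asks about, the standard-library Python below reports the class balance, missing values, class-conditional medians and each feature's correlation with the failure flag. It assumes a hypothetical layout in which each firm-year is a dict with a 0/1 `bankrupt` key; the feature name `leverage` and the toy rows are made up for illustration and are not taken from Appendix A.

```python
import math
from collections import Counter
from statistics import median

def class_balance(rows, target="bankrupt"):
    """Share of firm-years in each class; failures are usually a small minority."""
    counts = Counter(r[target] for r in rows)
    return {cls: n / len(rows) for cls, n in counts.items()}

def missing_counts(rows, features):
    """Number of missing (None) values per feature."""
    return {f: sum(1 for r in rows if r.get(f) is None) for f in features}

def medians_by_class(rows, features, target="bankrupt"):
    """Median of each feature for failed (1) and surviving (0) firms; robust to outliers."""
    out = {}
    for cls in sorted({r[target] for r in rows}):
        group = [r for r in rows if r[target] == cls]
        out[cls] = {}
        for f in features:
            vals = [r[f] for r in group if r.get(f) is not None]
            out[cls][f] = median(vals) if vals else None
    return out

def correlation_with_target(rows, feature, target="bankrupt"):
    """Pearson (point-biserial) correlation, dropping rows where the feature is missing."""
    pairs = [(r[feature], r[target]) for r in rows if r.get(feature) is not None]
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)  # undefined if either column is constant

rows = [  # toy, hypothetical firm-years
    {"bankrupt": 0, "leverage": 0.4},
    {"bankrupt": 0, "leverage": 0.5},
    {"bankrupt": 1, "leverage": 0.9},
    {"bankrupt": 0, "leverage": None},
]
print(class_balance(rows))
print(missing_counts(rows, ["leverage"]))
print(medians_by_class(rows, ["leverage"]))
print(round(correlation_with_target(rows, "leverage"), 3))
```

On the full dataset these numeric checks would normally be paired with plots, such as histograms or box plots of each variable split by class and a correlation heatmap.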
Appendix A: Covariates included in models
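Question 1 asks for variables derived from the existing covariates. As one common way of turning raw accounts into ratio features, the sketch below computes the five Altman (1968) Z-score ratios, a debt ratio and the Z-score itself. The raw field names in `FirmYear` are hypothetical; the actual covariate names in Appendix A may differ.

```python
from dataclasses import dataclass

@dataclass
class FirmYear:
    # Hypothetical raw inputs for one firm-year (same currency units throughout).
    working_capital: float
    retained_earnings: float
    ebit: float
    market_value_equity: float
    total_liabilities: float
    total_assets: float
    sales: float

def ratio_features(f: FirmYear) -> dict:
    """Derive scale-free ratio features from raw accounting and market items."""
    if f.total_assets <= 0 or f.total_liabilities <= 0:
        raise ValueError("total assets and total liabilities must be positive")
    ta = f.total_assets
    x1 = f.working_capital / ta                       # liquidity
    x2 = f.retained_earnings / ta                     # cumulative profitability
    x3 = f.ebit / ta                                  # operating return on assets
    x4 = f.market_value_equity / f.total_liabilities  # market-based solvency
    x5 = f.sales / ta                                 # asset turnover
    # Altman (1968) Z-score with the original coefficients for public manufacturers.
    z = 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5
    return {
        "wc_ta": x1, "re_ta": x2, "ebit_ta": x3, "mve_tl": x4, "sales_ta": x5,
        "debt_ratio": f.total_liabilities / ta,
        "altman_z": z,
    }

example = FirmYear(working_capital=20, retained_earnings=30, ebit=10,
                   market_value_equity=50, total_liabilities=100,
                   total_assets=200, sales=250)
print(ratio_features(example))
```

Because ratios divide out firm size, they are usually more comparable across firms than raw balances; lagged values or year-on-year changes of the same ratios are a natural extension.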
Appendix B: CART Model Results
Best Model over 5-fold CV
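Appendix B reports the best model selected over 5-fold cross-validation. As a sketch of how such folds can be formed, assuming the folds are stratified by outcome (the appendix does not confirm this), the function below deals the indices of each class round-robin into k folds so that every fold keeps roughly the same share of bankrupt firms; each fold then serves once as the validation set. The function names and the toy labels are hypothetical.

```python
import random

def stratified_k_folds(labels, k=5, seed=0):
    """Return k lists of row indices with each class spread evenly across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal round-robin so class shares stay balanced
    return folds

def train_validation_splits(labels, k=5, seed=0):
    """Yield (train_idx, valid_idx) pairs; each fold is held out exactly once."""
    folds = stratified_k_folds(labels, k, seed)
    for held_out in range(k):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, folds[held_out]

labels = [1] * 10 + [0] * 90  # toy sample: 10% failures
for train, valid in train_validation_splits(labels):
    rate = sum(labels[i] for i in valid) / len(valid)
    print(len(train), len(valid), rate)
```

Candidate tree sizes (or other tuning settings) would be scored on the held-out folds and the setting with the best average validation performance retained; a separate test set, like the one the performance measures below refer to, is normally kept out of this loop.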
Decision Tree Plot
Please zoom in on the plot for further detail when interpreting the tree.
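To help read the splits in the tree plot: CART grows a classification tree by choosing, at each node, the variable and cut-off that most reduce impurity, commonly measured with the Gini index. The sketch below is a simplification that assumes the Gini criterion and a binary 0/1 failure flag, and scans one hypothetical variable for its best cut-off. Summing such impurity decreases over every split that uses a variable is one common basis for variable-importance scores, although the software behind Appendix B may also credit surrogate splits.

```python
def gini(labels):
    """Binary Gini impurity 1 - p^2 - (1 - p)^2 = 2p(1 - p), p = share of failures."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(x, y):
    """Best threshold t for the rule 'x <= t goes left' and its weighted Gini decrease."""
    n = len(y)
    parent = gini(y)
    best_t, best_gain = None, 0.0
    for t in sorted(set(x))[:-1]:  # the largest value would leave the right node empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        if parent - child > best_gain:
            best_t, best_gain = t, parent - child
    return best_t, best_gain

leverage = [0.2, 0.3, 0.4, 0.8, 0.9]  # hypothetical predictor values
failed = [0, 0, 0, 1, 1]              # 1 = bankrupt
print(best_split(leverage, failed))
```

Here the split at 0.4 separates the toy firms perfectly, so the decrease equals the parent impurity. Real implementations typically place the cut-off midway between adjacent observed values and repeat the search over every candidate variable at every node.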
Variable Importance
Confusion Matrix – Test Set
Model Performance Measures – Training and Test Set
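The performance measures reported for the training and test sets can all be derived from a confusion matrix. The sketch below uses made-up counts, not the figures in Appendix B, and treats bankruptcy (class 1) as the positive class. It also reports the accuracy of a trivial rule that never predicts failure: with rare bankruptcies, overall accuracy can look high even when many failures are missed, which is why sensitivity and specificity are worth reporting alongside it.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard measures from a 2x2 confusion matrix; positive class = bankrupt."""
    total = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)  # share of failed firms correctly flagged (recall)
    specificity = tn / (tn + fp)  # share of healthy firms correctly cleared
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / total
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    no_failure_baseline = (fp + tn) / total  # accuracy of always predicting "no failure"
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1,
            "no_failure_baseline": no_failure_baseline}

# Hypothetical test-set counts for illustration only.
print(confusion_metrics(tp=8, fn=2, fp=10, tn=80))
```

With these invented counts the model's accuracy (0.88) is below the never-fail baseline (0.90) even though it catches 8 of 10 failures, which illustrates why accuracy alone can mislead on imbalanced data.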
ROC Curve
Appendix C: Gradient Boosted Trees Model Results
Model Summary: Model Error Measures
Variable Importance
Summary for 89 Trees – Gains Chart – ROC, Sample: Full sample, Target class: 1
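Both appendices summarise discrimination with a ROC curve. One way to compare the CART and gradient boosting results on the same footing is the area under the curve (AUC), which equals the probability that a randomly chosen bankrupt firm receives a higher score than a randomly chosen surviving firm, with ties counted as one half. The scores below are invented for illustration and are not the Appendix B or C outputs; they mimic one structural difference, namely that a single tree assigns the same probability to every firm in a leaf, producing many ties, whereas a boosted ensemble such as the 89-tree model usually yields finer-grained scores.

```python
def auc(scores, labels):
    """AUC via pairwise comparison (Mann-Whitney form); O(n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one bankrupt and one surviving firm")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # tied scores count as half a correct ranking
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 0]
cart_scores = [0.8, 0.3, 0.3, 0.3, 0.1]  # few distinct leaf probabilities -> ties
gbm_scores = [0.9, 0.6, 0.4, 0.2, 0.1]   # finer-grained ensemble scores
print(auc(cart_scores, labels), auc(gbm_scores, labels))
```

The pairwise loop is chosen for transparency; on a large sample a rank-based formula gives the same value more efficiently.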
Confusion Matrix – Test Set
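The test-set confusion matrix depends on the probability cut-off used to label a firm as likely to fail. For question 5, one possible process is trade-credit approval in accounts receivable. The sketch below is a hypothetical decision rule: it sets the cut-off where flagging a firm minimises expected misclassification cost, p > C_FP / (C_FP + C_FN), and routes each customer to an action. The cost values, the extra `decline_above` limit and the action labels are assumptions for illustration, not part of the assessment.

```python
from dataclasses import dataclass

@dataclass
class CreditDecision:
    customer_id: str
    prob_failure: float
    action: str

def cost_threshold(cost_fp: float, cost_fn: float) -> float:
    """Cut-off where p * cost_fn = (1 - p) * cost_fp; flag when prob exceeds it."""
    return cost_fp / (cost_fp + cost_fn)

def route(customer_id: str, prob_failure: float, cost_fp: float = 1.0,
          cost_fn: float = 20.0, decline_above: float = 0.5) -> CreditDecision:
    """Map a model's failure probability to a hypothetical credit action."""
    threshold = cost_threshold(cost_fp, cost_fn)
    if prob_failure <= threshold:
        action = "approve on standard terms"
    elif prob_failure <= decline_above:
        action = "refer to credit analyst"
    else:
        action = "decline or require security"
    return CreditDecision(customer_id, prob_failure, action)

for cid, p in [("C001", 0.02), ("C002", 0.10), ("C003", 0.70)]:
    print(route(cid, p))
```

In practice the probabilities would come from the fitted CART or gradient boosting model scored on each customer's latest accounts, and referral outcomes would be logged so that the cut-off can be reviewed as the cost assumptions change.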