Subject Code & Title: ACCT3015 Developing A Machine Learning Pipeline Assessment
Assignment Type: Individual
INSTRUCTIONS:
1. Consider the corporate bankruptcy prediction problem and the list of variables in Appendix A. These variables are currently used to predict the probability of corporate failure for a firm, using annual financial and market data. Recommend additional variables that are likely to be associated with corporate bankruptcy and hence likely to improve the predictive power of the machine learning models. You should outline new variables that can be created from the existing variables (such as financial ratios), as well as external data that you can collect from different Big Data sources. Justify why you believe they would be appropriate explanatory variables.
2. Recommend and discuss the methods of exploratory data analysis that you would undertake to understand and visualize the data. Explain and motivate your choices.
3. Provide a summary analysis of the predictive performance of the CART model outlined in Appendix B, and an analysis of which variables contributed most to the model’s predictive power. Interpret the CART decision tree.
4. Compare the results of the gradient boosting model outlined in Appendix C with those of the CART model. Describe and explain any differences in model performance. Identify the role and impact of the predictor variables in the CART and gradient boosting models.
5. Outline how the model(s) can be embedded into a business or accounting process to facilitate or inform decision-making. You may choose any appropriate business or accounting process.
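As an illustration of item 1, the sketch below shows one way new variables such as financial ratios might be derived from raw annual figures. The field names (`total_assets`, `total_liabilities`, and so on) and the example values are hypothetical; they are not taken from Appendix A, and the three ratios are only examples of the kind of variable that could be proposed.

```python
from typing import Dict, Optional


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return numerator / denominator


def derive_ratios(firm: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Build a few candidate ratios from raw figures (field names are hypothetical)."""
    return {
        # Share of assets financed by liabilities
        "leverage": safe_ratio(firm["total_liabilities"], firm["total_assets"]),
        # Short-term liquidity
        "current_ratio": safe_ratio(firm["current_assets"], firm["current_liabilities"]),
        # Profitability relative to the asset base
        "return_on_assets": safe_ratio(firm["net_income"], firm["total_assets"]),
    }


# Made-up figures for a single firm-year, for illustration only
example_firm = {
    "total_assets": 200.0,
    "total_liabilities": 150.0,
    "current_assets": 60.0,
    "current_liabilities": 80.0,
    "net_income": -10.0,
}
ratios = derive_ratios(example_firm)
```

Returning `None` for a zero denominator keeps undefined ratios visible as missing values, so they can be handled explicitly during exploratory analysis rather than silently distorting the data.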
Appendix A: Covariates included in models
![ACCT3015 Developing A Machine Learning Pipeline For Predicting Bankruptcy Assessment - Australia.](https://www.excellentassignmenthelp.com.au/wp-content/uploads/image-874.png)
![ACCT3015 Developing A Machine Learning Pipeline Assessment – Australia.](https://www.excellentassignmenthelp.com.au/wp-content/uploads/image-875.png)
Appendix B: CART Model Results
Best Model over 5-fold CV
![ACCT3015 Developing A Machine Learning Pipeline For Predicting Bankruptcy Assessment - Australia.](https://www.excellentassignmenthelp.com.au/wp-content/uploads/image-877.png)
Decision Tree Plot
Please zoom in on the plot to see the details needed to interpret the tree.
![ACCT3015 Developing A Machine Learning Pipeline For Predicting Bankruptcy Assessment - Australia.](https://www.excellentassignmenthelp.com.au/wp-content/uploads/image-878.png)
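When interpreting a tree like the one above, it can help to recall how a single split is scored. The sketch below computes the decrease in Gini impurity for one threshold split on one variable. It assumes a binary target (1 = bankrupt, 0 = not bankrupt) and assumes Gini impurity as the splitting criterion; the appendix does not state which criterion was used, and the data are made up.

```python
from typing import Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    # For two classes, 1 - p^2 - (1 - p)^2 simplifies to 2p(1 - p)
    return 2.0 * p * (1.0 - p)


def split_gain(values: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Impurity decrease from splitting at `values <= threshold` versus `values > threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted_child_impurity = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_child_impurity


# Made-up example: a hypothetical leverage ratio that separates the classes perfectly at 0.5
leverage = [0.2, 0.4, 0.6, 0.8]
bankrupt = [0, 0, 1, 1]
gain = split_gain(leverage, bankrupt, threshold=0.5)
```

A variable that produces large impurity decreases near the root of the tree will generally also rank highly in an impurity-based variable importance chart.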
Variable Importance:
![ACCT3015 Developing A Machine Learning Pipeline Assessment – Australia.](https://www.excellentassignmenthelp.com.au/wp-content/uploads/image-879.png)
Confusion Matrix – Test Set
![](https://www.excellentassignmenthelp.com.au/wp-content/uploads/image-880.png)
Model Performance Measures – Training and Test Set
![ACCT3015 Developing A Machine Learning Pipeline Assessment – Australia.](https://www.excellentassignmenthelp.com.au/wp-content/uploads/image-881.png)
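The measures in a table like this are normally derived from the four cells of a confusion matrix. The sketch below shows the standard definitions; the counts used in the example are invented for illustration and are not the values from Appendix B.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard performance measures from confusion-matrix counts.

    tp: bankrupt firms predicted bankrupt      fn: bankrupt firms predicted healthy
    fp: healthy firms predicted bankrupt       tn: healthy firms predicted healthy
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on the bankrupt class
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on the healthy class
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denominator = precision + sensitivity
    f1 = 2.0 * precision * sensitivity / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Invented counts with a rare positive class (20 bankrupt firms out of 1,000)
metrics = classification_metrics(tp=8, fp=20, tn=960, fn=12)
```

With these invented counts the accuracy is 0.968 while the sensitivity is only 0.4, which shows why accuracy alone can be misleading when the positive class is rare, as bankruptcy typically is.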
ROC Curve
![](https://www.excellentassignmenthelp.com.au/wp-content/uploads/image-882.png)
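The area under a ROC curve can be read as the probability that a randomly chosen bankrupt firm receives a higher predicted score than a randomly chosen non-bankrupt firm. The sketch below computes it directly from that pairwise definition; the labels and scores are made up, and the quadratic loop is intended for clarity rather than for large samples.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: one bankrupt firm (score 0.35) is ranked below one healthy firm (score 0.4)
auc = roc_auc(labels=[0, 0, 1, 1], scores=[0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.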
Appendix C: Gradient Boosted Trees Model Results
Model Summary: Model error measures
![](https://www.excellentassignmenthelp.com.au/wp-content/uploads/image-883.png)
Variable Importance
![](https://www.excellentassignmenthelp.com.au/wp-content/uploads/image-884.png)
Summary for 89 Trees – Gains Chart – ROC, Sample: Full sample, Target class: 1
![](https://www.excellentassignmenthelp.com.au/wp-content/uploads/image-885.png)
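A gains chart reports, for each top slice of firms ranked by predicted score, the share of all target-class cases captured so far. The sketch below is one simple way to compute those cumulative shares; the number of bins, the labels, and the scores are assumptions for illustration and do not reproduce the chart above.

```python
from typing import List, Sequence


def cumulative_gains(labels: Sequence[int], scores: Sequence[float], n_bins: int = 10) -> List[float]:
    """Share of all positives captured within the top 1/n_bins, 2/n_bins, ... of ranked cases."""
    # Rank cases from highest to lowest predicted score
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    ordered_labels = [y for _, y in ranked]
    total_positives = sum(ordered_labels)
    if total_positives == 0:
        raise ValueError("gains are undefined without any positive cases")
    n = len(ordered_labels)
    gains = []
    for b in range(1, n_bins + 1):
        cutoff = (b * n) // n_bins  # number of cases in the top b bins
        gains.append(sum(ordered_labels[:cutoff]) / total_positives)
    return gains


# Made-up example: 10 firms, 2 of them bankrupt, both ranked in the top 30% by score
example_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
gains = cumulative_gains(example_labels, example_scores, n_bins=5)
```

The further the curve rises above the diagonal in the early bins, the more bankrupt firms the model concentrates among its highest-scored cases.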
Confusion Matrix – Test Set
![](https://www.excellentassignmenthelp.com.au/wp-content/uploads/image-886.png)