Comparative Analysis of Loan Prediction Models with Imbalanced Data and Impact of Loan Eligibility Metrics

Philip, Jeremiah
MSc in Financial Analytics
Dublin Business School
Loan prediction models play a vital role in determining a borrower's likelihood of defaulting on a loan, but they are challenging to develop when the underlying dataset is imbalanced. This research investigated the impact of including loan eligibility metrics on the performance of loan default prediction models trained on balanced data. Two machine learning models, Decision Tree and Random Forest, were compared on how well they handle imbalanced data. To address the imbalance, three resampling techniques were used: Synthetic Minority Oversampling Technique (SMOTE), random undersampling, and random oversampling. The proposed methodology was validated using a dataset from Kaggle. The findings revealed that incorporating loan eligibility metrics significantly improves the accuracy of loan default prediction models trained on balanced data. Of the two models, Random Forest stood out, achieving the highest accuracy of 93.67%. This research contributes to financial analytics and data science by offering an optimized loan prediction model that helps banks improve their loan decision-making and manage credit risk effectively.
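As an illustration of two of the resampling strategies named in the abstract, the following is a minimal sketch of random oversampling and random undersampling in plain Python. It is not the author's implementation: the function names, the toy data, and the fixed seed are assumptions made for this example only. SMOTE is not shown; unlike random oversampling, it creates synthetic minority samples by interpolating between a minority sample and its nearest minority-class neighbours rather than duplicating existing rows.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical toy data: 8 repaid loans (label 0) and 2 defaults (label 1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
print(Counter(over_labels), Counter(under_labels))
```

In practice, resampling of this kind is normally applied to the training split only, so that the evaluation data keeps its original class distribution.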