Detecting Bank Account Opening Fraud Using Machine Learning

Uwaoma, Chukwuebuka
MSc in Data Analytics
Dublin Business School
This research explores machine learning techniques for detecting fraudulent bank account openings using a recently published large-scale benchmark dataset. Several classification algorithms are evaluated, including Logistic Regression, Decision Trees, Random Forests, and LightGBM. Following standard data preprocessing and feature engineering, the models are trained on an imbalanced dataset of one million account applications spanning eight months. Adaptive synthetic oversampling is used to mitigate the extreme rarity of positive (fraud) cases. The models are assessed on predictive performance, real-world feasibility constraints, and fairness across protected demographic groups. Initial results indicate that LightGBM achieves the best overall recall, 62%, capturing most fraudulent instances. However, practical use requires capping the false positive rate at 5%, and enforcing this threshold severely reduces recall. Predictive equality analysis also reveals that some algorithms inadvertently introduce bias against seniors. These findings indicate that combining sampling techniques with gradient boosting methods has the potential to balance performance, operational constraints, and ethical considerations in identifying criminal account openings. Further hyperparameter tuning and model ensembling could improve performance. This study establishes a rigorous methodology and a benchmark for future research into machine learning for fraud detection in the banking sector.
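The two evaluation ideas named in the abstract, recall under a 5% false positive rate cap and predictive equality across groups, can be illustrated with a minimal sketch. This is not the author's code: the function names (`threshold_at_fpr`, `recall_at_threshold`, `fpr_by_group`) are hypothetical, a score strictly above the threshold is assumed to mean "flag as fraud", and predictive equality is taken here to mean comparing false positive rates between groups.

```python
import numpy as np

def threshold_at_fpr(y_true, scores, max_fpr=0.05):
    """Return a score threshold whose false positive rate is at most max_fpr.

    Assumption: an application is flagged when its score is strictly
    greater than the returned threshold.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    neg = np.sort(scores[y_true == 0])[::-1]   # legitimate-case scores, highest first
    k = int(np.floor(max_fpr * neg.size))      # how many legitimate cases may be flagged
    if k >= neg.size:
        return -np.inf                         # cap is loose enough to flag everything
    return neg[k]                              # at most k negatives lie strictly above this

def recall_at_threshold(y_true, scores, threshold):
    """Fraction of fraud cases (label 1) whose score exceeds the threshold."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    return float(np.mean(scores[y_true == 1] > threshold))

def fpr_by_group(y_true, scores, groups, threshold):
    """False positive rate per group; similar rates indicate predictive equality.

    Assumes every group contains at least one legitimate (label 0) case.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    rates = {}
    for g in np.unique(groups):
        mask = (y_true == 0) & (groups == g)
        rates[str(g)] = float(np.mean(scores[mask] > threshold))
    return rates

# Toy data (invented for illustration): 20 legitimate and 4 fraudulent applications.
y = np.array([0] * 20 + [1] * 4)
s = np.concatenate([np.arange(20) / 100, [0.90, 0.80, 0.05, 0.95]])
age_group = np.array(["under_50", "over_50"] * 12)

thr = threshold_at_fpr(y, s, max_fpr=0.05)
print("recall at 5% FPR cap:", recall_at_threshold(y, s, thr))
print("FPR by group:", fpr_by_group(y, s, age_group, thr))
```

Fixing the threshold from the legitimate-case scores alone keeps the operational constraint explicit: recall is whatever remains once the false positive budget is spent, which is why a tight cap can reduce it sharply.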