Comparing machine learning algorithms on credit card fraud problem

Fatih, Can
Issue Date
M.Sc. in Artificial Intelligence
Dublin Business School
Efficient fraud detection remains a top priority in the financial sector, given the potential consequences for both consumers and institutions. This paper investigates the use of machine learning models to detect fraudulent activity, using a dataset of transaction records. The study compares several algorithms, namely Naive Bayes, Random Forest, and XGBoost, together with their respective k-fold cross-validation variants. With an execution time of just over 4 minutes, XGBoost consistently emerged as a top performer, particularly on precision, recall, and F1 score. The k-fold variant of Random Forest took the longest to run, while Naive Bayes was the quickest. When the results were weighted by recall and ROC AUC score, XGBoost consolidated its position as the model most capable of detecting the vast majority of fraudulent activity. The paper also includes figures that compare the models' performance metrics visually. Based on these findings, financial institutions should consider deploying XGBoost as a critical component of their fraud detection systems, while weighing execution time and the business consequences of false positives and false negatives.
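The evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the thesis code: the labels and scores below are toy values invented for illustration, the 0.5 decision threshold is an assumption, and the equal weights in the hypothetical `weighted_score` helper are also an assumption, since the abstract does not state how recall and ROC AUC were weighted.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 marks fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the fraud (positive) class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """ROC AUC as the chance that a random fraud case outscores a
    random legitimate case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def weighted_score(recall, auc, w_recall=0.5, w_auc=0.5):
    """Hypothetical ranking score; the equal weights are an assumption."""
    return w_recall * recall + w_auc * auc


# Toy data, invented for illustration only (1 = fraud, 0 = legitimate).
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9, 0.4, 0.35]

# Assumed decision threshold of 0.5 on the model score.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
combined = weighted_score(recall, auc)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(f"roc_auc={auc:.3f} weighted={combined:.3f}")
```

Recall counts how many fraudulent transactions are caught, so missed fraud (false negatives) lowers it, while precision falls as legitimate transactions are flagged (false positives). A ranking that emphasizes recall and ROC AUC therefore favors models that miss little fraud, which matches the trade-off between false positives and false negatives noted in the abstract.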