Investigating the Role of SMOTE Sampling and LIME Interpretability in Enhancing Fraud Detection for E-commerce Platforms

Jose, Meethu
MSc in Data Analytics
Dublin Business School
Items in eSource are protected by copyright. Previously published items are made available in accordance with the copyright policy of the publisher/copyright holder.
For many years, researchers have been interested in the problem of fraud detection in the financial sector. Depending on data availability and the use case, both supervised and unsupervised algorithms are used to detect fraudulent transactions. This work treats fraud detection as a supervised binary classification problem, using a real dataset from the e-commerce business Vesta. Because the dataset contains real transactions, most of its features are masked to preserve confidentiality. Five machine learning models are considered for detecting fraud: Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), the Extreme Gradient Boosting classifier (XGB), and an Artificial Neural Network (ANN). This research centres on two fundamental obstacles in fraud detection: skewed data and model interpretability. Because the data is extremely skewed, a sampling strategy is used to balance the distribution of the target variable. The main goal of this study is to determine how well the oversampling approach SMOTE (Synthetic Minority Over-sampling Technique) can solve the class imbalance problem that arises in fraud detection. Because fraudulent transactions occur far less frequently than legitimate ones, class imbalance frequently degrades model performance. To correct this imbalance and improve the overall predictive power of fraud detection models, SMOTE creates synthetic minority-class instances by interpolating between existing minority samples and their nearest minority-class neighbours. This paper examines how SMOTE affects various machine learning algorithms, evaluating its capacity to improve the identification of fraudulent transactions through thorough testing and performance assessment. Furthermore, this paper discusses the interpretability of fraud detection models using LIME. Although machine learning models are highly predictive, their inner workings are often opaque.
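The SMOTE interpolation step summarized above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this study (libraries such as imbalanced-learn provide a production `SMOTE` class). The function names `smote` and `balance_with_smote`, the choice of k = 5 neighbours, Euclidean distance, and the convention that label `1` marks fraud are all assumptions made for the example.

```python
import numpy as np


def smote(X_min, n_synthetic, k=5, rng=None):
    """Create n_synthetic points by interpolating between randomly chosen
    minority samples and one of their k nearest minority-class neighbours."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances inside the minority class.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_synthetic)                  # seed samples
    nb = neighbours[base, rng.integers(0, k, size=n_synthetic)]  # one neighbour each
    gap = rng.random((n_synthetic, 1))                           # position on the segment, in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])


def balance_with_smote(X, y, k=5, rng=None):
    """Oversample class 1 (assumed here to be fraud) until both classes are equal in size."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_needed = int((y == 0).sum() - (y == 1).sum())
    if n_needed <= 0:
        return X, y
    X_syn = smote(X[y == 1], n_needed, k=k, rng=rng)
    return np.vstack([X, X_syn]), np.concatenate([y, np.ones(n_needed, dtype=y.dtype)])


# Toy example: 190 legitimate vs 10 fraudulent transactions with 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.r_[np.zeros(190, dtype=int), np.ones(10, dtype=int)]
X_bal, y_bal = balance_with_smote(X, y, k=5, rng=1)
print(np.bincount(y_bal))  # [190 190]
```

Each synthetic point is a convex combination of two real minority samples, so it lies on the segment between them. Oversampling should be applied only to the training split: synthesizing points before the train/test split leaks information about the test set into training and inflates the measured performance.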
LIME (Local Interpretable Model-agnostic Explanations), an interpretable machine learning approach, attempts to fill this gap by explaining individual model predictions in terms simple enough for a non-specialist to understand. By creating locally faithful explanations, LIME increases transparency and accountability in the fraud detection process and allows stakeholders to understand the reasoning behind the model's decisions. The study assesses how well LIME explains the complex interactions between features and outcomes in the context of fraud detection on the Vesta dataset. The performance of the models with and without SMOTE sampling was compared, and SMOTE was found to significantly improve the performance of every model except the ANN. The second phase of the investigation interprets the models with LIME. To identify what these explanations have in common with the models, each model's feature importance is compared with its LIME interpretations. LIME explanations have been criticized mainly for their lack of consistency and stability, and several scholars have questioned the validity of model-agnostic explanations such as LIME. The combination of SMOTE sampling and LIME interpretability offers a novel strategy for combating fraud in e-commerce. The findings of this study shed light on SMOTE's effectiveness in dealing with class imbalance and its consequent impact on model performance. The paper also emphasizes how LIME improves model interpretability, enabling stakeholders to make informed decisions based on the model's insights. By tackling the combined concerns of accuracy and interpretability, this paper contributes to the advancement of fraud detection approaches in the complex and dynamic environment of e-commerce. In summary, this study provides a thorough analysis of how SMOTE sampling and LIME interpretability interact in the context of fraud detection.
Through empirical validation on the Vesta dataset, the paper highlights the potential of these strategies to strengthen the resilience of e-commerce systems against fraudulent activities, thereby enabling a more secure and reliable digital marketplace.
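The core LIME idea (perturb the instance, query the black-box model, weight the perturbed samples by their proximity to the instance, and fit a simple linear surrogate whose coefficients serve as the explanation) can be sketched with NumPy alone. This simplified sketch is an illustration under assumptions, not the `lime` package or this study's code: it samples Gaussian perturbations around the instance, uses no feature discretization or feature selection, and fits a weighted ridge regression. The kernel form and the 0.75 * sqrt(d) width heuristic mirror the defaults of the `lime` package's tabular explainer. The helper `top_k_overlap` is a hypothetical way to compare a model's feature importance with a LIME explanation, since the exact comparison used in the study is not described here.

```python
import numpy as np


def lime_explain(predict_fn, x, X_train, n_samples=5000, kernel_width=None,
                 ridge=1e-3, rng=None):
    """Explain predict_fn at instance x with a proximity-weighted linear surrogate.

    predict_fn maps an (n, d) array to n fraud probabilities. The returned
    weights are the change in predicted probability per training-set
    standard deviation of each feature, near x.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    X_train = np.asarray(X_train, dtype=float)
    d = x.shape[0]
    scale = X_train.std(axis=0)
    scale[scale == 0] = 1.0                      # guard against constant features
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(d)         # width heuristic borrowed from lime's tabular explainer

    # 1. Perturb: Gaussian noise around x, measured in standardized units.
    Z = rng.normal(size=(n_samples, d))
    Z[0] = 0.0                                   # include x itself
    X_pert = x + Z * scale

    # 2. Query the black-box model on every perturbed point.
    target = np.asarray(predict_fn(X_pert), dtype=float)

    # 3. Proximity weights from an exponential kernel on the distance to x.
    dist = np.linalg.norm(Z, axis=1)
    w = np.sqrt(np.exp(-dist ** 2 / kernel_width ** 2))

    # 4. Weighted ridge regression with an unpenalized intercept.
    A = np.hstack([np.ones((n_samples, 1)), Z])
    penalty = ridge * np.eye(d + 1)
    penalty[0, 0] = 0.0
    coef = np.linalg.solve(A.T @ (w[:, None] * A) + penalty, A.T @ (w * target))
    return coef[1:]                              # drop the intercept


def top_k_overlap(importance_a, importance_b, k=3):
    """Share of features common to the top-k (by magnitude) of two importance vectors."""
    top_a = set(np.argsort(-np.abs(importance_a))[:k])
    top_b = set(np.argsort(-np.abs(importance_b))[:k])
    return len(top_a & top_b) / k


# Toy black box standing in for a fitted classifier's fraud probability,
# e.g. lambda X: model.predict_proba(X)[:, 1] for a scikit-learn model.
true_w = np.array([2.0, -1.0, 0.0, 0.5])


def black_box(X):
    return 1.0 / (1.0 + np.exp(-(X @ true_w)))


X_train = np.random.default_rng(0).normal(size=(500, 4))
explanation = lime_explain(black_box, np.zeros(4), X_train, rng=1)
print(np.round(explanation, 3))
print(top_k_overlap(explanation, true_w, k=2))
```

Because each explanation depends on random perturbations, rerunning `lime_explain` with a different seed gives slightly different weights; this sampling variance is one source of the instability criticism of LIME noted in the abstract.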