Interpretable machine learning for customer churn prediction
Authors
Irmak, Deniz
Issue Date
2024
Degree
Master of Science in Data Analytics
Publisher
Dublin Business School
Rights
Items in eSource are protected by copyright. Previously published items are made available in accordance with the copyright policy of the publisher/copyright holder.
Abstract
This study develops and evaluates interpretable machine learning models for predicting customer churn in the telecommunications sector. The dataset of 7,043 customer records and 21 features was preprocessed by handling missing values, encoding categorical variables, and balancing the target class with SMOTE. Five machine learning models were implemented: Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, and Neural Network. The Gradient Boosting model proved the most effective, offering the best balance of accuracy and interpretability. Partial Dependence Plots (PDPs) and Local Interpretable Model-agnostic Explanations (LIME) were used to explain the model’s predictions, revealing that contract type, monthly charges, and online security services were significant predictors of churn. The results suggest that targeted interventions based on these factors could significantly reduce churn, thereby improving customer retention and business profitability.
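The class-balancing step mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority neighbours. This is not the thesis's implementation. The function name `smote`, its parameters, and the toy records are assumptions made for illustration, the base point is chosen at random rather than generating a fixed number of samples per minority point, and only numeric features are handled. In practice a maintained library would normally be used instead.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their nearest minority neighbours (simplified SMOTE sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base point.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical toy records: (tenure in months, monthly charges).
# These values are illustrative only and are not taken from the thesis dataset.
churned = [(1.0, 70.0), (2.0, 85.0), (3.0, 90.0), (5.0, 75.0)]
retained = [
    (24.0, 50.0), (36.0, 45.0), (48.0, 60.0), (60.0, 55.0),
    (30.0, 40.0), (42.0, 65.0), (54.0, 52.0), (72.0, 58.0),
]

# Oversample the minority (churned) class until both classes are the same size.
new_points = smote(churned, n_synthetic=len(retained) - len(churned), k=3)
balanced_churned = churned + new_points
```

Because every synthetic point is an interpolation between two real minority records, the new points stay inside the range already covered by the churned class. Oversampling of this kind is usually applied to the training split only, so that evaluation data contains real records alone.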