Interpretable machine learning for customer churn prediction

Authors

Irmak, Deniz

Issue Date

2024

Degree

Master of Science in Data Analytics

Publisher

Dublin Business School

Rights

Items in eSource are protected by copyright. Previously published items are made available in accordance with the copyright policy of the publisher/copyright holder.

Abstract

This study develops and evaluates interpretable machine learning models for predicting customer churn in the telecommunications sector. The dataset, consisting of 7,043 customer records and 21 features, was preprocessed to handle missing values, encode categorical variables, and balance the target class using the Synthetic Minority Over-sampling Technique (SMOTE). Five machine learning models were implemented: Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, and Neural Network. The Gradient Boosting model emerged as the most effective, providing a balanced combination of accuracy and interpretability. Partial Dependence Plots (PDPs) and Local Interpretable Model-agnostic Explanations (LIME) were used to explain the model’s predictions, revealing that contract type, monthly charges, and online security services were significant predictors of churn. The results suggest that targeted interventions based on these factors could reduce churn and thereby improve customer retention and business profitability.
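
The class-balancing step mentioned in the abstract can be illustrated with a minimal sketch of SMOTE-style oversampling. This is not the thesis's implementation: the function name `smote_like_oversample`, the two toy features (tenure in months, monthly charges), and all data values are hypothetical and chosen only to show the idea of interpolating between a minority sample and one of its nearest minority neighbours.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Simplified illustration only, not a full SMOTE implementation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the remaining minority points by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Made-up toy data: [tenure_months, monthly_charges]
majority = [[40, 55], [52, 60], [61, 48], [35, 70], [48, 65], [70, 50]]  # stayed
minority = [[2, 90], [5, 85], [3, 95]]                                   # churned

# Generate enough synthetic churners to match the majority class size.
synthetic = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + synthetic
print(len(majority), len(balanced_minority))
```

In practice a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would normally be used, and oversampling is usually applied to the training split only so that synthetic points do not leak into evaluation data.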