Ensemble modeling & prediction interpretability for insurance fraud claims classification
Authors
Balasubramanian, Madhana Veerapandian
Issue Date
2019
Degree
MSc in Data Analytics
Publisher
Dublin Business School
Rights
Items in eSource are protected by copyright. Previously published items are made available in accordance with the copyright policy of the publisher/copyright holder.
Abstract
This research paper explains the classification of insurance fraud claims using ensemble modeling. By learning patterns in the data, machine learning algorithms were able to identify fraudulent claims efficiently. The goal of this research is to apply ensemble models, namely Gradient Boosting Machine, Random Forest and XGBoost, together with sampling techniques, and to compare their results with those of traditional algorithms such as SVM, logistic regression and artificial neural networks. The research used data produced by Oracle. The classifiers were trained on features chosen through feature engineering and feature selection with the Boruta package in R, the chi-square test and the sample t-test. They were evaluated with metrics such as accuracy, ROC AUC and F1 score. The results showed that the ensemble methods achieved higher ROC AUC scores than the traditional methods. The XGBoost algorithm achieved the highest AUC, about 86.8%, after the training data was oversampled with SMOTE. The Local Interpretable Model-Agnostic Explanations (LIME) package was used for model interpretability and gave useful insight into individual predictions. A prediction API was also developed with the plumber package in R.
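The abstract credits SMOTE oversampling of the training data for the best XGBoost result. The thesis did this in R, and the package and parameters it used are not stated here. The following is therefore only a minimal, hypothetical Python sketch of the SMOTE idea: each synthetic minority-class (fraud) sample is placed at a random point on the segment between a real minority sample and one of its k nearest minority neighbours. The function names `smote` and `oversample`, the default `k=5`, the Euclidean distance and the toy data are all illustrative assumptions. This is not the thesis's implementation.

```python
import math
import random


def _euclidean(a, b):
    # Distance between two numeric feature vectors (assumed distance metric).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples in the SMOTE style (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute the k nearest minority-class neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _euclidean(x, minority[j]),
        )
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )
    return synthetic


def oversample(X, y, minority_label=1, k=5, seed=0):
    """Balance a binary TRAINING set by appending SMOTE samples of the minority class."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new = smote(minority, n_new, k=k, seed=seed)
    return X + new, y + [minority_label] * len(new)


if __name__ == "__main__":
    # Toy training data: label 1 = fraud (minority), 0 = genuine (majority).
    X_train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
               [1.0, 1.0], [1.1, 0.9], [0.9, 1.2], [1.2, 1.1], [1.0, 0.8]]
    y_train = [1, 1, 1, 0, 0, 0, 0, 0]
    X_bal, y_bal = oversample(X_train, y_train)
    print(y_bal.count(0), y_bal.count(1))  # 5 5
```

Only the training split is oversampled, as in the abstract. The held-out test set keeps the real class imbalance, so metrics such as ROC AUC and F1 are not inflated by synthetic points.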