Design and implementation of diabetes detection model using machine learning

Authors

Johnson Kyanchat, Elisha

Issue Date

2024

Degree

Master of Science (MSc) in Data Analytics

Publisher

Dublin Business School

Abstract

This work presents the design and implementation of a machine learning model for diabetes prediction. It evaluates how well machine learning models predict an individual's risk of diabetes and identifies the factors most strongly associated with the disease, using the 2015 Behavioral Risk Factor Surveillance System (BRFSS) dataset, which comprises a large sample of 253,860 individuals and 21 health-related features. Two primary models, a Support Vector Machine (SVM) and an Artificial Neural Network (ANN), were built, together with K-Nearest Neighbors (KNN), Logistic Regression, and XGBoost models for comparison. Preprocessing for training included handling missing values and oversampling to address class imbalance. The models' predictive performance was assessed using accuracy, precision, recall, and F1-score. The results emphasize the importance of variable analysis in improving model accuracy and support continued research into predictive analytics for the treatment of chronic diseases.
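
The oversampling step and the four evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `random_oversample` and `binary_metrics` are hypothetical, the labels are made-up toy values rather than BRFSS data, and simple random duplication is assumed here because the abstract does not say which oversampling method was used.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many examples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy, made-up labels: 1 = diabetes, 0 = no diabetes (imbalanced, 3 vs 7).
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0, 0, 1]
toy_rows = [[i] for i in range(len(y_true))]  # placeholder feature rows

balanced_rows, balanced_labels = random_oversample(toy_rows, y_true)
metrics = binary_metrics(y_true, y_pred)
print(Counter(balanced_labels))
print(metrics)
```

In practice, oversampling is normally applied to the training split only, so that duplicated rows do not leak into the data used for evaluation. Precision, recall, and F1 are reported alongside accuracy because accuracy alone can look high on an imbalanced dataset even when few positive cases are detected.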