Design and implementation of diabetes detection model using machine learning
Authors
Johnson Kyanchat, Elisha
Issue Date
2024
Degree
Master of Science (MSc) in Data Analytics
Publisher
Dublin Business School
Abstract
This work presents the design and implementation of a machine learning model for diabetes prediction. It evaluates how well machine learning models predict an individual's risk of diabetes and identifies the factors most strongly associated with the disease, using the 2015 Behavioral Risk Factor Surveillance System (BRFSS) dataset, which comprises a large sample of 253,860 individuals and 21 health-related features. Two primary models, a Support Vector Machine (SVM) and an Artificial Neural Network (ANN), were built, along with K-Nearest Neighbors (KNN), Logistic Regression, and XGBoost models for comparison. Preparing the data for training included handling missing values and oversampling to address class imbalance. The models' predictive ability was evaluated using accuracy, precision, recall, and F1-score. The results emphasize the importance of variable analysis in improving model accuracy and support continued research in predictive analytics for the treatment of chronic diseases.
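The oversampling and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the thesis code: it assumes plain random oversampling (the abstract does not say which oversampling method was used) and binary labels, and the function names `random_oversample` and `binary_metrics` are hypothetical helpers written with the Python standard library only.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest one.

    Assumption: simple random oversampling, used here for illustration only.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw extra copies (with replacement) to reach the target size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1-score for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

With nine negative labels and one positive label, a classifier that always predicts the negative class reaches an accuracy of 0.9 but a recall of 0.0, which shows why precision, recall, and F1-score are reported alongside accuracy on imbalanced data. In practice, oversampling is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.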
