Comparative study of traditional vs. transformer machine learning algorithms for Inflammatory Bowel Disease (IBD) Medical Report Classification

Authors

Alshakhs, Baqer

Issue Date

2024

Degree

Master of Science (MSc) in Data Analytics

Publisher

Dublin Business School

Abstract

Inflammatory Bowel Disease (IBD) is a serious chronic condition that affects millions of people worldwide, yet it remains under-researched, especially with respect to applications of machine learning. This research aims to draw attention to IBD and to support the efforts of My Crohn's and Colitis Angel (MyCCAngel), a non-profit organization that offers a social platform designed specifically for IBD patients, providing them with tools and assistance to manage their daily challenges. To that end, the study develops a medical report classification model to determine whether users of the platform have IBD. It compares traditional machine learning models, including Multi-layer Perceptron, Support Vector Machines, and Naïve Bayes, with a transformer-based model, Bidirectional Encoder Representations from Transformers (BERT). Transformer models such as BERT represent a recent advancement in machine learning, applying an evolving set of mathematical techniques (Merritt, 2022). This research seeks to evaluate whether these newer models outperform traditional methods in the classification of medical reports. A quantitative research approach is employed: because no existing dataset was available, 110 medical reports, both IBD and non-IBD, were collected manually. This approach allows patterns and trends in the data to be identified and supports a more scalable and objective analysis, with the aim of drawing conclusions that are applicable beyond the immediate sample.
The performance of each model was evaluated based on metrics such as accuracy, precision, recall, and F1-score. The findings indicate that traditional machine learning algorithms, particularly Naïve Bayes, outperform the transformer-based model BERT, achieving an accuracy of 91% compared to BERT's 68%. This study demonstrates that transformer models are not always superior and that simple traditional models such as Naïve Bayes can offer better performance in specific tasks, such as IBD medical report classification. Furthermore, this research is the first to focus on the classification of IBD medical reports, providing valuable insights for future binary classification tasks in the medical field.
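
For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of how accuracy, precision, recall, and F1-score can be computed for a binary IBD / non-IBD task. It is not the code used in the thesis: the function name `binary_metrics` and the label lists are hypothetical and chosen only for illustration, and the numbers do not correspond to the reported results.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    # Count the cells of the confusion matrix.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only (1 = IBD, 0 = non-IBD); not the thesis data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall alongside accuracy matters for a small dataset such as the 110 reports described here, because accuracy alone can hide a model that favours one class.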