Mental Illness Detection Using Natural Language Processing and Machine Learning

Authors

Rizon, Ruhit Ahmed

Issue Date

2024

Degree

MSc in Artificial Intelligence

Publisher

Dublin Business School

Rights

Abstract

Mental illness has a major impact on public health, which motivates new methods of early identification and intervention. This dissertation explores how Natural Language Processing and Machine Learning methods can be combined to detect mental illness from text. The dataset, taken from Zenodo, comprises 104 files and one million rows in total and reflects different mental health conditions. Of these one million rows, one hundred thousand were sampled at random, with each file contributing ten percent of its data. Preprocessing techniques were applied to improve the quality of the training set: stopword removal, lemmatization, punctuation removal, special character and number removal, and data merging. For model training, several feature engineering techniques were used: TF-IDF, Min-Max scaling, Standard scaling and Log scaling. Four classifiers were used to evaluate how effectively mental illnesses can be predicted from text: Multinomial Naive Bayes, Logistic Regression, Random Forest and Gradient Boosting. Grid search and Random search were also used to investigate the difference between the results of Logistic Regression and Multinomial Naive Bayes. Model performance was evaluated using accuracy, weighted precision, F1 scores, and confusion matrices. Logistic Regression with Min-Max scaling was the best-performing model.
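The preprocessing and feature engineering steps named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the dissertation's code: the stopword list, the example posts, the function names (`preprocess`, `tfidf`, `min_max_scale`) and the particular IDF variant are all assumptions made for illustration, and lemmatization is omitted because it needs an external library.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; the dissertation's list is not given).
STOPWORDS = {"i", "a", "the", "and", "to", "of", "is", "am", "my", "so", "all"}

def preprocess(text):
    """Lowercase, drop punctuation/special characters/digits, remove stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS]

def tfidf(docs):
    """Return (vocabulary, matrix) using tf * (ln(N / df) + 1) weights (assumed variant)."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        matrix.append([counts[t] * (math.log(n_docs / df[t]) + 1.0) for t in vocab])
    return vocab, matrix

def min_max_scale(matrix):
    """Rescale each column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*matrix))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in matrix
    ]

# Hypothetical example posts, not taken from the Zenodo dataset.
posts = [
    "I cannot sleep and I feel anxious all the time!!!",
    "Feeling anxious about work 24/7.",
    "Had a great day at the park.",
]
vocab, weights = tfidf(posts)
scaled = min_max_scale(weights)
```

The scaled matrix has one row per post and one column per vocabulary term, which is the shape a classifier such as Logistic Regression would take as input. In practice a library implementation of TF-IDF and scaling would likely be used instead of hand-written functions.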