Water Quality Analysis Using Machine Learning

Vilas Khaire, Neelam
MSc in Business Analytics
Dublin Business School
Water quality evaluation is crucial in environmental management, and machine learning models can improve the accuracy of water quality predictions. This study compares machine learning models for predicting water quality before and after the monsoon season in Telangana, using a dataset obtained from the Telangana Ground Water Department. The chosen models are Random Forest Classifier (RFC), Support Vector Classifier (SVC), Multi-layer Perceptron (MLP), Stochastic Gradient Descent (SGD), and KNeighborsClassifier. The assessment focuses on class imbalance: Principal Component Analysis (PCA) is applied because the imbalanced data initially produced a perfect score, which was misleading and could not be used for comparison. Because accuracy is unreliable on imbalanced data, the models are evaluated with recall, precision, and F1 score. The pre-monsoon results indicate that RFC performs exceptionally well, with a recall of 0.988 and a precision of 0.900. The monsoon transition has a noticeable effect on RFC: it continues to perform exceptionally well in the post-monsoon period, with recall improving to 0.996 and precision to 0.993. SVC, SGD, and MLP show consistent, strong performance in both periods, demonstrating their ability to adapt. Notably, the KNeighborsClassifier improves after the monsoon season, highlighting its sensitivity to seasonal changes. Seasonal variation was analysed with a t-test on the performance of the machine learning models. Overall, RFC demonstrates consistent excellence. The comparative analysis improves the scientific understanding of machine learning models in predicting water quality and has practical implications for environmental scientists, policymakers, and stakeholders involved in water resource management.
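
The choice of recall, precision, and F1 score over accuracy on imbalanced data can be illustrated with a minimal sketch. The labels below are hypothetical and are not taken from the Telangana dataset; the function names are illustrative only and do not reproduce the study's code. The sketch scores a trivial classifier that always predicts the majority class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Hypothetical imbalanced labels: 95 majority-class (0) and 5 minority-class (1) samples.
y_true = [0] * 95 + [1] * 5
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
prec, rec, f1 = precision_recall_f1(y_true, y_pred)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```

In this sketch the accuracy is high even though the classifier never identifies a minority-class sample, while recall and F1 are zero. This is the kind of misleading score that motivates reporting recall, precision, and F1 for imbalanced data.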