Deciphering Deception - Detecting Fake Review using NLP by analysis of stylistic, sentiment-based, and semantic features
Authors
Poojary, Karthik Krishna
Issue Date
2024
Degree
MSc in Data Analytics
Publisher
Dublin Business School
Rights holder
Rights
Abstract
This study delves into the critical issue of identifying deceptive online reviews, a challenge increasingly prevalent in the digital marketplace. The study leverages a combination of Natural Language Processing (NLP) and Machine Learning (ML) techniques to differentiate between genuine and fraudulent reviews. The methodology encompasses stylistic analysis to assess language structure, sentiment analysis to evaluate emotional tone, and semantic analysis employing Word2Vec and Latent Dirichlet Allocation (LDA) to uncover latent topics. Together, these analyses supply the engineered features used for model training and evaluation.
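As a rough illustration of what stylistic and sentiment-based features can look like, the following minimal sketch computes a few simple counts and a lexicon-based polarity score for one review. It is not the study's implementation: the function names, the chosen features, and the tiny word lists are hypothetical, and the semantic features (Word2Vec, LDA) are left out because they require a trained model.

```python
import re

# Hypothetical toy lexicons, for illustration only; not taken from the study.
POSITIVE_WORDS = {"great", "excellent", "love", "amazing", "perfect"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "awful", "poor"}

def tokenize(text):
    """Split text into alphabetic tokens (apostrophes kept inside words)."""
    return re.findall(r"[A-Za-z']+", text)

def stylistic_features(text):
    """Simple surface-level style statistics for one review."""
    tokens = tokenize(text)
    n = len(tokens)
    return {
        "n_tokens": n,
        "avg_word_len": sum(len(t) for t in tokens) / n if n else 0.0,
        "exclamations": text.count("!"),
        # Share of fully upper-case words longer than one letter.
        "upper_ratio": sum(t.isupper() and len(t) > 1 for t in tokens) / n if n else 0.0,
        # Vocabulary richness: distinct words divided by total words.
        "type_token_ratio": len({t.lower() for t in tokens}) / n if n else 0.0,
    }

def sentiment_score(text):
    """Lexicon polarity in [-1, 1]; 0.0 when no lexicon word occurs."""
    tokens = [t.lower() for t in tokenize(text)]
    pos = sum(t in POSITIVE_WORDS for t in tokens)
    neg = sum(t in NEGATIVE_WORDS for t in tokens)
    total = pos + neg
    return (pos - neg) / total if total else 0.0

def review_features(text):
    """Combine stylistic and sentiment features into one feature dict."""
    feats = stylistic_features(text)
    feats["sentiment"] = sentiment_score(text)
    return feats
```

Calling `review_features` on each review yields a dictionary that could be turned into one row of a feature matrix; in practice a fuller pipeline would append topic or embedding features from a library before training a classifier.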
A diverse range of machine learning models, including Random Forest, Logistic Regression, Gaussian and Multinomial Naive Bayes, Simple Neural Network, Gradient Boosted Trees, Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM), were comprehensively evaluated. The comparative analysis provides valuable insights into the performance characteristics of each model.
Notably, Logistic Regression and Simple Neural Network emerge as top contenders, achieving strong accuracy, precision, recall, and F1 scores. This comparative study serves as a benchmark for future research in the domain, offering a clear understanding of the strengths and weaknesses of various machine learning approaches to the deceptive online review problem when stylistic, sentiment-based, and semantic features are combined. This research not only advances the understanding of deceptive review detection but also offers a foundation for future explorations in the field of NLP and ML, aimed at enhancing the reliability and transparency of online consumer feedback.
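For readers unfamiliar with the four metrics named in the abstract, the sketch below shows how accuracy, precision, recall, and F1 score are computed for a binary classifier. The function name and the label convention (1 = fake, 0 = genuine) are assumptions made for illustration, and the labels in the example are made up rather than results from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels.

    `positive` marks the class of interest (assumed here: 1 = fake review).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels for illustration only.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

Comparing models on all four metrics rather than accuracy alone matters when the classes are imbalanced, since a model can reach high accuracy while missing most of the fake reviews.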