Detection of Hindi spam emails using NLP

dc.contributor.advisorAfzal, S. A.
dc.contributor.authorDhamale, Shraddha Parshuram
dc.date.accessioned2023-12-18T15:07:50Z
dc.date.available2023-12-18T15:07:50Z
dc.date.issued2023-08
dc.description.abstractIn modern times, the business and education sectors rely on email for collaboration and interaction. Email is a fast and easy means of communication for both brief and extended exchanges. As email grows into an effective way of exchanging information, it also attracts unsolicited bulk messages, or spam. Such emails harvest sensitive personal or business information, or promote pornographic material and marketing services. Because the Hindi and English languages are so dissimilar, detecting spam emails in Hindi is challenging. Detection approaches are broadly characterized as context-based or non-context-based. We analyzed and assessed many research materials in this paper. The findings of previous research papers assist in the development of spam detection algorithms for a variety of platforms, including social media, email, and text messaging. This project aims to increase the precision and efficacy of spam identification in order to improve user experiences, defend users from potential threats or malicious activities, and keep online communication channels safe. Over the previous five years, researchers have widely employed Natural Language Processing (NLP) techniques to detect spam emails in the English language. These methods analyze the textual content of emails in order to identify features that can discriminate between legitimate and spam messages. The aim of this research is to develop an efficient system for identifying and filtering spam emails in Hindi using similar techniques, since a reliable Hindi spam detection system is needed. The limited research available for the Hindi language was a major challenge in this field. We proposed a system that reliably detects Hindi spam emails using NLP. We analyzed and studied multiple machine learning techniques, namely Logistic Regression, Random Forest, Decision Tree, Naive Bayes, and Support Vector Classifier, and ultimately chose Logistic Regression to construct the system. The system achieves an average accuracy of 97.72% under K-fold cross-validation.
dc.identifier.citationDhamale, S. P. (2023). Detection of Hindi spam emails using NLP. Masters Thesis, Dublin Business School.
dc.identifier.urihttps://hdl.handle.net/10788/4404
dc.language.isoen
dc.publisherDublin Business School
dc.rightsItems in eSource are protected by copyright. Previously published items are made available in accordance with the copyright policy of the publisher/copyright holder.
dc.rights.holderCopyright: The author
dc.rights.urihttp://esource.dbs.ie/copyright
dc.subjectNatural Language Processing
dc.subjectAlgorithms
dc.subjectArtificial intelligence
dc.titleDetection of Hindi spam emails using NLP
dc.typeThesis
dc.type.degreelevelMSc
dc.type.degreenameMSc in Business Analytics
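The abstract reports an average accuracy obtained by applying K-fold cross-validation to a logistic regression classifier. The sketch below illustrates that evaluation procedure in plain Python and is not the thesis implementation, which is not shown in this record. The toy Hindi messages, the whitespace tokenizer, the binary bag-of-words features, and all function names are assumptions made for illustration only.

```python
import math
import random

# Hypothetical toy corpus (1 = spam, 0 = legitimate); not data from the thesis.
TEXTS = [
    "मुफ्त इनाम जीतें अभी क्लिक करें", "लॉटरी जीतें मुफ्त पैसा",
    "अभी क्लिक करें मुफ्त ऑफर", "मुफ्त लोन ऑफर अभी पाएं",
    "इनाम पाएं अभी क्लिक करें", "कल बैठक दस बजे है",
    "कृपया रिपोर्ट भेजें", "बैठक की रिपोर्ट संलग्न है",
    "कल कक्षा दस बजे है", "कृपया कल रिपोर्ट देखें",
]
LABELS = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]


def tokenize(text):
    # Whitespace tokenization: a simplifying assumption for this sketch.
    return text.split()


def featurize(text, vocab):
    # Binary bag-of-words vector over a fixed vocabulary.
    tokens = set(tokenize(text))
    return [1.0 if word in tokens else 0.0 for word in vocab]


def train_logreg(X, y, epochs=200, lr=0.5):
    # Batch gradient descent on the logistic loss.
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    # Class 1 when the linear score is non-negative (probability >= 0.5).
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) >= 0 else 0


def k_fold_accuracy(texts, labels, k=5, seed=0):
    # Shuffle once, split into k folds, and hold out each fold in turn.
    idx = list(range(len(texts)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # The vocabulary is built from the training folds only.
        vocab = sorted({t for i in train for t in tokenize(texts[i])})
        X = [featurize(texts[i], vocab) for i in train]
        y = [labels[i] for i in train]
        w, b = train_logreg(X, y)
        correct = sum(
            predict(w, b, featurize(texts[i], vocab)) == labels[i]
            for i in held_out
        )
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores), scores


mean_acc, fold_scores = k_fold_accuracy(TEXTS, LABELS, k=5)
print(f"per-fold accuracy: {fold_scores}, mean: {mean_acc:.2f}")
```

Building the vocabulary inside each fold keeps held-out messages from influencing the features, so the averaged score is an estimate on unseen text. A corpus this small says nothing about the accuracy reported in the abstract.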
Files
Original bundle
Name:
msc_Dhamale_s_p_ 2023.pdf
Size:
1.66 MB
Format:
Adobe Portable Document Format