Exploring the space of topic modelling and topic coherence on short and long text corpora

Authors
Pathela, Chirag Kumar
Issue Date
2020
Degree
MSc in Data Analytics
Publisher
Dublin Business School
Rights
Items in eSource are protected by copyright. Previously published items are made available in accordance with the copyright policy of the publisher/copyright holder.
Abstract
Topic modelling, a branch of Natural Language Processing, is widely used, and its application to social network communications has become essential for identifying key themes affecting society. This dissertation, titled “Exploring the space of Topic Modelling and Topic Coherence on short and long text corpora”, presents a comparative study of four topic modelling algorithms: LDA (Latent Dirichlet Allocation), LSA (Latent Semantic Analysis), NMF (Non-negative Matrix Factorization) and BTM (Biterm Topic Model). The algorithms are applied to Zomato and Ovarian Cancer tweets extracted from Twitter and to Amazon Food Reviews. Six robust coherence metrics, computed with the online Palmetto tool, are used for the comparison. The results indicate that all four models have strong potential for topic modelling. BTM detected the most coherent topics on short texts across the six coherence metrics, whereas LDA performed best on long texts. NMF had the shortest execution time of the four algorithms.