Comparative study of Latent Dirichlet allocation and Louvain modularity on topic extraction from Pharma News
Authors
de Limas, Lorran
Issue Date
2020
Degree
MSc in Data Analytics
Publisher
Dublin Business School
Rights holder
Rights
Items in eSource are protected by copyright. Previously published items are made available in accordance with the copyright policy of the publisher/copyright holder.
Abstract
This research compares the efficacy of topic extraction from news content using two methods: Latent Dirichlet Allocation (LDA), a traditional topic extraction method based on Bayesian statistics, and Louvain modularity, a graph-based algorithmic approach used to identify clusters of words that represent topics. The research explores whether the Louvain graph-based approach better captures contextual information that is lost in the LDA bag-of-words approach. The raw data is pre-processed for both methods; for the Louvain method, graph analysis techniques are additionally applied before the Louvain algorithm is executed. Several models are produced, evaluated using topic coherence scoring, and compared against manual ‘eyeballed’ topic extraction. The results show that the Louvain graph-based algorithmic approach significantly increases the topic coherence score relative to LDA.
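To make the graph-based idea concrete, the following is a minimal sketch of the quantity that the Louvain algorithm greedily maximises: the modularity of a partition of a word graph. It is not the thesis's implementation. The abstract does not state how the word graph is constructed, so the document-level co-occurrence weighting, the function names `cooccurrence_graph` and `modularity`, and the toy documents are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(docs):
    """Build a weighted undirected word graph (assumed construction).

    Edge weight = number of documents in which both words appear.
    Edges are keyed by alphabetically sorted word pairs.
    """
    edges = Counter()
    for tokens in docs:
        for u, v in combinations(sorted(set(tokens)), 2):
            edges[(u, v)] += 1
    return edges


def modularity(edges, community):
    """Newman modularity Q of a partition given as {word: community label}.

    Q = sum over communities c of  L_c / m  -  (d_c / (2 m))^2
    where m is the total edge weight, L_c the edge weight inside c,
    and d_c the summed weighted degree of the words in c.
    """
    m = sum(edges.values())
    internal = Counter()  # L_c: edge weight inside each community
    degree_sum = Counter()  # d_c: summed degree of each community
    for (u, v), w in edges.items():
        degree_sum[community[u]] += w
        degree_sum[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Hypothetical, already pre-processed documents (not data from the thesis).
docs = [
    ["drug", "trial", "approval"],
    ["drug", "trial", "approval"],
    ["merger", "shares", "deal"],
    ["merger", "shares", "deal"],
    ["drug", "deal"],
]
edges = cooccurrence_graph(docs)

# Two candidate partitions: words grouped by theme, and everything in one cluster.
by_theme = {
    "drug": 0, "trial": 0, "approval": 0,
    "merger": 1, "shares": 1, "deal": 1,
}
single_cluster = {word: 0 for word in by_theme}

q_theme = modularity(edges, by_theme)
q_single = modularity(edges, single_cluster)
print(q_theme, q_single)
```

The thematic grouping scores higher than the single cluster, which is the signal Louvain follows when it moves words between communities. Unlike a bag-of-words model, the score depends on which words appear together, which is the contextual information the abstract refers to.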