Comparative study of Latent Dirichlet allocation and Louvain modularity on topic extraction from Pharma News

Authors

de Limas, Lorran

Issue Date

2020

Degree

MSc in Data Analytics

Publisher

Dublin Business School

Rights

Items in eSource are protected by copyright. Previously published items are made available in accordance with the copyright policy of the publisher/copyright holder.

Abstract

This research compares the efficacy of topic extraction from news content using two methods: Latent Dirichlet Allocation (LDA), a traditional topic extraction method based on Bayesian statistics, and Louvain modularity, a graph-based algorithm applied here to find clusters of words that represent topics. The research explores whether the Louvain graph-based approach better captures contextual information that is lost in the LDA bag-of-words approach. The raw data is pre-processed for both methods; for the Louvain method, graph analysis techniques are additionally applied before the Louvain algorithm is run. Several models are produced, evaluated using topic coherence scoring, and compared against manual ‘eyeballed’ topic extraction. The results show that the Louvain graph-based approach significantly increases the topic coherence score.
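
To make the graph-based idea concrete, the following is a minimal sketch and not the thesis's actual pipeline. It builds a weighted word co-occurrence graph from already tokenised documents and computes the modularity score that the Louvain algorithm seeks to maximise. It does not run Louvain itself; it only scores a hand-chosen grouping of words. The toy documents, the window size, and the function names `cooccurrence_graph` and `modularity` are all illustrative assumptions.

```python
from collections import Counter


def cooccurrence_graph(docs, window=2):
    """Build a weighted, undirected word graph (hypothetical helper).

    The weight of an edge is the number of times two distinct words
    appear within `window` tokens of each other.
    """
    edges = Counter()
    for tokens in docs:
        for i, w in enumerate(tokens):
            for v in tokens[i + 1 : i + 1 + window]:
                if v != w:
                    edges[frozenset((w, v))] += 1
    return edges


def modularity(edges, communities):
    """Newman modularity: Q = sum over c of [in_c / m - (deg_c / 2m)^2].

    in_c is the total edge weight inside community c, deg_c is the total
    weighted degree of its words, and m is the total edge weight.
    """
    m = sum(edges.values())
    label = {w: k for k, group in enumerate(communities) for w in group}
    inside, degree = Counter(), Counter()
    for pair, weight in edges.items():
        u, v = tuple(pair)
        degree[label[u]] += weight
        degree[label[v]] += weight
        if label[u] == label[v]:
            inside[label[u]] += weight
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy, already pre-processed "news" documents (assumed data).
docs = [
    ["drug", "trial", "approval"],
    ["drug", "approval", "trial"],
    ["merger", "acquisition", "deal"],
    ["deal", "merger", "acquisition"],
]
edges = cooccurrence_graph(docs)

by_topic = [{"drug", "trial", "approval"}, {"merger", "acquisition", "deal"}]
one_group = [{"drug", "trial", "approval", "merger", "acquisition", "deal"}]

print(modularity(edges, by_topic))   # 0.5
print(modularity(edges, one_group))  # 0.0
```

The grouping that separates the two word clusters scores higher than lumping every word together, which is the signal a modularity-maximising method such as Louvain uses to propose word clusters as topics. In practice a library implementation of Louvain would search over groupings rather than scoring fixed ones.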