Topic modelling and theme discovery on Aylien News articles during COVID-19

Authors

Pitale, Sagar Kishor

Issue Date

2020

Degree

MSc in Data Analytics

Publisher

Dublin Business School

Rights

Items in eSource are protected by copyright. Previously published items are made available in accordance with the copyright policy of the publisher/copyright holder.

Abstract

Topic modelling is increasingly important for analysing large volumes of unlabelled data, a task that requires scanning a collection of documents to identify keywords and patterns of language use. It is an unsupervised machine learning technique that clusters similar word groups and expressions under topics and supports analysis of the content of individual topics. The negative impacts of the COVID-19 pandemic have been reflected in the news media. This research applies topic modelling to the large volume of COVID-19 news articles from AYLIEN to identify their key themes. The algorithms applied and compared are Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), Latent Semantic Indexing (LSI) and the Hierarchical Dirichlet Process (HDP). LDA produced interpretable topics with better topic coherence and identified underlying themes including worldwide spread; workplace activity impact; lockdown implications; medical supply shortages; social and sport knock-on effects; and disease statistics. Results of applying the different algorithms are presented.
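
The abstract compares algorithms by topic coherence but does not say which coherence measure was used. As an illustration only, the sketch below computes one common choice, the UMass measure, which scores a topic's top words by how often they co-occur in the same documents. The function name `umass_coherence`, the toy documents and the two example topics are all hypothetical and are not taken from the thesis or from the AYLIEN dataset.

```python
import math
from itertools import combinations


def umass_coherence(topic_words, documents):
    """UMass-style coherence for one topic (illustrative sketch).

    topic_words: top words of a topic, ordered from most to least probable.
    documents:   tokenised documents, each a list of words.
    Higher (less negative) scores mean the words co-occur more often.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    # Each pair (j, i) has j < i, so w_j is the higher-ranked word.
    for j, i in combinations(range(len(topic_words)), 2):
        w_j, w_i = topic_words[j], topic_words[i]
        d_j = doc_freq(w_j)
        if d_j == 0:
            continue  # word absent from the corpus; skip to avoid dividing by zero
        # Add-one smoothing keeps the logarithm defined when the pair never co-occurs.
        score += math.log((doc_freq(w_i, w_j) + 1) / d_j)
    return score


# Hypothetical toy corpus, not real AYLIEN articles.
docs = [
    ["lockdown", "restrictions", "government"],
    ["lockdown", "restrictions", "travel"],
    ["mask", "supply", "shortage"],
    ["mask", "supply", "hospital"],
]

coherent_topic = ["lockdown", "restrictions"]  # words that appear together
mixed_topic = ["lockdown", "mask"]             # words that never co-occur

coherent_score = umass_coherence(coherent_topic, docs)
mixed_score = umass_coherence(mixed_topic, docs)
print(coherent_score > mixed_score)
```

In a comparison such as the one described, a score of this kind would be computed for every topic produced by each algorithm and then averaged per algorithm. Whether the thesis used UMass or another measure is not stated in the abstract.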