Topic modelling and theme discovery on Aylien News articles during COVID-19
Authors
Pitale, Sagar Kishor
Issue Date
2020
Degree
MSc in Data Analytics
Publisher
Dublin Business School
Rights holder
Rights
Items in eSource are protected by copyright. Previously published items are made available in accordance with the copyright policy of the publisher/copyright holder.
Abstract
Topic modelling is increasingly important for analysing large volumes of unlabelled data, where a collection of documents must be scanned to identify keywords and patterns of language use. It is an unsupervised machine learning technique that clusters similar word groups and expressions under topics and supports analysis of the content of individual topics. The negative impacts of the pandemic have been reflected in the news media. This research applies topic modelling to the COVID-19 news articles from AYLIEN to identify key themes in this large body of articles. The topic modelling algorithms applied and compared include LDA, NMF, LSI and HDP. LDA produced interpretable topics with better topic coherence and identified underlying themes including worldwide spread; workplace activity impact; lockdown implications; medical supply shortages; social and sport knock-on effects; and disease statistics. Results of applying the different algorithms are presented.
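The abstract compares algorithms by topic coherence but does not say which coherence measure was used. As a rough illustration only, the sketch below computes a UMass-style coherence score, one common choice, for a topic's ranked top words. The function name `umass_coherence`, the toy documents and the two word lists are all hypothetical and are not taken from the thesis or from the AYLIEN data.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass-style coherence for one topic's ranked top words.

    Assumption: this is an illustrative measure, not necessarily the one
    used in the thesis. `documents` is a list of token lists.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    # combinations() keeps ranking order: `earlier` is ranked above `later`.
    for earlier, later in combinations(top_words, 2):
        denom = doc_freq(earlier)
        if denom == 0:
            continue  # skip words that never occur in the corpus
        # +1 smoothing avoids log(0) when the pair never co-occurs.
        score += math.log((doc_freq(earlier, later) + 1) / denom)
    return score


# Hypothetical toy corpus, tokenised by whitespace.
docs = [
    "lockdown restrictions extended city".split(),
    "lockdown restrictions eased city".split(),
    "hospital mask shortage supply".split(),
    "hospital supply shortage ventilator".split(),
]

coherent_topic = ["lockdown", "restrictions", "city"]  # words that co-occur
mixed_topic = ["lockdown", "shortage", "ventilator"]   # words that rarely co-occur

print(umass_coherence(coherent_topic, docs))
print(umass_coherence(mixed_topic, docs))
```

A topic whose top words tend to appear in the same documents scores higher than one that mixes unrelated words, which is the sense in which a model's topics can be said to have better coherence.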