Predictive analysis of YouTube trending videos using machine learning
Authors
Niture, Aakash Ashok
Issue Date
2021
Degree
MSc in Data Analytics
Publisher
Dublin Business School
Rights holder
Rights
Items in eSource are protected by copyright. Previously published items are made available in accordance with the copyright policy of the publisher/copyright holder.
Abstract
YouTube is a world-famous interactive video-sharing platform that allows its users to rate, share, save, comment on, and upload content. Unlike popular videos, which have already accumulated a large number of likes and views by the time they are deemed popular, YouTube trending videos represent content that is gaining viewership over a certain time period and has the potential to become popular. Despite its importance, the analysis of YouTube trending videos has not yet been well researched. This research analyses interactive features to determine the correlation and importance of variables for the trendiness of a video. The study focuses on how interactive video features help a video trend on YouTube. It is based on viewership statistics for more than 40,000 YouTube trending videos collected over a certain time period. Since the trending video statistics consist of the number of views, likes, dislikes, and comments, the research applied a linear regression model to predict the number of views of YouTube trending videos. In addition, the study performs a comparative analysis of several classification models, namely Random Forest, SVM, Decision Tree, Logistic Regression, and Gaussian Naïve Bayes, to determine which model is better suited to predicting two quantities: the number of days a video takes to start trending after upload, and the number of days it stays on the trending list. The research achieved a maximum accuracy of 62.53% for predicting the lifecycle of YouTube trending videos. Cross-validation was used for statistical significance testing, and a performance evaluation matrix was used to compare the classifiers and identify the most useful ones. Furthermore, this research follows the CRISP-DM methodology with a correlational quantitative research method. The study aims to bring objectivity to the assessment of popularity for YouTube trending videos.
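The cross-validated comparison of classifiers described in the abstract can be illustrated with a minimal sketch. It is not the study's code: the data are synthetic, the two features and the binary "short vs. long trending duration" label are invented for illustration, and the two classifiers (a majority-class baseline and a nearest-centroid rule) are simple stand-ins for the models named above. All function and class names are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


class MajorityClass:
    """Baseline stand-in: always predict the most frequent training label."""

    def fit(self, X, y):
        self.label = max(sorted(set(y)), key=y.count)
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestCentroid:
    """Stand-in classifier: predict the label whose feature mean is closest."""

    def fit(self, X, y):
        self.centroids = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids, key=lambda c: dist2(x, self.centroids[c]))
            for x in X
        ]


def cross_val_accuracy(make_model, X, y, k=5, seed=0):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    scores = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in fold])
        scores.append(mean(1.0 if p == y[i] else 0.0 for p, i in zip(preds, fold)))
    return mean(scores)


# Synthetic data (assumption): two made-up, scaled engagement features per
# video, and a label of 0 (short trending duration) or 1 (long duration).
rng = random.Random(42)
X = [[rng.gauss(1, 1), rng.gauss(1, 1)] for _ in range(100)]
X += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(100)]
y = [0] * 100 + [1] * 100

# Score every candidate on the same folds so the comparison is like for like.
results = {
    "majority_class": cross_val_accuracy(MajorityClass, X, y),
    "nearest_centroid": cross_val_accuracy(NearestCentroid, X, y),
}
for name, accuracy in results.items():
    print(f"{name}: {accuracy:.3f}")
```

Evaluating every model on identical folds is what makes the accuracies comparable; a baseline such as the majority-class rule gives a reference point against which a reported accuracy can be judged.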