Yelp rating classification using connected graph feature extraction and feature importance in machine learning workflow

Authors
Shaikh, Aquib Hassan
Issue Date
2019
Degree
MSc in Data Analytics
Publisher
Dublin Business School
Rights
Items in eSource are protected by copyright. Previously published items are made available in accordance with the copyright policy of the publisher/copyright holder.
Abstract
This thesis, titled “Yelp Rating Classification Using Connected Graph Feature Extraction and Feature Importance in Machine Learning Workflow”, uses Round 13 of the Yelp Challenge Dataset. I analyze Yelp data about restaurants, specifically the reviews, to classify the restaurants’ star ratings based on the content of the reviews. I focus on improving the ML workflow using graph algorithms: connected feature extraction and feature importance in classification. Graph-enhanced ML can help fill in the missing contextual information that is important for better decisions. The ML pipeline was built using several classification algorithms and H2O AutoML, an interface for automating the machine learning workflow. The results show that connected graph features played an important role in the enhanced machine learning workflow. H2O’s Stacked Ensemble was best able to classify the Yelp rating when using the business influence rating obtained from the PageRank graph algorithm.
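
To make the PageRank step concrete, the following is a minimal sketch of how an influence score per business could be computed by power iteration. It is illustrative only and is not the thesis's implementation: the abstract does not say how the graph was built or which graph library was used, so the edge list, the node names (`biz_a` and so on), the `pagerank` helper, and the damping factor of 0.85 are all assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                # Each node shares its rank equally among the nodes it links to.
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical graph (assumption): an edge (x, y) means business x "points to"
# business y, for example because reviewers of x also reviewed y.
edges = [
    ("biz_b", "biz_a"),
    ("biz_c", "biz_a"),
    ("biz_d", "biz_a"),
    ("biz_a", "biz_b"),
]
scores = pagerank(edges)
for business, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(business, round(score, 3))
```

In a workflow like the one described, each business's score would be joined back onto the review table as one additional feature column before training the classifiers.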