Implementation of clustering algorithm using graph embeddings and graph data science on Yelp restaurant dataset

Authors
Karangutkar, Sayali
Issue Date
2020
Degree
MSc in Data Analytics
Publisher
Dublin Business School
Rights
Items in eSource are protected by copyright. Previously published items are made available in accordance with the copyright policy of the publisher/copyright holder.
Abstract
This research uses Neo4j, the leading property graph DBMS, to implement a Restaurant Knowledge Graph from the Yelp Dataset (Challenge 2020: business, users, category, reviews). It presents the application of Cypher queries, graph algorithms for insight, and graph embeddings for machine learning on the graph. Version 1.3 of the Neo4j Graph Data Science library, recently released (April 2020), is explored on Neo4j 4.1.0 through the Python library Py2neo. Use cases for the graph algorithms PageRank and Overlap Similarity are presented. It is shown that, using the Py2neo library, data can be prepared for the application of machine learning algorithms in Python. A graph embedding algorithm (Node2vec) is applied to produce node embeddings, which are then clustered with a traditional k-Means algorithm in Tableau, where the results are also visualized.
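As an illustration of the Overlap Similarity measure named in the abstract, the following minimal sketch computes the overlap coefficient, |A ∩ B| / min(|A|, |B|), between the category sets of a few restaurants. It is not taken from the thesis: the restaurant names, the category sets, and the helper `overlap_similarity` are hypothetical, and the calculation is done in plain Python rather than through Neo4j or Py2neo.

```python
from itertools import combinations

# Hypothetical toy data (not from the Yelp dataset):
# each restaurant maps to its set of categories.
categories = {
    "Luigi's": {"Italian", "Pizza", "Restaurants"},
    "Slice House": {"Pizza", "Restaurants"},
    "Sakura": {"Japanese", "Sushi Bars", "Restaurants"},
}


def overlap_similarity(a, b):
    """Overlap coefficient: |A & B| / min(|A|, |B|); 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Score every unordered pair of restaurants.
pairs = {
    (x, y): overlap_similarity(categories[x], categories[y])
    for x, y in combinations(sorted(categories), 2)
}

# Print pairs from most to least similar.
for (x, y), score in sorted(pairs.items(), key=lambda kv: -kv[1]):
    print(f"{x} / {y}: {score:.2f}")
```

A score of 1.0 means the smaller category set is fully contained in the larger one, which is why this measure suits questions such as whether one category is effectively a sub-category of another.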