Implementation of clustering algorithm using graph embeddings and graph data science on Yelp restaurant dataset

Authors

Karangutkar, Sayali

Issue Date

2020

Degree

MSc in Data Analytics

Publisher

Dublin Business School

Rights

Items in eSource are protected by copyright. Previously published items are made available in accordance with the copyright policy of the publisher/copyright holder.

Abstract

This research uses Neo4j, the leading property graph DBMS, to implement a Restaurant Knowledge Graph of the Yelp Dataset (Challenge 2020 – business, users, category, reviews). The application of Cypher queries, graph algorithms for insight, and graph embeddings for machine learning on the graph is presented. The recently released (April 2020) Version 1.3 of the Neo4j Graph Data Science library on Neo4j 4.1.0 is explored using the Python library Py2neo. Use cases for the graph algorithms PageRank and Overlap Similarity are presented. It is shown that, using the Py2neo library, data can be prepared for the application of machine learning algorithms in Python. A graph embedding algorithm (Node2vec) is applied, and the resulting embeddings are clustered with a traditional k-Means algorithm in Tableau, where the results are also visualized.
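As a rough illustration of the two graph algorithms named in the abstract, the sketch below computes the overlap coefficient and a textbook PageRank in plain Python. It is not the thesis code and does not call Neo4j or the Graph Data Science library; the restaurant names, category sets, user ids, and edges are hypothetical, and the scoring conventions of the library's own implementations may differ from this formulation.

```python
def overlap_similarity(a, b):
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def pagerank(edges, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over directed (src, dst) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical category sets for three restaurants (illustration only).
categories = {
    "Pizza Place": {"Pizza", "Italian"},
    "Trattoria": {"Italian", "Pizza", "Wine Bars"},
    "Sushi Bar": {"Japanese", "Sushi Bars"},
}
same_cuisine = overlap_similarity(categories["Pizza Place"], categories["Trattoria"])
different_cuisine = overlap_similarity(categories["Pizza Place"], categories["Sushi Bar"])

# Hypothetical directed edges between users (illustration only).
ranks = pagerank([("u1", "u2"), ("u3", "u2"), ("u2", "u1")])
most_central = max(ranks, key=ranks.get)
```

Here the overlap coefficient is 1.0 whenever the smaller category set is fully contained in the larger one, which is why it suits comparing restaurants with differently sized category lists. In this PageRank formulation the scores sum to 1, so they can be read as relative importance within the graph.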