Improvement of recall measure by deriving graph features for link prediction on machine learning algorithms

Bildik, Zeliha Bilge
MSc in Data Analytics
Dublin Business School
In recent years the volume of data has increased significantly, creating new challenges and opportunities in dealing with interconnected data. Although new technologies enable the processing of high volumes of information, it is still challenging to find the relationships within the data that realise the anticipated business value. Graph analysis is becoming increasingly important for extracting insights from connected data and for improving machine learning outcomes. This thesis applies graph analytics on Neo4j, a leading ACID-compliant graph DBMS, to derive features that improve the predictions of recommender algorithms. The research uses the MovieLens dataset for benchmarking. The data pipeline is built in Python, using embedded Cypher queries and Python machine learning libraries. The research demonstrates the effectiveness of link prediction as a method for deriving features for machine learning, and reports the resulting improvements in recall.
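As a rough illustration of the idea in the abstract, the sketch below computes three widely used link prediction scores (common neighbours, preferential attachment, Adamic-Adar) for a pair of nodes and shows how recall is calculated. The abstract does not state which scores the thesis derives, so the choice of scores, the toy graph, and all function names here are assumptions made for illustration. The thesis computes its features in Neo4j through Cypher, whereas this sketch uses plain Python.

```python
import math

# Toy undirected graph as an adjacency map (hypothetical data, not MovieLens).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}


def common_neighbors(g, u, v):
    """Number of nodes adjacent to both u and v."""
    return len(g[u] & g[v])


def preferential_attachment(g, u, v):
    """Product of the degrees of u and v."""
    return len(g[u]) * len(g[v])


def adamic_adar(g, u, v):
    """Sum of 1 / log(degree) over shared neighbours.

    Shared neighbours of degree 1 are skipped to avoid dividing by log(1) = 0.
    """
    return sum(1.0 / math.log(len(g[w])) for w in g[u] & g[v] if len(g[w]) > 1)


def link_features(g, u, v):
    """Feature vector for the candidate link (u, v), usable as model input."""
    return [
        common_neighbors(g, u, v),
        preferential_attachment(g, u, v),
        adamic_adar(g, u, v),
    ]


def recall(y_true, y_pred):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


features = link_features(graph, "a", "d")
example_recall = recall([1, 1, 0, 1], [1, 0, 0, 1])
```

In a pipeline of the kind described, vectors such as `features` would be appended to each candidate pair's existing attributes before training a classifier, and `recall` would be compared with and without the graph-derived columns.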