Integrating information extraction and graph analytics to investigate influential entities in a corpus

Ramphielo, Realeboha
MSc in Artificial Intelligence
The sheer rate at which data is generated makes manual information extraction inefficient, and the volume of data produced easily leads to information overload. However, advances in Artificial Intelligence, especially Natural Language Processing, make accurate automatic extraction possible. Knowledge graphs offer a practical way to handle this ever-evolving data because they represent both the data and its relationships. This research investigates the most influential entities in a corpus by combining information extraction and graph analytics, using the BioRED dataset as the use case. A custom spaCy NER module was trained to extract entities from the documents and achieved an F1 score of 75.9188%. Applying graph data science centrality algorithms to the resulting knowledge graph, the study identifies “GDM” as the most influential entity.
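The graph analytics step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which centrality algorithms or graph tooling were used, so degree centrality and PageRank over a document-level co-occurrence graph are assumptions chosen here for illustration. The entity names (entity_A to entity_E) and the per-document entity lists are hypothetical placeholders standing in for the output of an NER step; they are not BioRED data and not the study's results.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical NER output: one list of extracted entity mentions per document.
docs_entities = [
    ["entity_A", "entity_B", "entity_C"],
    ["entity_A", "entity_B"],
    ["entity_A", "entity_D"],
    ["entity_C", "entity_E"],
    ["entity_A", "entity_E"],
]


def build_cooccurrence_graph(docs):
    """Build an undirected graph linking entities that share a document."""
    graph = defaultdict(set)
    for entities in docs:
        # Deduplicate mentions, then connect every pair found in the document.
        for u, v in combinations(sorted(set(entities)), 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected graph.

    Every node in this graph has at least one neighbour, so no special
    handling of dangling nodes is needed.
    """
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        new_rank = {}
        for node in graph:
            # Each neighbour shares its rank equally among its own neighbours.
            incoming = sum(rank[nbr] / len(graph[nbr]) for nbr in graph[node])
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


graph = build_cooccurrence_graph(docs_entities)
degree = degree_centrality(graph)
ranks = pagerank(graph)

# Rank entities from most to least influential under each measure.
by_degree = sorted(degree, key=degree.get, reverse=True)
by_pagerank = sorted(ranks, key=ranks.get, reverse=True)
print("Top entity by degree centrality:", by_degree[0])
print("Top entity by PageRank:", by_pagerank[0])
```

In this toy graph the placeholder entity_A co-occurs with every other entity, so both measures rank it first. On a real knowledge graph the two measures can disagree, which is one reason to compare several centrality algorithms before naming a single most influential entity.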