Real time fraud detection using streaming batches & implementation of a real time data warehouse: a combined approach to machine learning & data storage

Authors

Dangi, Neeharika

Issue Date

2020

Degree

MSc in Data Analytics

Publisher

Dublin Business School

Rights

Items in eSource are protected by copyright. Previously published items are made available in accordance with the copyright policy of the publisher/copyright holder.

Abstract

Anomaly detection is becoming increasingly important in sectors such as banking, medicine and computer networks. The volume of online transactions is increasing exponentially, and online credit card transactions account for the largest share. Financial organizations are therefore increasingly focused on applications for real-time, online fraud detection. Outlier detection is considered challenging for real-time data. This dissertation proposes a novel technique that combines anomaly detection on streaming data in batches with the implementation of an RTDW (Real Time Data Warehouse) for a high-volume online processing system. Well-known anomaly detection algorithms such as Isolation Forest, LOF (Local Outlier Factor) and OCSVM (One-Class Support Vector Machine) have been implemented and compared on the basis of their AUROC scores. The RTDW has been implemented on Oracle 11g, with Oracle GoldenGate configured to bring latency down to 0.4 seconds. Isolation Forest detects the most anomalous behaviour on the real-time dataset, achieving the best score of 0.8022.
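The comparison described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn and NumPy are available. It is not the dissertation's code. The synthetic data, the helper names `make_batch` and `evaluate`, the hyperparameters, and the choice to fit each detector on the previous batch before scoring the current one are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

def make_batch(n_normal=480, n_fraud=20, n_features=4):
    """Hypothetical stand-in for one batch of transactions (synthetic data)."""
    normal = rng.normal(0.0, 1.0, size=(n_normal, n_features))
    fraud = rng.uniform(-6.0, 6.0, size=(n_fraud, n_features))  # scattered outliers
    X = np.vstack([normal, fraud])
    y = np.concatenate([np.zeros(n_normal), np.ones(n_fraud)])  # 1 = anomaly
    return X, y

# Factories so that a fresh model is fitted for every batch.
# All hyperparameters are illustrative assumptions.
detectors = {
    "IsolationForest": lambda: IsolationForest(n_estimators=100, random_state=0),
    "LOF": lambda: LocalOutlierFactor(n_neighbors=20, novelty=True),
    "OCSVM": lambda: OneClassSVM(kernel="rbf", gamma="scale", nu=0.05),
}

def evaluate(n_batches=5):
    """Return the mean AUROC per detector over a stream of batches."""
    train_X, _ = make_batch()  # warm-up batch used only for fitting
    aurocs = {name: [] for name in detectors}
    for _ in range(n_batches):
        X, y = make_batch()
        for name, make_model in detectors.items():
            model = make_model().fit(train_X)  # unsupervised: labels are not used
            # score_samples is higher for normal points, so negate it
            # to obtain an anomaly score before computing AUROC.
            anomaly_score = -model.score_samples(X)
            aurocs[name].append(roc_auc_score(y, anomaly_score))
        train_X = X  # the current batch becomes the next training window
    return {name: float(np.mean(values)) for name, values in aurocs.items()}

if __name__ == "__main__":
    for name, auroc in evaluate().items():
        print(f"{name}: mean AUROC = {auroc:.4f}")
```

The labels are used only to compute AUROC, never for fitting, which mirrors the unsupervised setting of the three detectors. Scores obtained on this synthetic data say nothing about the 0.8022 reported for the real-time dataset.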