Prototypic implementation and comparison of fraud detection algorithms based on methods of statistical analysis


Bibliographic Details
Main Author: Hätälä, Tomas Sebastian
Format: Dissertation (University of Nottingham only)
Language: English
Published: 2015
Subjects:
Online Access: https://eprints.nottingham.ac.uk/30805/
Description
Summary: This project examines the application of machine learning algorithms to credit card fraud detection. For this task, the company Insider Technologies Limited supplied real-world data consisting of 12.7 million transactions and 18 feature columns. During the project, 96 additional features were added. The performance of five feature subsets was then evaluated using the classification algorithms Naïve Bayes, Logistic Regression, Support Vector Machines, k-Nearest Neighbours, Random Forest and Neural Networks. The feature subset containing only single-transaction features performed best. Moreover, the supplied data is highly imbalanced, with only 335 transactions marked as fraudulent. Therefore, different under- and oversampling algorithms were applied to the single-transaction feature subset. Finally, the performance of the machine learning algorithms was evaluated using the sampled data. Most algorithms performed better when trained on the balanced data, with Random Forest performing best, followed by Deep Learning. The datasets sampled using random undersampling, or a combination including random undersampling, outperformed the others. It is concluded that this is because only random undersampling was able to achieve a complete class balance.
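
To illustrate the random undersampling step mentioned in the summary, the following is a minimal sketch, not the dissertation's implementation. With 335 fraudulent transactions out of 12.7 million, the fraud class makes up roughly 0.003% of the data; random undersampling discards majority-class rows at random until both classes are the same size. The function name `random_undersample`, the `seed` parameter, and the toy data are assumptions made for illustration only.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly keep, per class, as many rows as the rarest class has.

    Hypothetical helper for illustration; not taken from the dissertation.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Every class is reduced to the size of the smallest class.
    target = min(len(idxs) for idxs in by_class.values())

    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, target))  # sampling without replacement
    kept.sort()  # preserve the original row order

    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data (assumed, not the thesis data): 1000 legitimate (0), 5 fraudulent (1).
labels = [0] * 1000 + [1] * 5
rows = [[float(i)] for i in range(len(labels))]

balanced_rows, balanced_labels = random_undersample(rows, labels)
class_counts = Counter(balanced_labels)
```

After the call, both classes contain the same number of rows, which is the "complete class balance" the summary refers to. The cost is that most majority-class rows are thrown away, so in practice the sampling is applied to the training split only.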