Credit Card Anomaly Detection using Autoencoders

In this deep learning project, you will apply autoencoder-based anomaly detection to a credit card transaction dataset to detect fraud.

Videos

Each project comes with 2-5 hours of micro-videos explaining the solution.

Code & Dataset

Get access to 102+ solved projects with iPython notebooks and datasets.

Project Experience

Add project experience to your LinkedIn/GitHub profiles.

What you will learn

Understanding the problem statement
Importing the dataset and libraries
Performing basic EDA
Data cleaning: imputing null values using an appropriate method
Using ggplot to visualize the dataset
Importing the h2o library and initializing an H2O cluster
Splitting the dataset into train and test sets
Defining parameters for training a neural network
Training the neural network
Understanding the difference between artificial neural networks and autoencoders
How an autoencoder works
Loading the pre-trained neural network
How to autoencode a pre-trained neural network
Visualizing the effectiveness of the autoencoded model and the neural network using ggplot
Making predictions using the trained model

Project Description

What is anomaly detection?

Anomaly detection (also known as outlier analysis) is a data mining step that identifies data points, events, or observations that deviate from a dataset's normal behavior. Anomalous data can indicate a critical incident, such as a technical glitch, or a potential opportunity, such as a change in consumer behavior.
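As a minimal illustration of "deviating from normal behavior", the sketch below flags unusual values with a modified z-score based on the median. This is a simple statistical stand-in, not the autoencoder approach the project teaches, and it is written in Python purely for illustration (the project itself uses R and H2O). The function name `flag_anomalies`, the sample amounts, and the 3.5 cutoff are all assumptions made for this example.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`.

    The modified z-score uses the median and the median absolute
    deviation (MAD), so a single extreme value does not distort the
    notion of "normal" the way it would with a mean and standard deviation.
    """
    values = list(values)
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # Degenerate case: at least half the values are identical,
        # so anything different from the median is treated as unusual.
        return [v for v in values if v != center]
    return [v for v in values if 0.6745 * abs(v - center) / mad > threshold]


# Made-up transaction amounts: one is far outside the usual range.
amounts = [12.5, 9.9, 14.2, 11.0, 13.1, 10.4, 950.0, 12.0]
print(flag_anomalies(amounts))
```

An autoencoder follows the same idea but learns what "normal" looks like across many columns at once, which a single-column statistic cannot do.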

 

 

Applications of anomaly detection

  • Banking, Financial Services, and Insurance (BFSI) – In the banking sector, anomaly detection is used to flag abnormally high transactions, fraudulent activity, and phishing attacks.

  • Retail – In retail, anomaly detection is used to process large volumes of financial transactions and identify fraudulent behavior, such as identity theft and fraudulent credit card usage.

  • Manufacturing – In manufacturing, anomaly detection can identify underperforming machines and tools, which can take months to find without anomaly detection technology.

  • IT and Telecom – In IT and telecommunications, anomaly detection is increasingly valuable for detecting and acting on personal threats to users, financial threats to service providers, and other unexpected threats.

  • Defense and Government – In defense and government settings, anomaly detection is used to identify excessive or fraudulent spending in budgeting and audits, which can save governments large amounts of money.

  • Healthcare – In healthcare, anomaly detection supports claims management by identifying fraudulent claims, both from hospitals and on the side of insurance providers. This can improve the quality of health services and avoid large financial losses.

Tech Stack

  • Language used: R

  • Machine Learning interface: H2O

  • Other packages used: caret, e1071, ROCR, and many more

Dataset Overview

In this project, we use a credit card fraud dataset containing fraudulent and legitimate transactions recorded over a certain period. The data is available in .csv format. Most of the columns (V1 to V28) do not have descriptive names because a PCA (Principal Component Analysis) transformation was applied to the original features to keep the data confidential. Apart from these variables, the dataset has a few explicit variables:

  1. Time - Seconds elapsed between each transaction and the first transaction in the dataset

  2. Amount - Transaction amount

  3. Class

    1. 0 - Non-fraudulent transaction

    2. 1 - Fraudulent transaction
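To make the column layout concrete, here is a small sketch that parses rows shaped like the dataset described above and checks the class balance. It is written in Python for illustration only (the project works in R). The rows in `SAMPLE_CSV` are invented, only two of the V columns are shown, and the helper names `load_transactions` and `fraud_rate` are hypothetical.

```python
import csv
import io
from collections import Counter

# Invented rows that mimic the described layout (only V1 and V2 shown).
SAMPLE_CSV = """Time,V1,V2,Amount,Class
0,-1.2,0.3,25.00,0
60,0.8,-0.5,12.50,0
3700,-3.1,2.4,1.00,1
7300,0.4,0.1,80.00,0
"""


def load_transactions(text):
    """Parse CSV text into dicts with numeric features and an integer Class."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {name: float(value) for name, value in raw.items() if name != "Class"}
        row["Class"] = int(raw["Class"])
        rows.append(row)
    return rows


def fraud_rate(rows):
    """Fraction of rows labelled as fraudulent (Class == 1)."""
    return sum(r["Class"] for r in rows) / len(rows)


rows = load_transactions(SAMPLE_CSV)
print(Counter(r["Class"] for r in rows))  # count of rows per class
print(fraud_rate(rows))
```

Checking the class balance early matters because fraud datasets are typically dominated by non-fraudulent rows, which is one reason to treat the task as anomaly detection.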

 

 

Approach

 

  1. Business context and objective 

  2. Translating into Data Science approach

    1. What, why, and where of anomaly detection

    2. Why we are using a fraud dataset for this problem

    3. Algorithms used to solve this problem

  3. Data importing and Data Understanding

  4. Data Preprocessing 

    1. Creating the time variable

  5. EDA

  6. Preparing data for modelling

  7. Understanding neural networks and deep neural networks

  8. Understanding Autoencoders 

  9. Unsupervised Learning using h2o

    1. Building model and Model Details

  10. Evaluation parameters understanding

    1. Evaluating based on Reconstructed MSE

  11. Supervised Learning using h2o 

    1. Building and tuning supervised learning model using H2O

  12. Transfer learning 

    1. Supervised Learning using Pretrained model and evaluation

  13. Precision-recall curve

    1. Try different thresholds to improve accuracy

  14. Making production-ready code
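Steps 10 and 13 of the approach can be sketched as follows: score each transaction by its reconstruction error (the mean squared difference between the original row and the autoencoder's reconstruction), then compare precision and recall at different thresholds. This is an illustrative Python sketch, not the project's R/H2O code. The reconstruction errors and labels below are invented, and the function names are hypothetical; in the project the errors would come from the trained autoencoder.

```python
def reconstruction_mse(original, reconstructed):
    """Mean squared error between one input row and its reconstruction."""
    if len(original) != len(reconstructed):
        raise ValueError("rows must have the same number of features")
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def precision_recall(errors, labels, threshold):
    """Flag rows whose error exceeds `threshold` and score them against labels.

    labels use 1 for fraud and 0 for non-fraud. Returns (precision, recall);
    a ratio with an empty denominator is reported as 0.0.
    """
    flagged = [e > threshold for e in errors]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented per-transaction reconstruction errors and true classes.
errors = [0.02, 0.05, 0.90, 0.30, 0.40, 0.03]
labels = [0, 0, 1, 0, 1, 0]

for threshold in (0.25, 0.35, 0.50):
    print(threshold, precision_recall(errors, labels, threshold))
```

Raising the threshold flags fewer transactions, which tends to raise precision and lower recall; the right trade-off depends on the relative cost of a missed fraud versus a false alarm.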

Curriculum For This Mini Project

Introduction to Anomaly Detection (03m)
What, why, where anomaly detection? (11m)
Business Context (04m)
Data importing and data understanding (07m)
Data preprocessing and EDA (05m)
EDA continued (07m)
Data preparation for model building (05m)
Understanding Neural Networks (10m)
Understanding Deep Neural networks (03m)
Understanding Autoencoders (07m)
Applications of Autoencoders (08m)
Understanding the Evaluation parameters (05m)
Building supervised learning model using H2O part-1 (09m)
Building supervised learning model using H2O part-2 (04m)
Precision and recall curve (04m)
Building unsupervised learning model using H2O (05m)
Model evaluation using reconstructed MSE (09m)
Different thresholds to improve specificity (04m)
Conclusion (04m)
Modular code overview (07m)
