Credit Card Fraud Detection as a Classification Problem

In this data science project, we predict fraudulent credit card transactions in a transactional dataset using several predictive models.


What will you learn

Understanding the problem
Importing required libraries and understanding their use
Importing data and learning its structure
Performing basic EDA
Scaling different variables
Outlier treatment
Building a basic classification model with Random Forest
NearMiss technique for undersampling data
SMOTE for oversampling data
Cross-validation in the context of undersampling and oversampling
Pipelining with sklearn/imblearn
Applying a linear model: Logistic Regression
Applying an ensemble technique: Random Forest
Applying nonlinear algorithms: Support Vector Machine, Decision Tree and k-Nearest Neighbour
Making predictions on test set and computing validation metrics
ROC curve and Learning curve
Comparison of results and Model Selection
Visualization with seaborn and matplotlib
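
To give a feel for the SMOTE item in the list above, here is a minimal sketch of the core idea in plain Python: each synthetic minority sample is placed on the line segment between an existing minority sample and one of its nearest minority neighbours. The function name `smote_sketch`, its parameters and the toy points are hypothetical and chosen for illustration only; this is a simplification, not the imblearn implementation used in the project.

```python
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, neighbour)))
    return synthetic

# Toy minority class: four hypothetical "fraud" points in two dimensions.
frauds = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(frauds, n_new=6)
print(new_points)
```

Because every synthetic point is a convex combination of two real minority points, it always lies between them, which is why SMOTE fills in the minority region instead of merely duplicating rows.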

Project Description

It is vital that credit card companies can identify fraudulent transactions so that customers are not charged for items they did not purchase. The dataset contains transactions made with credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions. The dataset was collected and analyzed during a research collaboration between Worldline and the Machine Learning Group of ULB (Université Libre de Bruxelles) on big data mining and fraud detection.
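
The class counts quoted above can be turned into a quick arithmetic check of how severe the imbalance is. The sketch below uses only the two figures from the description (492 frauds, 284,807 transactions); the "always predict legitimate" baseline is a hypothetical classifier added for illustration. It shows that such a baseline would reach roughly 99.8% accuracy while catching no fraud at all.

```python
n_total = 284_807   # all transactions (from the dataset description)
n_fraud = 492       # fraudulent transactions (positive class)
n_legit = n_total - n_fraud

fraud_rate = n_fraud / n_total
print(f"fraud rate: {fraud_rate:.2%}")

# Hypothetical baseline: label every transaction as legitimate.
tp, fn = 0, n_fraud      # no fraud is ever flagged
tn, fp = n_legit, 0      # every legitimate transaction is "correct"

accuracy = (tp + tn) / n_total
recall = tp / (tp + fn)
print(f"baseline accuracy: {accuracy:.3%}, baseline recall: {recall:.0%}")
```

This is why accuracy alone says little on this dataset and why recall on the fraud class needs to be examined alongside it.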

Because the dataset's features were produced by a PCA transformation, there is little scope for further preprocessing. The class imbalance is addressed with oversampling and undersampling. Logistic regression, random forest, support vector machine and k-nearest neighbour classifiers are trained within a cross-validation framework. Finally, recall and accuracy are used as the metrics for choosing the best classifier. A supplementary section on outlier detection is added at the end.
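
One point worth illustrating about resampling inside cross-validation is that the resampling step should touch only the training folds, so that each validation fold keeps the original class ratio. The following is a minimal sketch in plain Python under assumed toy data (20 frauds among 1,000 labels); the helper names `stratified_folds` and `undersample` are hypothetical, and random undersampling stands in for the NearMiss and SMOTE techniques used in the project.

```python
import random

def stratified_folds(labels, k, seed=0):
    """Split indices into k folds with roughly equal class ratios."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def undersample(indices, labels, seed=0):
    """Randomly drop majority-class indices until both classes match in size."""
    rng = random.Random(seed)
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return minority + rng.sample(majority, len(minority))

# Toy labels (hypothetical): 20 frauds among 1,000 transactions.
labels = [1] * 20 + [0] * 980
folds = stratified_folds(labels, k=5)

for f, val_idx in enumerate(folds):
    train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
    balanced_train = undersample(train_idx, labels, seed=f)
    # A model would be fit on balanced_train and scored on val_idx,
    # which still has the original, imbalanced class ratio.
    n_fraud_val = sum(labels[i] for i in val_idx)
    print(f, len(balanced_train), len(val_idx), n_fraud_val)
```

Resampling before splitting would let information from the validation fold leak into training (for oversampling, copies or interpolations of validation points), which tends to inflate the reported recall.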

Curriculum For This Mini Project

Business Problem
Data Science Problem
Solution Workflow
Show me the Data
Exploratory Data Analysis - Part 1
Exploratory Data Analysis - Part 2
Data Preparation - Part 1
Data Preparation - Part 2
Validation Metrics
Base Model
Undersampling Models - Part 1
Undersampling Models - Part 2
Oversampling Models
Best Model