Credit Card Fraud Detection as a Classification Problem

In this data science project, we predict credit card fraud in a transaction dataset using several predictive models.


What will you learn

Understanding the problem
Importing required libraries and understanding their use
Importing data and learning its structure
Performing basic EDA
Scaling different variables
Outlier treatment
Building basic Classification model with Random Forest
NearMiss technique for undersampling data
SMOTE for oversampling data
Cross-validation in the context of undersampling and oversampling
Pipelining with sklearn/imblearn
Applying Linear model: Logistic Regression
Applying Ensemble technique: Random Forest
Applying Nonlinear Algorithms: Support Vector Machine, Decision Tree and k-Nearest Neighbour
Making predictions on test set and computing validation metrics
ROC curve and Learning curve
Comparison of results and Model Selection
Visualization with seaborn and matplotlib

Project Description

It is vital that credit card companies are able to identify fraudulent credit card transactions so that customers are not charged for items that they did not purchase. The dataset contains transactions made with credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions. The dataset was collected and analyzed during a research collaboration between Worldline and the Machine Learning Group of ULB (Université Libre de Bruxelles) on big data mining and fraud detection.
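To make the imbalance concrete, here is a minimal sketch that uses only the two counts quoted above. The function names and the "never flag fraud" baseline are illustrative assumptions, not part of the project code. It shows why accuracy alone can look excellent on this dataset while recall on the fraud class is zero.

```python
# Counts quoted in the project description.
TOTAL_TRANSACTIONS = 284_807
FRAUD_TRANSACTIONS = 492


def fraud_rate(frauds: int, total: int) -> float:
    """Share of transactions that belong to the positive (fraud) class."""
    return frauds / total


def never_flag_baseline(frauds: int, total: int) -> tuple[float, float]:
    """Accuracy and recall of a hypothetical model that labels every
    transaction as legitimate (an assumption used only for illustration)."""
    true_negatives = total - frauds   # every legitimate transaction is "correct"
    true_positives = 0                # no fraud is ever caught
    accuracy = (true_negatives + true_positives) / total
    recall = true_positives / frauds
    return accuracy, recall


rate = fraud_rate(FRAUD_TRANSACTIONS, TOTAL_TRANSACTIONS)
accuracy, recall = never_flag_baseline(FRAUD_TRANSACTIONS, TOTAL_TRANSACTIONS)

print(f"Fraud share of all transactions: {rate:.4%}")
print(f"Never-flag baseline accuracy:    {accuracy:.4%}")
print(f"Never-flag baseline recall:      {recall:.4%}")
```

A model that catches no fraud at all still scores above 99.8% accuracy here, which is why recall on the fraud class needs to be read alongside accuracy.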

Because the dataset's features were already produced by a PCA transformation, there is little scope for further preprocessing in this problem. The class imbalance is compensated for using oversampling and undersampling. Logistic regression, random forest, support vector machine and k-nearest neighbour classifiers are used within a cross-validation framework. Finally, recall and accuracy are used as the metrics for choosing the best classifier. An additional section on outlier detection is included at the end.
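One way to combine resampling with cross-validation is to rebalance only the training part of each fold, leaving the held-out part at its original class ratio. The sketch below is a standard-library illustration under stated assumptions: it uses plain random undersampling as a stand-in (not NearMiss or SMOTE), a toy label list rather than the real data, and hypothetical helper names. It is not the project's actual pipeline.

```python
import random
from collections import Counter


def stratified_folds(labels, n_splits, rng):
    """Deal the indices of each class round-robin into n_splits folds,
    so every fold keeps roughly the original class ratio."""
    folds = [[] for _ in range(n_splits)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds


def undersample(indices, labels, rng):
    """Randomly drop rows until every class is as small as the rarest one."""
    by_class = {}
    for idx in indices:
        by_class.setdefault(labels[idx], []).append(idx)
    smallest = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    return balanced


def resampled_cv_splits(labels, n_splits=5, seed=0):
    """Yield (train, test) index lists; only the training side is rebalanced."""
    rng = random.Random(seed)
    folds = stratified_folds(labels, n_splits, rng)
    for held_out in range(n_splits):
        test_idx = folds[held_out]
        train_idx = [i for k, fold in enumerate(folds) if k != held_out for i in fold]
        yield undersample(train_idx, labels, rng), test_idx


# Toy labels (an assumption for the demo): 1 = fraud, 0 = legitimate.
labels = [1] * 20 + [0] * 980
splits = list(resampled_cv_splits(labels))
for train_idx, test_idx in splits:
    train_counts = Counter(labels[i] for i in train_idx)
    test_counts = Counter(labels[i] for i in test_idx)
    print("train:", dict(train_counts), "test:", dict(test_counts))
```

Resampling before splitting would let rebalanced rows influence the held-out fold and inflate the validation scores. Keeping the resampling step inside each training fold avoids that, so the test fold still reflects the imbalance the model will meet in practice.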

Curriculum For This Mini Project

Business Problem
Data Science Problem
Solution Workflow
Show me the Data
Exploratory Data Analysis - Part 1
Exploratory Data Analysis - Part 2
Data Preparation - Part 1
Data Preparation - Part 2
Validation Metrics
Base Model
Undersampling Models - Part 1
Undersampling Models - Part 2
Oversampling Models
Best Model