Data Science Project: TalkingData AdTracking Fraud Detection

Machine Learning Project in R: detect fraudulent click traffic for mobile app ads using the R programming language.


What you will learn

Understanding the problem statement
Importing the training and testing datasets
Installing the necessary libraries and understanding their use
Performing basic EDA and checking for null values
Converting the necessary columns to timestamps
Checking for unique values and data types
Checking for the relationship between different variables
Visualizing the distribution through ggplot
Understanding decision tree, random forest, logistic regression, SVM, boosting, bagging, CART, and neural network models
Defining the evaluation metrics
Prediction using all the features by applying Logistic Regression
Dividing the dataset into train and test dataset
Data Balancing using SMOTE
Applying Cross-Validation to avoid overfitting
Using the "varImp" function in R to get the best features for the model
Applying an ensemble Random Forest model
Applying a bagged Decision Tree model
Applying a linear Logistic Regression model
Using the confusion matrix to visualize the predictions
Selecting the final model and making predictions on the test dataset
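
As a rough illustration of the early steps listed above (checking for null values, converting a click-time column to a timestamp, and fixing data types), here is a minimal base R sketch. The column names (ip, click_time, is_attributed) and every value in the toy data frame are assumptions made up for this example; the project's actual data and code may differ.

```r
# Toy click log. Column names and values are hypothetical, for illustration only.
clicks <- data.frame(
  ip = c(101L, 102L, 101L, 103L),
  click_time = c("2017-11-07 09:30:38", "2017-11-07 13:40:27",
                 "2017-11-08 18:05:24", "2017-11-09 04:58:08"),
  is_attributed = c(0L, 0L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Count missing values in each column.
na_counts <- colSums(is.na(clicks))

# Convert the character column to a POSIXct timestamp (UTC assumed here)
# and derive an hour-of-day feature from it.
clicks$click_time <- as.POSIXct(clicks$click_time,
                                format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
clicks$click_hour <- as.integer(format(clicks$click_time, "%H"))

# Store the binary label as a factor so that modelling functions
# treat it as a class rather than a number.
clicks$is_attributed <- factor(clicks$is_attributed, levels = c(0, 1))

# Inspect the result: unique values per column and the column types.
unique_counts <- sapply(clicks, function(col) length(unique(col)))
print(na_counts)
print(unique_counts)
str(clicks)
```

Deriving features such as the hour of day from the timestamp is one common way to make a click-time column usable by a model; which derived features actually help is something the exploratory analysis would have to show.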

Project Description

Fraud risk is everywhere, but for companies that advertise online, click fraud can happen at an overwhelming volume, resulting in misleading click data and wasted money. Ad channels can drive up costs simply by clicking on an ad at a large scale. With over 1 billion smart mobile devices in active use every month, China is the largest mobile market in the world and therefore suffers from huge volumes of fraudulent traffic.

In this machine learning project, you will build a machine learning model to determine whether a click is fraudulent or not.
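
To make the task concrete, the following is a minimal sketch of a binary click classifier in base R: a logistic regression fitted with glm, evaluated with a confusion matrix built by table. The data are synthetic, and the feature names (clicks_per_ip, click_hour) and the label name (flagged) are hypothetical; this is not the project's actual model or dataset.

```r
set.seed(42)

# Synthetic data: all names and the generating process are assumptions.
n <- 1000
clicks_per_ip <- rpois(n, lambda = 5)
click_hour <- sample(0:23, n, replace = TRUE)
# The label is made to depend on clicks_per_ip so there is a signal to learn.
flagged <- rbinom(n, 1, plogis(-3 + 0.4 * clicks_per_ip))
toy <- data.frame(clicks_per_ip, click_hour, flagged)

# Split into 70% training and 30% testing rows.
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- toy[train_idx, ]
test <- toy[-train_idx, ]

# Fit a logistic regression on all features.
fit <- glm(flagged ~ clicks_per_ip + click_hour,
           data = train, family = binomial)

# Predict probabilities on the held-out rows and threshold at 0.5.
prob <- predict(fit, newdata = test, type = "response")
pred <- factor(as.integer(prob > 0.5), levels = c(0, 1))
actual <- factor(test$flagged, levels = c(0, 1))

# Confusion matrix and the metrics derived from it.
cm <- table(Predicted = pred, Actual = actual)
accuracy <- sum(diag(cm)) / sum(cm)
sensitivity <- cm["1", "1"] / sum(cm[, "1"])  # true positive rate
specificity <- cm["0", "0"] / sum(cm[, "0"])  # true negative rate

print(cm)
print(c(accuracy = accuracy,
        sensitivity = sensitivity,
        specificity = specificity))
```

On heavily imbalanced click data, accuracy alone can look high even when the rare class is missed, which is why sensitivity and specificity are reported alongside it here and why the curriculum includes a data balancing step.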

Curriculum For This Mini Project

Problem Statement
Data Set
Install Libraries
Import Data Set
Data Set Overview
Next Steps
Recap in RStudio
Missing Data
Analyse Click Time variable
Analyse Features
Convert variables to correct Data types
Exploratory Data Analysis
Model Creation using all Features
Selecting Important Features
Tune Length Parameter
Model Creation using selected Features
Split Data into Training and Testing
Data Balancing using SMOTE
Model Creation - Decision Tree
Model Creation - Random Forest