Ensemble Machine Learning Project - Allstate Insurance Claims Severity Prediction

In this ensemble machine learning project, we will predict the severity of the claims an insurance company receives. The project is implemented in Python using ensemble machine learning algorithms.


What you will learn

Detailed business description and the problem being addressed through analytics
Loading data with the popular pandas Python package
Dataset overview and how to analyze a sample of the dataset
Exploratory data analysis to understand the Allstate insurance claim dataset
Analyzing the five-point summary and studying the data distribution for categorical variables
Handling missing values for categorical and continuous variables
Outlier treatment with visual techniques (Box-Plots)
Difference between label encoding and one-hot encoding, and which technique to use
Using the pickle file format to store and load models
Feature selection and elimination using Correlation, Constant Variance and Chi-Square statistical tests
Understanding ensemble Machine Learning algorithms
Hyper-parameter tuning using scikit-learn functions
Model selection using RMSE as the model evaluation metric
Model deployment by creating a Flask API
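
As a small illustration of the encoding topic in the list above, the sketch below contrasts label encoding with one-hot encoding. It is written in plain Python so that it runs without any packages; the project itself presumably relies on library encoders. The helper names (label_encode, one_hot_encode) and the sample column are hypothetical and are not taken from the Allstate dataset.

```python
def label_encode(values):
    """Map each distinct category to an integer code (categories sorted alphabetically)."""
    categories = sorted(set(values))
    mapping = {cat: idx for idx, cat in enumerate(categories)}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Expand one categorical column into one 0/1 column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


# Hypothetical categorical column, used only for illustration.
sample = ["A", "B", "A", "C"]

codes, mapping = label_encode(sample)
rows, columns = one_hot_encode(sample)

print(codes)    # one integer per row
print(columns)  # one output column per category
print(rows)     # exactly one 1 per row
```

Label encoding keeps a single column but implies an ordering between categories that may not exist, while one-hot encoding avoids that implied ordering at the cost of one extra column per category. Which one fits better depends on the model and on how many levels each variable has.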

Project Description

Allstate, a personal insurance company in the United States, is interested in leveraging data science to predict the severity and cost of insurance claims after an unforeseen event.

This ensemble machine learning project will help you understand the best practices for approaching a data analytics problem in Python, with a focus on its data science packages. We will predict how severe insurance claims will be for Allstate using ensemble machine learning algorithms.

Curriculum For This Mini Project

Business Problem Overview
Dataset Overview
Exploratory Data Analysis
Data Cleaning and Pre-processing
Handling Outliers
Dependent Variable Analysis - Introduction To ML Algorithms
Feature Selection - Continuous Variables
Feature Selection - 2
Variable Encoding - One Hot Technique
Categorical Feature Selection - Chi Square Test
Building A Machine Learning Model - Random Forest - Hyper Parameter Tuning
Model Validation - GBM (Gradient Boosting Machine) Model
Model Prediction On Test Data
Model Deployment - API
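
The validation and model-storage steps in the curriculum can be sketched roughly as follows. This is a minimal, dependency-free illustration of selecting a model by RMSE and saving the result with pickle, not the project's actual code. The validation targets, the candidate predictions, the model names and the file name best_model.pkl are all made-up placeholders, and a small dictionary stands in for a fitted model object.

```python
import math
import os
import pickle
import tempfile


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical validation targets and predictions from two candidate models.
y_valid = [100.0, 200.0, 300.0]
candidates = {
    "random_forest": [110.0, 190.0, 310.0],
    "gbm": [105.0, 195.0, 305.0],
}

# Score every candidate and keep the one with the lowest RMSE.
scores = {name: rmse(y_valid, preds) for name, preds in candidates.items()}
best = min(scores, key=scores.get)

# Placeholder for the fitted model; a real project would pickle the estimator itself.
artifact = {"model_name": best, "validation_rmse": scores[best]}

# Store the artifact with pickle and load it back, as a deployment step might.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "best_model.pkl")
    with open(path, "wb") as fh:
        pickle.dump(artifact, fh)
    with open(path, "rb") as fh:
        restored = pickle.load(fh)

print(scores)
print(best)
print(restored == artifact)
```

Lower RMSE means predictions are closer to the observed claim amounts on average, with large errors penalized more heavily. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.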