HANDS-ON-LAB

End-to-End Churn Prediction Project Using Airflow and Docker

Problem Statement

Build an end-to-end churn prediction system that reads data from PostgreSQL and scores each customer for churn using an ML model. The entire workflow is scheduled and orchestrated using Airflow and Docker.

Dataset

Kindly download the data from here.

Tasks

  1. Create a database named “Bank” in PostgreSQL and upload the dataset as a table.

  2. Using the Python SQLAlchemy library, pull the data from the database and save it as a CSV file (see the sketch after this list).

  3. EDA:

    • Check for missing values and examine the univariate distributions of all variables.

    • Apply the chi-square test (chi2_test) to check the importance of categorical variables with respect to the target.

    • Apply a suitable statistical test (e.g., a t-test or ANOVA) to check the importance of numerical variables with respect to the target.

  4. Build a binary classification model on the preprocessed and feature-engineered data. Select the best model, i.e., the one with the highest ROC AUC score (a model-selection sketch follows this list).

  5. Create scripts for data sanity, data drift, concept drift, and model drift checks in a similar fashion to what is explained in the project video (an illustrative drift check follows this list).

  6. Use Docker and Airflow to create a pipeline that triggers a Slack message when more than 10 customers have a churn score (probability score from the model) above 90%.
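
Below is a minimal sketch for task 2, assuming a local PostgreSQL instance, a database named "Bank", and a table hypothetically named churn_data; the credentials, port, and output path are placeholders to adapt to your setup.

# Sketch for task 2: pull the table from PostgreSQL and save it as a CSV file.
# The connection string, the table name "churn_data", and the output path are
# placeholder assumptions; adjust them to your setup. Requires psycopg2-binary.
import pandas as pd
from sqlalchemy import create_engine

# postgresql://<user>:<password>@<host>:<port>/<database>
engine = create_engine("postgresql://postgres:postgres@localhost:5432/Bank")

# Read the whole table into a DataFrame and write it out as a CSV file.
df = pd.read_sql_table("churn_data", con=engine)
df.to_csv("churn_data.csv", index=False)
print(f"Saved {len(df)} rows to churn_data.csv")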
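
For task 4, one way to select a model by ROC AUC is to compare a few candidate classifiers with cross-validation, as sketched below. The target column name "churn", the candidate models, and the CSV path are assumptions; the data is expected to be fully preprocessed (numeric) at this point.

# Sketch for task 4: compare candidate classifiers by cross-validated ROC AUC
# and keep the best one. Assumes churn_data.csv is already preprocessed and
# feature engineered, and that the binary target column is named "churn".
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("churn_data.csv")
X = df.drop(columns=["churn"])
y = df["churn"]

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

# Mean 5-fold ROC AUC for each candidate; pick the highest-scoring model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
print(scores, "-> best:", best_name)

# Refit the winning model on the full training data before scoring new customers.
best_model = candidates[best_name].fit(X, y)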
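
For task 5, follow the structure shown in the project video. As one hedged illustration of a data-drift check (not the project's exact script), the distribution of each numerical column in the incoming scoring batch can be compared with the training data using a two-sample Kolmogorov-Smirnov test; the 0.05 threshold is an assumption.

# Illustrative data-drift check for task 5: compare the training ("reference")
# distribution of each numerical column against the incoming scoring batch with
# a two-sample Kolmogorov-Smirnov test. The 0.05 threshold and the per-column
# flagging are assumptions, not the project's exact scripts.
import pandas as pd
from scipy.stats import ks_2samp

def detect_data_drift(reference, current, numeric_cols, alpha=0.05):
    """Return {column: True/False}, where True means drift was detected."""
    drift = {}
    for col in numeric_cols:
        result = ks_2samp(reference[col].dropna(), current[col].dropna())
        drift[col] = result.pvalue < alpha  # small p-value => distributions differ
    return drift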

 

Unlock the potential of Docker and Airflow to create a seamless pipeline for churn prediction and real-time alerts.

 

FAQs

Q1. How can I assess the importance of categorical variables in predicting churn?

Apply the chi-square test (chi2_test) to evaluate the statistical association between each categorical variable and the target.
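
A minimal illustration using scipy, assuming the data is already in churn_data.csv and that "geography" and "churn" are a categorical feature and the target (both column names are assumptions):

# Chi-square test of independence between a categorical feature and the target.
# The column names "geography" and "churn" are illustrative assumptions.
import pandas as pd
from scipy.stats import chi2_contingency

df = pd.read_csv("churn_data.csv")
contingency = pd.crosstab(df["geography"], df["churn"])
chi2, p_value, dof, expected = chi2_contingency(contingency)

# A small p-value (e.g., < 0.05) suggests the feature is associated with churn.
print(f"chi2={chi2:.2f}, p-value={p_value:.4f}")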

 

Q2. How can I assess the importance of numerical variables in predicting churn?

Apply a suitable statistical test (e.g., a t-test or ANOVA) to analyze the statistical significance of numerical variables in relation to the target.
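
One common choice for a binary target is a two-sample t-test that compares the numerical feature across churned and non-churned customers (for two groups, ANOVA gives an equivalent F-test). The column names "balance" and "churn" below are illustrative assumptions:

# Two-sample (Welch's) t-test comparing a numerical feature across churned and
# non-churned customers. The column names "balance" and "churn" are assumptions.
import pandas as pd
from scipy.stats import ttest_ind

df = pd.read_csv("churn_data.csv")
churned = df.loc[df["churn"] == 1, "balance"].dropna()
retained = df.loc[df["churn"] == 0, "balance"].dropna()

result = ttest_ind(churned, retained, equal_var=False)
# A small p-value suggests the feature differs between the two groups.
print(f"t={result.statistic:.2f}, p-value={result.pvalue:.4f}")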

 

Q3. How can I trigger a slack message for high churn probability customers?

Use Docker and Airflow to create a pipeline that automatically sends a Slack message when more than 10 customers have a churn score above 90%.
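
A condensed sketch of such a DAG is shown below. It assumes the trained model and scored data are available inside the Airflow container at the paths shown, and that a Slack incoming-webhook URL is provided via the SLACK_WEBHOOK_URL environment variable; the SlackWebhookOperator from the apache-airflow-providers-slack package could be used instead of the plain webhook call.

# Sketch of an Airflow DAG that scores customers daily and posts a Slack message
# when more than 10 of them have a churn probability above 0.9. The model and
# data paths, the SLACK_WEBHOOK_URL environment variable, and the DAG/task names
# are assumptions to adapt to your setup.
import os
from datetime import datetime

import joblib
import pandas as pd
import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

def score_and_alert():
    model = joblib.load("/opt/airflow/models/churn_model.pkl")  # assumed path
    df = pd.read_csv("/opt/airflow/data/churn_data.csv")        # assumed path
    features = df.drop(columns=["churn"], errors="ignore")
    churn_proba = model.predict_proba(features)[:, 1]
    high_risk = int((churn_proba > 0.9).sum())
    if high_risk > 10:
        # Post to a Slack incoming webhook; SlackWebhookOperator works as well.
        requests.post(
            os.environ["SLACK_WEBHOOK_URL"],
            json={"text": f"Churn alert: {high_risk} customers above 90% churn score."},
            timeout=10,
        )

with DAG(
    dag_id="churn_prediction_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # use schedule_interval on Airflow versions older than 2.4
    catchup=False,
) as dag:
    PythonOperator(task_id="score_and_alert", python_callable=score_and_alert)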