Explain stratified K-fold cross-validation in ML in Python

This recipe explains stratified K-fold cross-validation in machine learning using Python.
Last Updated: 21 Dec 2022


Recipe Objective

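StratifiedKFold is a variation of KFold that returns stratified folds: each fold preserves, as closely as possible, the percentage of samples of each class found in the full dataset. This matters most for imbalanced classification data, where plain K-fold splitting can produce folds with very few (or no) samples of a minority class. The object provides train/test indices that split the data into train and test sets.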
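To make the idea concrete, here is a minimal sketch comparing KFold and StratifiedKFold on a made-up, imbalanced label array. The toy data and the helper positive_share_per_fold are illustrative assumptions and are not part of the recipe itself.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Toy imbalanced labels (made-up data): 90 samples of class 0, then 10 of class 1.
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((len(y), 1))  # feature values do not influence how the folds are built


def positive_share_per_fold(splitter):
    """Return the fraction of class-1 samples in each test fold (hypothetical helper)."""
    return [float(y[test_idx].mean()) for _, test_idx in splitter.split(X, y)]


plain = positive_share_per_fold(KFold(n_splits=5))
stratified = positive_share_per_fold(StratifiedKFold(n_splits=5))

print("KFold          :", plain)
print("StratifiedKFold:", stratified)
```

With unshuffled KFold on this sorted toy array, the minority class lands entirely in the last fold, while StratifiedKFold keeps its share at 10% in every fold.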

So this recipe is a short example of stratified K-fold cross-validation. Let's get started.


Step 1 - Import the library

from sklearn import datasets
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from statistics import mean

Let's pause and look at these imports. sklearn.datasets provides load_breast_cancer, which loads a binary classification dataset. We have imported LogisticRegression to build the model, and StratifiedKFold will perform the stratified K-fold cross-validation. The mean function from the statistics module is used later to average the per-fold accuracies.

Step 2 - Setup the Data

X, y = load_breast_cancer(return_X_y=True)

Here, we have used the load_breast_cancer function to load the dataset. Setting return_X_y to True returns it as two NumPy arrays: the feature matrix X and the target labels y.
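Before splitting, it can help to check what was loaded. The following is a small sketch, not part of the original recipe, that prints the array types and shapes and counts the samples per class, since the class proportions are what stratification will try to preserve in each fold.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)

# Both objects are NumPy arrays: X holds the features, y the class labels.
print(type(X).__name__, X.shape)
print(type(y).__name__, y.shape)

# Count how many samples belong to each class.
classes, counts = np.unique(y, return_counts=True)
for label, count in zip(classes, counts):
    print(f"class {label}: {count} samples ({count / len(y):.1%})")
```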

Now our dataset is ready.

Step 3 - Building the model and Cross Validation model

model = LogisticRegression()
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
lst_accu_stratified = []

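We have built a classification model using LogisticRegression with default values (despite its name, logistic regression is a classifier, not a regressor). For StratifiedKFold, we have set n_splits to 10, which divides the dataset into 10 folds. shuffle is set to True so that the samples of each class are shuffled before being split into folds, and random_state=1 makes that shuffling reproducible. The list lst_accu_stratified will hold the accuracy of each fold.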

Step 4 - Building Stratified K fold cross validation

for train_index, test_index in skf.split(X, y):
    # Select the training and test rows for this fold
    X_train_fold, X_test_fold = X[train_index], X[test_index]
    y_train_fold, y_test_fold = y[train_index], y[test_index]
    # Refit the model on the training fold, then score it on the held-out fold
    model.fit(X_train_fold, y_train_fold)
    lst_accu_stratified.append(model.score(X_test_fold, y_test_fold))

skf.split generates 10 pairs of train/test index arrays over our dataset. In each iteration, we fit the model on the training fold and compute its accuracy on the held-out test fold with model.score.
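The manual loop can also be expressed with scikit-learn's cross_val_score helper, which performs the fit-and-score cycle for each fold internally. This is a sketch of an alternative, not the recipe's own method; max_iter=5000 is an assumption added here to reduce convergence warnings on the unscaled features, so the scores may differ slightly from those of the default model.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

# max_iter=5000 is an assumption (not in the recipe) to help the solver converge.
model = LogisticRegression(max_iter=5000)

# cross_val_score clones and refits the model on each training fold,
# then returns one accuracy value per test fold.
scores = cross_val_score(model, X, y, cv=skf, scoring="accuracy")

print("Per-fold accuracy:", scores)
print("Mean accuracy:", scores.mean())
print("Standard deviation:", scores.std())
```

Reporting the standard deviation alongside the mean gives a rough sense of how much the accuracy varies from fold to fold.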

Step 5 - Printing the results

print('Maximum Accuracy', max(lst_accu_stratified))
print('Minimum Accuracy:', min(lst_accu_stratified))
print('Overall Accuracy:', mean(lst_accu_stratified))

Here we print the maximum, minimum, and mean accuracy across the 10 test folds.

Step 6 - Let's look at the results

Once we run the above code snippet, we will see:

Maximum Accuracy 1.0
Minimum Accuracy: 0.9137931034482759
Overall Accuracy: 0.9579185031544378

Clearly, the model performance is quite high across all 10 folds of stratified cross-validation, with accuracy ranging from about 0.91 to 1.0.
