How does K fold cross validation work in R?

Last Updated: 13 Sep 2022


Recipe Objective

The major challenge when building a model is making it perform accurately on unseen data. Cross-validation is a technique for estimating how well a model generalizes. It holds back a portion of the data that is not used while training the model. That portion is later treated as unseen data to test/validate the model, which gives us an estimate of the prediction error.


The three most common cross-validation techniques are:

Leave-one-out cross-validation (LOOCV)
K-fold cross-validation
Repeated k-fold cross-validation
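In the caret package, each of these techniques corresponds to a different method argument of trainControl(). The following is a minimal sketch, assuming caret is installed; the object names and the particular values of number and repeats are illustrative choices, not part of the recipe.

```r
library(caret)

# Leave-one-out CV: each observation is held out exactly once
loocv_control <- trainControl(method = "LOOCV")

# K-fold CV: the data is split into `number` folds (here k = 10)
kfold_control <- trainControl(method = "cv", number = 10)

# Repeated k-fold CV: the k-fold procedure is run `repeats` times,
# each time with a different random split
repeated_control <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
```

Any of these objects can be passed to the trControl argument of train().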

In this recipe, we will learn how to perform K-fold cross-validation while building a linear regression model in R.

The K-fold cross-validation technique splits the dataset into 'k' folds or subsets. In this technique, the following steps take place:

Randomly split the data into k folds or subsets.
Train the model on all of the data except one left-out subset.
Test the model against that left-out subset.
Repeat the previous two steps until every subset has been used once as the test set.
The final prediction error is the average of the errors across all k rounds.

A k value of 5 or 10 is typically chosen because it tends to avoid both excessively high bias and excessively high variance in the error estimate. Because every observation is used for both training and testing, no important data is left out when evaluating the model.
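To make the procedure concrete, here is a minimal base R sketch of 5-fold cross-validation written out by hand. It is only an illustration under stated assumptions: it uses the built-in mtcars dataset as a stand-in for the recipe's data, and the variable names (fold_id, fold_rmse, cv_rmse) and the choice of RMSE as the error measure are ours.

```r
# Manual 5-fold cross-validation of a linear model (mtcars is stand-in data)
set.seed(42)  # make the random split reproducible

k  <- 5
df <- mtcars[, c("mpg", "wt", "hp")]

# Randomly assign each row to one of k folds of near-equal size
fold_id <- sample(rep(1:k, length.out = nrow(df)))

fold_rmse <- numeric(k)
for (i in 1:k) {
  train_set <- df[fold_id != i, ]  # train on every fold except fold i
  test_set  <- df[fold_id == i, ]  # test on the held-out fold i

  fit  <- lm(mpg ~ wt + hp, data = train_set)
  pred <- predict(fit, newdata = test_set)

  # Prediction error (RMSE) on the held-out fold
  fold_rmse[i] <- sqrt(mean((test_set$mpg - pred)^2))
}

# Average the k per-fold errors to get the cross-validation error
cv_rmse <- mean(fold_rmse)
print(fold_rmse)
print(cv_rmse)
```

The caret functions used later in this recipe automate the same loop, so in practice it rarely needs to be written by hand.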


STEP 1: Importing Necessary Libraries

library(caret)      # for trainControl() and train()
library(tidyverse)  # for data manipulation

STEP 2: Read a csv file and explore the data

The attached dataset contains data on 159 different bags associated with ABC industries.

The bags have certain attributes which are described below:

Height – The height of the bag
Width – The width of the bag
Length – The length of the bag
Weight – The weight the bag can carry
Weight1 – Weight the bag can carry after expansion

The company now wants to predict the cost they should set for a new variant of these kinds of bags.

data <- read.csv("R_303_Data_1.csv")
glimpse(data)

Rows: 159
Columns: 6
$ Cost     242, 290, 340, 363, 430, 450, 500, 390, 450, 500, 475, 500,...
$ Weight   23.2, 24.0, 23.9, 26.3, 26.5, 26.8, 26.8, 27.6, 27.6, 28.5,...
$ Weight1  25.4, 26.3, 26.5, 29.0, 29.0, 29.7, 29.7, 30.0, 30.0, 30.7,...
$ Length   30.0, 31.2, 31.1, 33.5, 34.0, 34.7, 34.5, 35.0, 35.1, 36.2,...
$ Height   11.5200, 12.4800, 12.3778, 12.7300, 12.4440, 13.6024, 14.17...
$ Width    4.0200, 4.3056, 4.6961, 4.4555, 5.1340, 4.9274, 5.2785, 4.6...

summary(data) # returns the statistical summary of the data columns

Cost            Weight         Weight1          Length     
 Min.   :   0.0   Min.   : 7.50   Min.   : 8.40   Min.   : 8.80  
 1st Qu.: 120.0   1st Qu.:19.05   1st Qu.:21.00   1st Qu.:23.15  
 Median : 273.0   Median :25.20   Median :27.30   Median :29.40  
 Mean   : 398.3   Mean   :26.25   Mean   :28.42   Mean   :31.23  
 3rd Qu.: 650.0   3rd Qu.:32.70   3rd Qu.:35.50   3rd Qu.:39.65  
 Max.   :1650.0   Max.   :59.00   Max.   :63.40   Max.   :68.00  
     Height           Width      
 Min.   : 1.728   Min.   :1.048  
 1st Qu.: 5.945   1st Qu.:3.386  
 Median : 7.786   Median :4.248  
 Mean   : 8.971   Mean   :4.417  
 3rd Qu.:12.366   3rd Qu.:5.585  
 Max.   :18.957   Max.   :8.142

dim(data)

159 6

STEP 3: Performing K-Fold Cross validation

We will use the caret package to perform cross-validation. First, we will use the trainControl() function to define the cross-validation method to be carried out, and then pass the result to the train() function.

Syntax: train(formula, data = , method = , trControl = , tuneGrid = )

where:

formula = y~x1+x2+x3+..., where y is the dependent (response) variable and x1, x2, x3 are the independent (predictor) variables
data = the data frame
method = the type of model to be built
trControl = the control parameters. We will pass the output of the trainControl() function here, which specifies the cross-validation technique.
tuneGrid = (optional) a data frame of tuning parameter values to try; it is not needed in this recipe
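The formula Cost~. used in this recipe relies on the . shorthand, which stands for every column of the data frame other than the response. The short base R sketch below shows that expansion with terms(). The toy data frame is hypothetical: it reuses three of the recipe's column names with the first few values from the glimpse() output.

```r
# Small illustrative data frame (a subset of columns, first four rows only)
toy <- data.frame(
  Cost   = c(242, 290, 340, 363),
  Weight = c(23.2, 24.0, 23.9, 26.3),
  Length = c(30.0, 31.2, 31.1, 33.5)
)

# Expand the dot against the columns of `toy` and list the predictors
predictors <- attr(terms(Cost ~ ., data = toy), "term.labels")
print(predictors)
```

Here predictors contains every column name except Cost, so Cost~. is equivalent to Cost~Weight+Length for this data frame.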

# specifying the CV technique which will be passed into the train() function later
train_control = trainControl(method = "cv", number = 5)

# training a linear regression model with 5-fold CV
model = train(Cost~., data = data, method = "lm", trControl = train_control)

# summarising the results
print(model)

Linear Regression 

159 samples
  5 predictor

No pre-processing
Resampling: Cross-Validated (5 fold) 
Summary of sample sizes: 128, 128, 127, 126, 127 
Resampling results:

  RMSE      Rsquared   MAE     
  123.8312  0.8897412  93.41378

Tuning parameter 'intercept' was held constant at a value of TRUE

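Note: The RMSE, R-squared and MAE shown above are averaged over the 5 held-out folds. The averaged RMSE and MAE are the cross-validation estimates of the prediction error, while R-squared measures how well the predictions fit the observed values.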
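The per-fold values behind these averages can be inspected on the fitted object. The following is a hedged sketch, assuming caret is installed; it uses the built-in mtcars dataset as a stand-in for the bag data, and the object names cv_control and cv_model are ours.

```r
library(caret)

set.seed(123)  # fold assignment is random, so fix the seed for reproducibility

cv_control <- trainControl(method = "cv", number = 5)
cv_model   <- train(mpg ~ wt + hp, data = mtcars,
                    method = "lm", trControl = cv_control)

# One row of RMSE, Rsquared and MAE per held-out fold
print(cv_model$resample)

# The averaged metrics that print(cv_model) reports
print(cv_model$results)

# The reported RMSE is the mean of the per-fold RMSE values
print(mean(cv_model$resample$RMSE))
```

Comparing the last two printed values shows that the RMSE in the summary is simply the mean of the five fold-level RMSE values.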
