How to do cross validation for time series?

This recipe helps you do cross validation for time series
Last Updated: 22 Mar 2023



Recipe Objective

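When we evaluate a model on a single train/test split, the result depends on which test set we happen to get: a lucky or unlucky split can make the model look better or worse than it really is, and one split alone says little about whether the model overfits or underfits. It is therefore common to perform cross validation, i.e. to split the data several times, evaluate the model on each split, and take the mean of the scores. For time series the splits must also respect time order, so that the model is never trained on observations that come after the ones it is tested on.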
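To see why ordinary k-fold splitting is a poor fit for time series, the minimal sketch below compares scikit-learn's KFold (without shuffling) and TimeSeriesSplit on 12 ordered observations. For every fold it checks whether any training index comes after the earliest test index, i.e. whether the model would be trained on the future. The toy array and the helper name trains_on_future are illustrative assumptions, not part of the original recipe.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

# Twelve observations in time order (index 0 is the oldest)
X = np.arange(12).reshape(-1, 1)

def trains_on_future(splitter, X):
    """Hypothetical helper: per fold, True if some training row is later than the first test row."""
    return [bool(train.max() > test.min()) for train, test in splitter.split(X)]

kfold_leaks = trains_on_future(KFold(n_splits=4), X)
tss_leaks = trains_on_future(TimeSeriesSplit(n_splits=4), X)

print("KFold:", kfold_leaks)           # KFold: [True, True, True, False]
print("TimeSeriesSplit:", tss_leaks)   # TimeSeriesSplit: [False, False, False, False]
```

Three of the four KFold folds train on data that lies after the test block, while every TimeSeriesSplit fold trains only on the past.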

So this recipe is a short example of how to do cross validation on a time series. Let's get started.



Step 1 - Import the library

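```python
import numpy as np
import pandas as pd
# ARIMA with no differencing (d=0) fits ARMA models; the old
# statsmodels.tsa.arima_model.ARMA class has been removed from recent releases.
from statsmodels.tsa.arima.model import ARIMA
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_squared_error
```

Let's pause and look at these imports. NumPy and pandas are used for general numerical work and data handling. ARIMA from statsmodels.tsa.arima.model builds the forecasting model; with the differencing order set to 0 it fits an ARMA model. Older tutorials import an ARMA class from statsmodels.tsa.arima_model instead, but that class has been removed from recent statsmodels releases. TimeSeriesSplit generates the cross validation folds. Unlike shuffled k-fold splitting it is not random: it keeps the rows in time order, so each test fold always comes after the data the model was trained on. mean_squared_error is used to score the forecasts on each fold.

Step 2 - Setup the Data

```python
# Load a univariate time series with a "date" column and a "value" column
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'])
# TimeSeriesSplit assumes the rows are in time order
df = df.sort_values('date').reset_index(drop=True)
df.head()
```

Here, we load a time series dataset from GitHub. The date column is parsed as datetimes and the observations are stored in the value column. Sorting by date guarantees that the rows are in chronological order, which TimeSeriesSplit relies on.

Now our dataset is ready.

Step 3 - Splitting Data

```python
tscv = TimeSeriesSplit(n_splits=4)
rmse = []
for train_index, test_index in tscv.split(df):
    # Train on an initial stretch of the series, test on the block that follows it
    cv_train, cv_test = df.iloc[train_index], df.iloc[test_index]
    # MA(1) model: ARIMA order (p, d, q) = (0, 0, 1), the same model as ARMA order (0, 1)
    model = ARIMA(cv_train['value'].to_numpy(), order=(0, 0, 1)).fit()
    # Forecast every time step of the test fold
    predictions = model.forecast(steps=len(cv_test))
    true_values = cv_test['value'].to_numpy()
    # Root mean squared error for this fold
    rmse.append(np.sqrt(mean_squared_error(true_values, predictions)))
```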

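First, we set the number of splits to 4. Then we loop over the folds. In each iteration the dataset is split into a training set and a test set, the model is fitted on the training set, forecasts are made for the test period, and the root mean squared error (RMSE) of those forecasts is recorded. RMSE measures error rather than accuracy, so lower values are better. Because TimeSeriesSplit uses an expanding window by default, each fold's training set contains all observations that come before its test set.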
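The expanding-window layout can be written out by hand. The sketch below reproduces the fold boundaries that TimeSeriesSplit produces with its default settings (no gap, no max_train_size, and a test size of n_samples // (n_splits + 1)) and checks them against scikit-learn's output on 12 samples. The function expanding_window_splits is a hypothetical illustration, not part of scikit-learn.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

def expanding_window_splits(n_samples, n_splits):
    """Hypothetical re-implementation of TimeSeriesSplit's default fold layout."""
    test_size = n_samples // (n_splits + 1)        # every test fold has the same length
    first_test_start = n_samples - n_splits * test_size
    indices = np.arange(n_samples)
    folds = []
    for test_start in range(first_test_start, n_samples, test_size):
        train = indices[:test_start]                       # everything before the test block
        test = indices[test_start:test_start + test_size]  # the next block in time
        folds.append((train, test))
    return folds

n_samples, n_splits = 12, 4
manual = expanding_window_splits(n_samples, n_splits)
library = list(TimeSeriesSplit(n_splits=n_splits).split(np.zeros((n_samples, 1))))

for (tr, te), (tr_lib, te_lib) in zip(manual, library):
    print(f"train 0..{tr[-1]} ({len(tr)} rows) -> test {te.tolist()}")
    assert np.array_equal(tr, tr_lib) and np.array_equal(te, te_lib)
```

Each fold adds the previous test block to the training data, so the training set grows from 4 to 10 rows while every test block keeps 2 rows. The last fold is the one closest to a real forecasting situation, since it uses nearly all of the history.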

Step 4 - Printing the results

```python
print(np.mean(rmse))
```

Here, we print the mean of the four per-fold RMSE values, which is the cross-validated estimate of the model's forecast error.
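For reference, the printed number can be written as follows. Let T_k be the set of time steps in test fold k, y_t the observed value, ŷ_t the model's forecast, and K the number of folds (K = 4 here):

```latex
\mathrm{RMSE}_k = \sqrt{\frac{1}{|T_k|}\sum_{t \in T_k} \left(y_t - \hat{y}_t\right)^2},
\qquad
\overline{\mathrm{RMSE}} = \frac{1}{K}\sum_{k=1}^{K} \mathrm{RMSE}_k
```

This is the mean of the fold-level RMSEs, which in general is not the same as a single RMSE computed over all test points pooled together.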

Step 5 - Let's look at the output

Once we run the above code snippet, we will see:

6.577393548356742

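The value above was produced with the older statsmodels ARMA class, so you may get a slightly different number with current statsmodels, whose estimator and optimizer settings differ. The splits themselves do not change between runs, because TimeSeriesSplit does no shuffling.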


