How to do cross validation for time series?

This recipe helps you do cross validation for time series

Recipe Objective

While fitting our model, we might get lucky enough and get the best test dataset while splitting. It might even overfit or underfit our model. It is therefore suggested to perform cross validation i.e. splitting several times and there after taking mean of our accuracy.

So this recipe is a short example on how to do cross validation on time series . Let's get started.

Get Access to Time Series Analysis Real World Projects in Python

Step 1 - Import the library

import numpy as np import pandas as pd from statsmodels.tsa.arima_model import ARMA from sklearn.model_selection import TimeSeriesSplit from sklearn.metrics import mean_squared_error

Let's pause and look at these imports. Numpy and pandas are general ones. Here statsmodels.tsa.arima_model is used to import ARMA library for building of model. TimeSeriesSplit will help us in easy and random splitting while performing cross validation.

Step 2 - Setup the Data

df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date']) df.head()

Here, we have used one time series data from github.

Now our dataset is ready.

Step 3 - Splitting Data

tscv = TimeSeriesSplit(n_splits = 4) rmse = [] for train_index, test_index in tscv.split(df): cv_train, cv_test = df.iloc[train_index], df.iloc[test_index] model = ARMA(cv_train.value, order=(0, 1)).fit() predictions = model.predict(cv_test.index.values[0], cv_test.index.values[-1]) true_values = cv_test.value rmse.append(np.sqrt(mean_squared_error(true_values, predictions)))

Firstly, we have set number of splitting to be 4. Then we have loop for our cross validation. Each time, dataset is spliited to train and test datset; model is fitted on it, prediction are made and RMSE(accuracy) is calculated for each split.

Step 4 - Printing the results

print(np.mean(rmse))

Here, we have printed the coeffiecient of model and the predicted values.

Step 5 - Lets look at our dataset now

Once we run the above code snippet, we will see:

6.577393548356742

You might get different result but it will be close to given due to limited splitting.

What Users are saying..

profile image

Ray han

Tech Leader | Stanford / Yale University
linkedin profile url

I think that they are fantastic. I attended Yale and Stanford and have worked at Honeywell,Oracle, and Arthur Andersen(Accenture) in the US. I have taken Big Data and Hadoop,NoSQL, Spark, Hadoop... Read More

Relevant Projects

Build a Autoregressive and Moving Average Time Series Model
In this time series project, you will learn to build Autoregressive and Moving Average Time Series Models to forecast future readings, optimize performance, and harness the power of predictive analytics for sensor data.

BigMart Sales Prediction ML Project in Python
The goal of the BigMart Sales Prediction ML project is to build and evaluate different predictive models and determine the sales of each product at a store.

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.

Time Series Forecasting with LSTM Neural Network Python
Deep Learning Project- Learn to apply deep learning paradigm to forecast univariate time series data.

LLM Project to Build and Fine Tune a Large Language Model
In this LLM project for beginners, you will learn to build a knowledge-grounded chatbot using LLM's and learn how to fine tune it.

GCP MLOps Project to Deploy ARIMA Model using uWSGI Flask
Build an end-to-end MLOps Pipeline to deploy a Time Series ARIMA Model on GCP using uWSGI and Flask

Linear Regression Model Project in Python for Beginners Part 1
Machine Learning Linear Regression Project in Python to build a simple linear regression model and master the fundamentals of regression for beginners.

Ecommerce product reviews - Pairwise ranking and sentiment analysis
This project analyzes a dataset containing ecommerce product reviews. The goal is to use machine learning models to perform sentiment analysis on product reviews and rank them based on relevance. Reviews play a key role in product recommendation systems.

Recommender System Machine Learning Project for Beginners-3
Content Based Recommender System Project - Building a Content-Based Product Recommender App with Streamlit

Personalized Medicine: Redefining Cancer Treatment
In this Personalized Medicine Machine Learning Project you will learn to classify genetic mutations on the basis of medical literature into 9 classes.