How to do cross validation for time series?

This recipe helps you do cross validation for time series

Recipe Objective

When we fit a model on a single train/test split, the score depends on which observations happen to land in the test set: a lucky split can make the model look better than it is, and it can hide overfitting or underfitting. It is therefore advisable to perform cross validation, i.e. to split the data several times, evaluate the model on each split, and then take the mean of the scores.

So this recipe is a short example of how to do cross validation on a time series. Let's get started.

Step 1 - Import the library

import numpy as np
import pandas as pd
from statsmodels.tsa.arima_model import ARMA
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_squared_error

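Let's pause and look at these imports. NumPy and pandas are general-purpose libraries. statsmodels.tsa.arima_model provides the ARMA class used to build the model; note that this legacy class was deprecated in statsmodels 0.12 and is no longer usable from 0.13 onward, where statsmodels.tsa.arima.model.ARIMA replaces it. TimeSeriesSplit generates the cross validation splits. Unlike random splitting, it preserves time order: each split trains on an initial segment of the series and tests on the observations that immediately follow it, so the model is never evaluated on data older than its training data. mean_squared_error is used to compute the error on each split.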
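To see how TimeSeriesSplit divides a series, here is a minimal sketch on a toy array of 10 observations. The array and the variable names (series, folds) are assumptions made for illustration and are not part of the recipe's dataset.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy series of 10 observations (illustrative data only)
series = np.arange(10)

tscv = TimeSeriesSplit(n_splits=4)
folds = []
for train_index, test_index in tscv.split(series):
    # Store and show the positions used for training and testing in this split
    folds.append((train_index.tolist(), test_index.tolist()))
    print("train:", train_index.tolist(), "test:", test_index.tolist())
```

In every split the test indices come strictly after the training indices, and the training window grows from one split to the next.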

Step 2 - Setup the Data

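df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'])
df.head()

Here, we have loaded a time series dataset hosted on GitHub. Its date column is parsed as dates, and its value column holds the observations we will model.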

Now our dataset is ready.

Step 3 - Splitting Data

tscv = TimeSeriesSplit(n_splits=4)
rmse = []
for train_index, test_index in tscv.split(df):
    cv_train, cv_test = df.iloc[train_index], df.iloc[test_index]
    model = ARMA(cv_train.value, order=(0, 1)).fit()
    predictions = model.predict(cv_test.index.values[0], cv_test.index.values[-1])
    true_values = cv_test.value
    rmse.append(np.sqrt(mean_squared_error(true_values, predictions)))

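First, we set the number of splits to 4. Then we loop over the splits. In each iteration the dataset is divided into a training set and a test set, an ARMA model of order (0, 1), i.e. an MA(1) model, is fitted on the training set, predictions are made for the test period, and the RMSE (root mean squared error, an error measure where lower is better) is calculated and stored.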

Step 4 - Printing the results

print(np.mean(rmse))

Here, we have printed the mean of the RMSE values over the 4 splits.
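As a reminder of what is being averaged, the RMSE of one split is the square root of the mean squared difference between the true and predicted values. Below is a minimal sketch with made-up numbers; the helper name rmse_score and the values in toy_folds are hypothetical and chosen only to illustrate the arithmetic.

```python
import numpy as np

def rmse_score(true_values, predictions):
    """Root mean squared error between two equally long sequences."""
    true_values = np.asarray(true_values, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    return float(np.sqrt(np.mean((true_values - predictions) ** 2)))

# Two made-up splits: (true values, predictions)
toy_folds = [
    ([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 11.0]),  # squared errors: 1, 0, 1, 4
    ([4.0, 6.0], [4.0, 6.0]),                        # perfect predictions
]
fold_rmse = [rmse_score(t, p) for t, p in toy_folds]
mean_rmse = float(np.mean(fold_rmse))
print(fold_rmse, mean_rmse)
```

Looking at the individual values in fold_rmse alongside their mean shows how much the error varies from one split to another.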

Step 5 - Let's look at our result now

Once we run the above code snippet, we will see:

6.577393548356742

You might get a slightly different result, but it should be close to the value above. Keep in mind that with only 4 splits this mean is a rough estimate of the model's error.
