# How to do cross validation for time series?

This recipe helps you do cross validation for time series.

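When we evaluate a model on a single train/test split, we might simply get lucky (or unlucky) with the test set we happen to pick, so a single score can hide overfitting or underfitting. It is therefore better to perform cross validation, i.e. split the data several times and then take the mean of the error across the splits. For time series, each split must also respect time order: the model is trained on earlier observations and tested on later ones.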

This recipe is a short example of how to do cross validation on a time series. Let's get started.

```
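import numpy as np
import pandas as pd
# Legacy import: works on statsmodels < 0.13. Newer releases removed this class;
# there, statsmodels.tsa.arima.model.ARIMA with order=(0, 0, 1) gives the same MA(1) model.
from statsmodels.tsa.arima_model import ARMA
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_squared_error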
```

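Let's pause and look at these imports. Numpy and pandas are the general-purpose ones. From statsmodels.tsa.arima_model we import the ARMA class, which is used to build the model. TimeSeriesSplit generates the cross validation splits for us; the splits are not random but ordered in time, so every test set comes after its training set. mean_squared_error is used to score the predictions.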
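To make the splitting scheme concrete, here is a minimal pure-Python sketch of expanding-window splits. The helper `expanding_window_splits` is hypothetical (it is not part of scikit-learn); it is intended to mirror the default behaviour of TimeSeriesSplit, assuming no gap between train and test and no cap on the training size.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) pairs in time order.

    Hypothetical helper, assumed to mirror TimeSeriesSplit's defaults
    (no gap, no maximum training size).
    """
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("n_splits is too large for this many samples")
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        # Train on everything before the test block, test on the block itself
        train = list(range(test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


splits = list(expanding_window_splits(n_samples=10, n_splits=4))
for fold, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold}: train={train} test={test}")
```

Each fold's training window grows while the test block slides forward, so the model is never evaluated on observations that precede its training data.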

```
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'])
df.head()
```

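Here, we have loaded a time series dataset (a10.csv) hosted on GitHub and parsed its date column as datetimes. The observations themselves are in the value column.

Now our dataset is ready.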

```
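# Four time-ordered folds: each one trains on the past and tests on the next block
tscv = TimeSeriesSplit(n_splits=4)
rmse = []
for train_index, test_index in tscv.split(df):
    cv_train, cv_test = df.iloc[train_index], df.iloc[test_index]
    # Fit an MA(1) model (ARMA with p=0, q=1) on the training window
    model = ARMA(cv_train.value, order=(0, 1)).fit()
    # Predict over the positions covered by the test window
    predictions = model.predict(cv_test.index.values[0], cv_test.index.values[-1])
    true_values = cv_test.value
    # Store this fold's root mean squared error
    rmse.append(np.sqrt(mean_squared_error(true_values, predictions)))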
```

Firstly, we have set the number of splits to 4. Then we loop over the splits for our cross validation. In each iteration, the dataset is split into a train set and a test set; the model is fitted on the train set, predictions are made for the test period, and the RMSE (an error measure, so lower is better) is calculated and stored for that split.
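The fit, predict and score cycle can also be sketched without any libraries. In the sketch below, a naive "forecast the training mean" rule stands in for the ARMA model, the series is made-up illustrative data (not the a10 dataset), and the `rmse` helper is hypothetical; the fold boundaries assume the same equal-sized, time-ordered test blocks as above.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up illustrative series, not the a10 data
series = [3.0, 4.0, 5.0, 4.0, 6.0, 7.0, 6.0, 8.0, 9.0, 8.0]

n_splits = 4
test_size = len(series) // (n_splits + 1)
fold_rmse = []
for k in range(n_splits):
    # Test blocks are the last n_splits blocks of the series, in time order
    test_start = len(series) - (n_splits - k) * test_size
    train = series[:test_start]
    test = series[test_start:test_start + test_size]
    # Stand-in "model": predict the training mean for every test point
    forecast = [sum(train) / len(train)] * len(test)
    fold_rmse.append(rmse(test, forecast))

mean_rmse = sum(fold_rmse) / len(fold_rmse)
print("per-fold RMSE:", fold_rmse)
print("mean RMSE:", mean_rmse)
```

Looking at the per-fold values alongside the mean is often useful, since one unusually hard fold can dominate the average.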

```
print(np.mean(rmse))
```

Here, we have printed the mean of the RMSE values collected across the four splits.

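Once we run the above code snippet, we will see:

6.577393548356742

The splits are deterministic, so your result should be close to this; any small difference is likely due to library versions or the numerical settings of the optimizer rather than to the splitting.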
