# How to do cross validation for time series?

This recipe helps you do cross validation for time series.

A single train/test split can be misleading: we might happen to pick a favorable test set, and a model that overfits or underfits could go unnoticed. It is therefore better to perform cross validation, i.e. to split the data several times and average the error over the splits. For time series, every split must keep the training data earlier in time than the test data.

So this recipe is a short example of how to do cross validation on a time series. Let's get started.

```
import numpy as np
import pandas as pd
from statsmodels.tsa.arima_model import ARMA
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_squared_error
```

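Let's pause and look at these imports. NumPy and pandas are general-purpose libraries. statsmodels.tsa.arima_model provides the ARMA class used to build the model; note that statsmodels 0.13 and later no longer support this class and point to ARIMA from statsmodels.tsa.arima.model instead. TimeSeriesSplit produces the splits for cross validation. The splits are ordered, not random: each training set contains only observations that come before its test set. mean_squared_error is used to score the predictions.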
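To see how TimeSeriesSplit divides the data, here is a minimal sketch on ten placeholder observations. The name toy_series is made up for illustration and is not part of the recipe's dataset.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Ten placeholder observations in time order (illustrative only)
toy_series = np.arange(10)

splits = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(toy_series):
    # Store plain lists so the folds are easy to read and compare
    splits.append((train_idx.tolist(), test_idx.tolist()))
    print("train:", train_idx.tolist(), "test:", test_idx.tolist())
```

In every fold the training indices all come before the test indices, and the training window grows from one fold to the next.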

```
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'])
df.head()
```

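Here, we have loaded a time series dataset hosted on GitHub. It has a date column, which is parsed as dates, and a value column holding the observations.

Now our dataset is ready.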
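TimeSeriesSplit splits by row position, so the rows should be in chronological order before splitting. The sketch below shows one way to check this, using a small hypothetical frame (toy) that only borrows the recipe's column names.

```python
import pandas as pd

# Hypothetical frame with the same column names as the recipe's data
toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-01-01", "2020-02-01"]),
    "value": [3.0, 1.0, 2.0],
})

# Sort by date and rebuild a 0..n-1 index so row positions follow time order
toy_sorted = toy.sort_values("date").reset_index(drop=True)
print(toy_sorted["date"].is_monotonic_increasing)
```

Resetting the index also keeps index labels equal to row positions, which matters when predictions are requested by index value.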

```
tscv = TimeSeriesSplit(n_splits=4)
rmse = []
for train_index, test_index in tscv.split(df):
    # Each training fold ends where its test fold begins
    cv_train, cv_test = df.iloc[train_index], df.iloc[test_index]
    # Fit an MA(1) model on the training fold only
    model = ARMA(cv_train.value, order=(0, 1)).fit()
    # Predict over the positions covered by the test fold
    predictions = model.predict(cv_test.index.values[0], cv_test.index.values[-1])
    true_values = cv_test.value
    rmse.append(np.sqrt(mean_squared_error(true_values, predictions)))
```

First, we set the number of splits to 4. Then we loop over the splits: each time, the dataset is split into a training set and a test set, the model is fitted on the training set, predictions are made for the test set, and the RMSE (an error measure, so lower is better) is calculated for that split.

```
print(np.mean(rmse))
```

Here, we have printed the mean RMSE across the four splits.
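To make the metric concrete, here is a small sketch that computes RMSE by hand and then averages per-fold scores. All numbers are made up so the arithmetic is easy to follow.

```python
import numpy as np

# Made-up numbers, chosen only to keep the arithmetic easy to follow
true_values = np.array([3.0, 5.0, 7.0])
predictions = np.array([2.0, 5.0, 9.0])

errors = true_values - predictions          # [1, 0, -2]
fold_rmse = np.sqrt(np.mean(errors ** 2))   # sqrt((1 + 0 + 4) / 3)

# Averaging hypothetical per-fold scores, as the recipe does with np.mean(rmse)
per_fold = [fold_rmse, 2.0, 4.0]
print(fold_rmse, np.mean(per_fold))
```

RMSE is in the same units as the series itself, so it is best read relative to the typical size of the values.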

Once we run the above code snippet, we will see:

6.577393548356742

Your result may differ slightly, for example across library versions, but it should be close to this value because the splits themselves are deterministic.
