The ARIMA model for time series analysis and forecasting can be tricky to configure. We can automate the process of evaluating a large number of hyperparameters for the ARIMA model by using a grid search procedure.
So this recipe is a short example of how to find the optimal parameters for an ARIMA model. Let's get started.
import warnings
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error
Let's pause and look at these imports. NumPy, pandas and warnings are general-purpose. statsmodels.tsa.arima.model provides the ARIMA class used to build our model (the older statsmodels.tsa.arima_model module has been removed from recent statsmodels releases). mean_squared_error will be used to calculate the MSE score.
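As a quick illustration of what the MSE score measures, here is a minimal sketch that computes it by hand with plain Python. The helper name mse and the sample numbers are made up for this example; in the recipe itself, sklearn's mean_squared_error does this job.

```python
def mse(actual, predicted):
    """Mean squared error: the average of the squared differences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up values, only to show the arithmetic: (1 + 0 + 4) / 3
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
print(mse(actual, predicted))
```

A lower MSE means the forecasts sit closer to the observed values, which is why it can be used to rank candidate models.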
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date']).set_index('date')
Here, we load a time series dataset from GitHub, parse the date column as dates and set it as the index.
Now our dataset is ready.
train_data = df[:len(df)-12]
test_data = df[len(df)-12:]
Here, we have simply split our dataset into two parts: the last 12 observations form the test set and everything before them forms the training set.
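The split has to respect time order, so the test set is always the tail of the series. A minimal sketch of that idea on a plain list follows; the helper name holdout_split and the toy data are assumptions for illustration, not part of the recipe.

```python
def holdout_split(series, n_test):
    """Split an ordered sequence into (train, test), keeping the last n_test items for testing."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    return series[:-n_test], series[-n_test:]


# 24 made-up monthly observations, oldest first
monthly = list(range(1, 25))
train, test = holdout_split(monthly, n_test=12)
print(len(train), len(test))
```

Because it relies only on positional slicing, the same helper should also work on a pandas DataFrame or Series that is sorted by date.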
p_values = [0, 1]
d_values = range(0, 2)
q_values = range(0, 2)
Here, we have defined the candidate values of p (autoregressive order), d (degree of differencing) and q (moving-average order) for the hyperparameter search.
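To see how many models these ranges imply, the candidate orders can be listed up front. This is a small sketch using itertools.product from the standard library; it is an alternative to nested loops, not what the recipe itself uses.

```python
from itertools import product

p_values = [0, 1]
d_values = range(0, 2)
q_values = range(0, 2)

# Every (p, d, q) combination, in the same order nested loops would visit them
orders = list(product(p_values, d_values, q_values))
print(len(orders))
print(orders[:3])
```

With two candidate values for each of p, d and q, the grid contains 2 x 2 x 2 = 8 orders, so eight models get fitted.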
# Silence the warnings that some orders raise while fitting
warnings.filterwarnings("ignore")
for p in p_values:
    for d in d_values:
        for q in q_values:
            order = (p, d, q)
            # Fit on the training data, then forecast over the test period
            model = ARIMA(train_data.value, order=order).fit()
            predictions = model.predict(start=len(train_data), end=len(train_data) + len(test_data) - 1)
            error = mean_squared_error(test_data, predictions)
            print('ARIMA%s MSE=%.3f' % (order, error))
On each iteration, we pick one (p, d, q) combination, fit the model on the training data and calculate the MSE of its predictions against the test data. Afterwards, we choose the best model by looking for the lowest MSE score.
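Picking the lowest MSE can also be done in code rather than by eye. The sketch below keeps track of the best order seen so far and skips any order whose scoring step raises an error. The names grid_search, fake_scores and toy_score are hypothetical, and the scores are made-up numbers rather than ARIMA results; in the recipe, the scoring function would fit ARIMA on train_data and return the MSE on test_data.

```python
import math


def grid_search(orders, score_fn):
    """Return (best_order, best_score) for the lowest score; skip orders that fail."""
    best_order, best_score = None, math.inf
    for order in orders:
        try:
            score = score_fn(order)
        except Exception:
            # Some orders may fail to fit; ignore them and move on
            continue
        if score < best_score:
            best_order, best_score = order, score
    return best_order, best_score


# Made-up scores standing in for real MSE values (assumption, for illustration only)
fake_scores = {(0, 0, 0): 9.0, (0, 0, 1): 7.5, (0, 1, 1): 2.0, (1, 1, 1): 3.5}


def toy_score(order):
    # A missing key raises KeyError, which mimics a model that failed to fit
    return fake_scores[order]


orders = [(p, d, q) for p in (0, 1) for d in (0, 1) for q in (0, 1)]
best_order, best_score = grid_search(orders, toy_score)
print(best_order, best_score)
```

Catching errors per order is a design choice: it lets the search finish even if a few combinations cannot be estimated, at the cost of hiding why they failed.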
Once we run the above code snippet, it prints one line per (p, d, q) order together with its MSE.
Scroll down the IPython notebook to see the results.
In this run, the best model to choose is ARIMA(1,0,1), since it gives the lowest MSE.