How to find optimal parameters for an ARIMA model?

This recipe helps you find optimal parameters for an ARIMA model.


Recipe Objective

The ARIMA model for time series analysis and forecasting can be tricky to configure. We can automate the process of evaluating a large number of hyperparameters for the ARIMA model by using a grid search procedure.

This recipe is a short example of how to find optimal parameters for an ARIMA model. Let's get started.

Step 1 - Import the library

import warnings
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error

Let's pause and look at these imports. numpy, pandas and warnings are general-purpose. The ARIMA class from statsmodels.tsa.arima.model is used to build our model; it replaces the older statsmodels.tsa.arima_model module, which is deprecated in recent statsmodels releases. mean_squared_error is used to calculate the MSE score.
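To make the metric concrete, here is a minimal sketch that computes the mean squared error by hand in plain Python. The helper name mse and the sample numbers are illustrative assumptions, not part of the recipe; for equal-length inputs it should agree with sklearn's mean_squared_error.

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)

# Illustrative values only (not taken from the recipe's dataset)
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
example_error = mse(actual, predicted)  # (1 + 0 + 4) / 3
```

A lower value means the predictions sit closer to the actual values, which is why the grid search later ranks candidate models by this score.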

Step 2 - Setup the Data

df = pd.read_csv('', parse_dates=['date']).set_index('date')

Here, we load a time series dataset from GitHub, parse the date column as datetimes and set it as the index.

Now our dataset is ready.

Step 3 - Splitting Dataset

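# Hold out the last 12 observations for testing; train on everything before them
train_data = df[:len(df)-12]
test_data = df[len(df)-12:]

Here, we have split our dataset into two parts: the last 12 observations form the test set and all earlier observations form the training set. The split is chronological, so the model is always evaluated on data that comes after the data it was trained on.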
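The split can also be wrapped in a small helper so the test size is a single parameter. This is a sketch under assumptions: the function name split_last_n and the placeholder sequence are hypothetical and do not come from the recipe.

```python
def split_last_n(data, n_test):
    """Hold out the final n_test observations as the test set."""
    if not 0 < n_test < len(data):
        raise ValueError("n_test must be between 1 and len(data) - 1")
    split_point = len(data) - n_test
    # Positional slicing keeps the observations in time order
    return data[:split_point], data[split_point:]

# Placeholder sequence of 36 points standing in for a time series
series = list(range(36))
train, test = split_last_n(series, 12)
```

Because the helper relies only on len() and positional slicing, it should also work when passed a pandas DataFrame or Series.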

Step 4 - GridSearch

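p_values = [0, 1]
d_values = range(0, 2)
q_values = range(0, 2)

Here, we have defined the candidate values of p, d and q for hyperparameter testing: p is the autoregressive order, d is the degree of differencing and q is the moving-average order.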
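To see how large the search space is, the candidate values can be expanded into explicit (p, d, q) tuples. This sketch uses itertools.product from the standard library; the variable name orders is an illustrative assumption.

```python
from itertools import product

p_values = [0, 1]
d_values = range(0, 2)
q_values = range(0, 2)

# Every (p, d, q) combination the grid search will try
orders = list(product(p_values, d_values, q_values))
print(len(orders))  # 2 * 2 * 2 = 8 candidate models
```

The number of candidates grows multiplicatively with each list, so widening the ranges quickly increases the number of models to fit.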

Step 5 - Looping for testing

warnings.filterwarnings("ignore")  # silence fitting warnings during the search
for p in p_values:
    for d in d_values:
        for q in q_values:
            order = (p, d, q)
            # Fit the candidate model on the training series
            model = ARIMA(train_data.value, order=order).fit()
            # Predict the held-out test period
            predictions = model.predict(start=len(train_data), end=len(train_data) + len(test_data) - 1)
            error = mean_squared_error(test_data.value, predictions)
            print('ARIMA%s MSE=%.3f' % (order, error))

In each iteration, we pick one (p, d, q) combination, fit the model on the training data and calculate the MSE of its predictions against the test data. We then choose the best model by looking for the lowest MSE score.
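Rather than reading the lowest score off the printed output, the search can keep track of the best combination itself. The sketch below is one possible way to do this and is not the recipe's own code: grid_search, evaluate and toy_score are hypothetical names, and toy_score is a stand-in scorer that does not compute a real MSE.

```python
import math
from itertools import product

def grid_search(p_values, d_values, q_values, evaluate):
    """Return (best_order, best_score) for the lowest evaluate(order)."""
    best_order, best_score = None, math.inf
    for order in product(p_values, d_values, q_values):
        try:
            score = evaluate(order)
        except Exception:
            # Some orders may fail to fit; skip them instead of aborting
            continue
        if score < best_score:
            best_order, best_score = order, score
    return best_order, best_score

# Stand-in scorer for demonstration only: NOT a real MSE.
# In the recipe, evaluate(order) would fit ARIMA on the training data
# and return mean_squared_error on the test data.
def toy_score(order):
    if order == (1, 1, 1):
        raise ValueError("simulated fitting failure")
    arbitrary_target = (0, 1, 1)  # chosen only so the demo has a unique minimum
    return sum((a - b) ** 2 for a, b in zip(order, arbitrary_target))

best_order, best_score = grid_search([0, 1], range(2), range(2), toy_score)
```

Passing the scoring step in as a function keeps the search logic separate from the model, and the try/except means one combination that fails to fit does not stop the whole search.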

Step 6 - Let's look at our results now

Once we run the above code snippet, we will see:

Scroll down the ipython file to visualize the results.

The best model to choose is ARIMA(1,0,1), as it gives the lowest MSE.
