How to evaluate timeseries models using AIC?

This recipe helps you evaluate timeseries models using AIC

Recipe Objective

The Akaike Information Criterion (AIC) is a widely used measure for comparing statistical models. It combines the goodness of fit and the simplicity (parsimony) of a model into a single statistic. When comparing two models fitted to the same data, the one with the lower AIC is generally 'better'.
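To make the trade-off concrete, here is a minimal sketch of the standard definition, AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The function name and the two log-likelihood values are hypothetical and chosen only for illustration.

```python
def aic(log_likelihood, num_params):
    """Akaike Information Criterion: 2k - 2*ln(L)."""
    return 2 * num_params - 2 * log_likelihood


# Hypothetical log-likelihoods, for illustration only.
# The complex model fits slightly better but uses more parameters.
simple_model = aic(log_likelihood=-430.0, num_params=2)
complex_model = aic(log_likelihood=-428.5, num_params=5)

print(simple_model, complex_model)  # 864.0 867.0
```

In this made-up case the simpler model has the lower AIC, because the small gain in fit does not offset the penalty for three extra parameters.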

This recipe is a short example of how to evaluate time series models using AIC. Let's get started.

Step 1 - Import the library

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

Let's pause and look at these imports. NumPy and pandas are general-purpose libraries. statsmodels.tsa.arima.model provides the ARIMA class used for model building. (Older statsmodels releases exposed it as statsmodels.tsa.arima_model, which has since been removed.)

Step 2 - Set up the Data

df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'])

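Here, we load a time series dataset from GitHub. The parse_dates argument converts the date column to datetime values. Note that this call does not set date as the index, so the series is accessed as df.value.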

Now our dataset is ready.
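If a date index is wanted, one option is to call set_index after loading. This is a minimal sketch, assuming the CSV has the date and value columns used in this recipe. The three rows are made-up placeholders (not real values from the dataset) so that the snippet runs without a network connection.

```python
import io

import pandas as pd

# Made-up placeholder rows that mimic the assumed layout (date, value).
sample_csv = io.StringIO(
    "date,value\n"
    "2000-01-01,1.0\n"
    "2000-02-01,2.0\n"
    "2000-03-01,3.0\n"
)

df_sample = pd.read_csv(sample_csv, parse_dates=['date'])

# parse_dates only converts the column; set_index makes it the index.
indexed = df_sample.set_index('date')

print(indexed.index)
```

The recipe itself does not need this step, since ARIMA is fitted directly on df.value.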

Step 3 - Calculating AIC

# Fit an ARIMA model for every order (p, d, q) with p, d, q in {0, 1}
for i in range(0, 2):
    for j in range(0, 2):
        for k in range(0, 2):
            model = ARIMA(df.value, order=(i, j, k)).fit()
            print(model.aic)

Libraries can search for the lowest-AIC order automatically. Here we loop over the orders by hand to see what actually happens inside: each (p, d, q) combination with values 0 or 1 is fitted and its AIC is printed. The AIC changes as the order changes.

Step 4 - Let's look at the output now

Once we run the above code snippet, we will see output like the following (exact values may vary with the statsmodels version):

1310.0276476996216
1152.1010884622729
906.8908037492013
858.8861982732806
908.9724818749953
874.8436348634339
879.5863881212866
843.8379425029493

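The values are printed in loop order, from (0,0,0) to (1,1,1). The lowest AIC, about 843.84, is the last one, so order (1,1,1) is the best fit among the eight candidates. The search can be extended to orders up to 2 to check whether a larger model lowers the AIC further.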
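Reading the best order off a column of numbers is error-prone, so it can help to pair each AIC with its order. The sketch below reuses the eight values printed above and assumes they appear in the same order as the nested loops. The variable names are illustrative, not part of the original recipe.

```python
from itertools import product

# AIC values copied from the output above, in printed order.
aic_values = [
    1310.0276476996216,
    1152.1010884622729,
    906.8908037492013,
    858.8861982732806,
    908.9724818749953,
    874.8436348634339,
    879.5863881212866,
    843.8379425029493,
]

# product(range(2), repeat=3) yields (0,0,0), (0,0,1), ..., (1,1,1),
# which matches the nesting of the i, j, k loops.
orders = list(product(range(2), repeat=3))
aic_by_order = dict(zip(orders, aic_values))

# Pick the order with the smallest AIC.
best_order = min(aic_by_order, key=aic_by_order.get)
print(best_order, aic_by_order[best_order])
```

The same pattern works inside the fitting loop: store model.aic in a dictionary keyed by (i, j, k) instead of printing it, then take the minimum at the end.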
