Differencing is a method of transforming a time series. It can be used to remove the series' dependence on time, the so-called temporal dependence, which includes structures such as trends and seasonality.
This recipe is a short example of what differencing in time series is and why we do it. Let's get started.
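To make the idea concrete, here is a minimal sketch on a made-up series (not the recipe's dataset) that contains nothing but a linear trend. The values and the use of `np.diff` are assumptions chosen for illustration; the point is only that subtracting each observation from the next turns a steadily rising series into a constant one.

```python
import numpy as np

# Hypothetical series with a pure linear trend: y_t = 5 + 2 * t
t = np.arange(10)
series = 5 + 2 * t

# First-order difference: y_t - y_(t-1) for every t >= 1
first_diff = np.diff(series)

print(series)      # rises by 2 at every step
print(first_diff)  # every element equals 2, so the trend is gone
```

The differenced series is one element shorter than the original, because the first observation has no predecessor to subtract.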
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Let's pause and look at these imports. NumPy and pandas are general-purpose libraries for numerical arrays and tabular data, and matplotlib.pyplot will help us with plotting.
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date']).set_index('date')
Here we load a sample time series from a CSV file hosted on GitHub. The date column is parsed as dates and then set as the index of the DataFrame.
Now our dataset is ready.
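If the download is not available, the same loading pattern can be tried on an in-memory CSV. This is only a sketch: the three rows below are made up, and they merely mimic the two-column layout (date, value) that the recipe's code relies on.

```python
import io
import pandas as pd

# Made-up sample rows in the same two-column layout (date, value)
csv_text = """date,value
2000-01-01,10.0
2000-02-01,12.5
2000-03-01,11.0
"""

# parse_dates converts the column to datetimes; set_index makes it the index
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"]).set_index("date")

print(type(sample.index).__name__)                      # DatetimeIndex
print(sample.loc[pd.Timestamp("2000-02-01"), "value"])  # 12.5
```

With a DatetimeIndex in place, rows can be looked up by timestamp and plots get a proper time axis.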
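def difference(dataset, interval=1):
    # Work on a plain array so integer positions are used, not index labels
    values = np.asarray(dataset)
    diff = list()
    for i in range(interval, len(values)):
        # Subtract the observation that lies `interval` steps earlier
        value = values[i] - values[i - interval]
        diff.append(value)
    return diff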
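With the default interval of 1, the function takes the difference between each pair of adjacent elements, which removes the trend. Each difference is appended to a list, and the list is returned when the function is called. A larger interval subtracts the observation that many steps back instead.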
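A quick check on a small made-up list shows what the function returns. The sketch below restates the same logic in a compact form so that it runs on its own; the toy numbers are hypothetical, and the comparison with pandas' built-in `Series.diff` is included only as a sanity check.

```python
import pandas as pd

def difference(dataset, interval=1):
    # Same logic as the recipe's function, written as a list comprehension
    values = list(dataset)
    return [values[i] - values[i - interval] for i in range(interval, len(values))]

toy = [1, 3, 6, 10, 15]  # made-up numbers

print(difference(toy))              # [2, 3, 4, 5]
print(difference(toy, interval=2))  # [5, 7, 9]

# pandas provides the same operation; its first `interval` entries are NaN,
# so they are dropped here before comparing
builtin = pd.Series(toy).diff().dropna().tolist()
print(builtin == difference(toy))   # True
```

Note that the result is shorter than the input by `interval` elements.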
diff = difference(df.value)
plt.plot(diff)
plt.show()
Here we simply call the function defined above on the value column, using the default interval of 1. We then plot the result to check whether a trend is still visible in the data.
Running the code snippet above produces a line plot of the differenced series. The plot itself can be seen in the accompanying IPython notebook.
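The differenced series no longer shows the long-run trend of the original values, so the trend has been removed from our dataset. Note that a lag-1 difference targets the trend only. A seasonal pattern, if present, calls for an interval that matches the season length (for example, interval=12 for monthly data).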
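The introduction mentions seasonality as well as trend, so here is a hedged sketch of how the lag affects each. It uses synthetic monthly data (an assumption, not the recipe's dataset) built from a linear trend plus a sine wave with a 12-month period, and pandas' `Series.diff` in place of the hand-written function.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data (an assumption, not the recipe's dataset):
# a linear trend plus a pattern that repeats every 12 months
months = pd.date_range("2000-01-01", periods=48, freq="MS")
t = np.arange(48)
series = pd.Series(0.5 * t + 3 * np.sin(2 * np.pi * t / 12), index=months)

# Lag-1 difference: the trend is removed, but the seasonal swing remains
lag1 = series.diff(1).dropna()

# Lag-12 difference: each value minus the value from 12 months earlier
lag12 = series.diff(12).dropna()

print(lag1.std())   # clearly above zero: the series still oscillates
print(lag12.std())  # essentially zero: the series is constant
```

In this idealized case the lag-12 difference is constant, because the seasonal term cancels exactly and the linear trend contributes the same amount every time. Real data is noisier, so the result will only be approximately flat.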