Updated On : Apr-16,2020  time-series, trend, seasonality, pandas # Time Series - How to Remove Trend & Seasonality from Time-Series Data using Pandas [Python]¶

## Introduction ¶

Both Trends and Seasonality are generally present in the majority of time series data of the real world. When we want to do the forecasting with time-series, we need a stationary time series. The stationary time series are data where there is no trend or seasonality information present in it. The stationary time series is a series with constant mean, constant variance, and constant autocorrelation. We need to find a way to remove trends and seasonality from our time series so that we can use it with prediction models. To do that, we need to understand what is trends and seasonality in-depth to handle it better. Apart from trend and seasonality, some time-series also has noise/error/residual component present as well. We can decompose time-series to see different components. We'll now try to explain the presence of trends and seasonality in this tutorial and how to handle it further with examples.

## 1. Types of Time-Series ¶

Time-series are of generally two types:

• Additive Time-Series: Additive time-series is time-series where components (trend, seasonality, noise) are added to generate time series.
• `Time-Series = trend + seasonality + noise`
• Multiplicative Time-Series: Multiplicative time-series is time-series where components (trend, seasonality, noise) are multiplied to generate time series. one can notice an increase in the amplitude of seasonality in multiplicative time-series.
• `Time-Series = trend * seasonality * noise`

## 2. Trend ¶

The trends represent an increase or decrease in time-series value over time. If we notice that the value of measurement over time is increasing or decreasing then we can say that it has an upward or downward trend.

### How to remove trend from time-series data?¶

There are various ways to de-trend a time series. We have explained a few below.

• Log Transformation.
• Power Transformation.
• local smoothing - Applying moving window functions to time-series data.
• Differencing a time-series.
• Linear Regression.

## 3. Seasonality ¶

The seasonality represents variations in measured value which repeats over the same time interval regularly. If we notice that particular variations in value are happening every week, month, quarter or half-yearly then we can say that time series has some kind of seasonality.

### How to remove seasonality from time-series data?¶

There are various ways to remove seasonality. The task of removing seasonality is a bit complicated. We have explained a few ways below to remove seasonality.

• Average de-trended values.
• Differencing a time-series.
• Use the loess method.

## 4. Dicky-Fuller Test for Stationarity ¶

Once we can remove trend and seasonality from time-series data then we can test its stationarity using a `dicky-fuller test`. It’s a statistical test to check the stationarity of time-series data.

We'll now explore trend and seasonality removal with examples. We'll be using famous air passenger datasets available on-line for our purpose because it has both trend and seasonality. It has information about US airline passengers from 1949 to 1960 recorded each month. Please download the dataset to follow along.

In :
```import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline
```
In :
```air_passengers = pd.read_csv("datasets/AirPassengers.csv", index_col=0, parse_dates=True)
```
Out:
#Passengers
Month
1949-01-01 112
1949-02-01 118
1949-03-01 132
1949-04-01 129
1949-05-01 121
In [ ]:
```air_passengers.plot(figsize=(8,4), color="tab:red");
```

In [ ]:
```air_passengers["1952"].plot(kind="bar", color="tab:green", legend=False);
```

By looking at the above plots we can see that our time-series is multiplicative time-series and has both trend as well as seasonality. We can see the trend as passengers are constantly increasing over time. We can see seasonality with the same variations repeating for 1 year where value peaks somewhere are around August.

## 5. Decompose Time-Series to see Individual Components ¶

We can decompose time-series to see various components of time-series. Python module named `statmodels` provides us with easy to use utility which we can use to get an individual component of time-series and then visualize it.

In :
```from statsmodels.tsa.seasonal import seasonal_decompose
```
In [ ]:
```decompose_result = seasonal_decompose(air_passengers, model="multiplicative")

trend = decompose_result.trend
seasonal = decompose_result.seasonal
residual = decompose_result.resid

decompose_result.plot();
```

We can notice trend and seasonality components separately as well as residual components. There is a loss of residual in the beginning which is settling later.

## 6. Checking Whether Time-Series is Stationary or Not ¶

As we declared above time-series is stationary whose mean, variance and auto-covariance are independent of time. We can check mean, variance and auto-covariance using moving window functions available with pandas. We'll also use a dicky-fuller test available with statsmodels to check the stationarity of time-series. If time-series is not stationary then we need to make it stationary.

Below we have taken an average over moving window of 12 samples. We noticed from the above plots that there is the seasonality of 12 months in time-series. We can try different window sizes for testing purposes.

In [ ]:
```air_passengers.rolling(window = 12).mean().plot(figsize=(8,4), color="tab:red", title="Rolling Mean over 12 month period");
```

In [ ]:
```air_passengers.rolling(window = 20).mean().plot(figsize=(8,4), color="tab:red", title="Rolling mean over 20 month period");
```

We can clearly see that time-series has a visible upward trend.

Below we have taken variance over the moving window of 12 samples. We noticed from the above plots that there is the seasonality of 12 months in time-series.

In [ ]:
```air_passengers.rolling(window = 12).var().plot(figsize=(8,4), color="tab:red", title="Rolling Variance over 12 month period");
```

In [ ]:
```air_passengers.rolling(window = 20).var().plot(figsize=(8,4), color="tab:red", title="Rolling variance over 20 month period");
```

From the above two plots, we can notice that time-series has some-kind of multiplicative effect which seems to be increasing with time-period. We can see that the low seasonality effect in the beginning which is amplifying over time.

Below we are also plotting auto-correlation plot for time-series data as well. This plot helps us understand whether present values of time-series are positively correlated, negatively correlated or not related at all with past values. `statsmodels` library provides ready to use method `plot_acf` as a part of module `statsmodels.graphics.tsaplots`.

In [ ]:
```from statsmodels.graphics.tsaplots import plot_acf

plot_acf(air_passengers);
```

We can notice from the above chart that after 13 lags, the line gets inside confidence interval (light blue area). This can be due to seasonality of 12-13 months in our data.

### Testing with Dicky-Fuller¶

We can perform Dicky-Fuller test functionality available with the statsmodels library. Below we'll test the stationarity of our time-series with this functionality and try to interpret its results to better understand it.

In :
```from statsmodels.tsa.stattools import adfuller

dftest = adfuller(air_passengers['#Passengers'], autolag = 'AIC')

print("2. P-Value : ", dftest)
print("3. Num Of Lags : ", dftest)
print("4. Num Of Observations Used For ADF Regression and Critical Values Calculation :", dftest)
print("5. Critical Values :")
for key, val in dftest.items():
print("\t",key, ": ", val)
```
```1. ADF :  0.8153688792060544
2. P-Value :  0.9918802434376411
3. Num Of Lags :  13
4. Num Of Observations Used For ADF Regression and Critical Values Calculation : 130
5. Critical Values :
1% :  -3.4816817173418295
5% :  -2.8840418343195267
10% :  -2.578770059171598
```

We can interpret above results based on p-values of result.

• `p-value > 0.05` - This implies that time-series is `non-stationary`.
• `p-value <=0.05` - This implies that time-series is `stationary`.

We can see from the above results that `p-value` is greater than 0.05 hence our time-series is not stationary. It still has time-dependent components present which we need to remove.

## 7. Remove Trend ¶

There are various ways to remove trends from data as we have discussed above. We'll try ways like differencing, power transformation, log transformation, etc.

### Logged Transformation¶

To apply log transformation, we need to take a log of each individual value of time-series data.

In [ ]:
```logged_passengers = air_passengers["#Passengers"].apply(lambda x : np.log(x))

ax1 = plt.subplot(121)
logged_passengers.plot(figsize=(12,4) ,color="tab:red", title="Log Transformed Values", ax=ax1);
ax2 = plt.subplot(122)
air_passengers.plot(color="tab:red", title="Original Values", ax=ax2);
```

From the above first chart, we can see that we have reduced the variance of time-series data. We can look at y-values of original time-series data and log-transformed time-series data to conclude that the variance of time-series is reduced.

We can check whether we are successful or not by checking individual components of time-series by decomposing it as we had done above.

In [ ]:
```decompose_result = seasonal_decompose(logged_passengers)

decompose_result.plot();
```

### Power Transformations¶

We can apply power transformation in data same way as that of log transformation to remove trend.

In [ ]:
```powered_passengers = air_passengers["#Passengers"].apply(lambda x : x ** 0.5)

ax1 = plt.subplot(121)
powered_passengers.plot(figsize=(12,4), color="tab:red", title="Powered Transformed Values", ax=ax1);
ax2 = plt.subplot(122)
air_passengers.plot(figsize=(12,4), color="tab:red", title="Original Values", ax=ax2);
```

From the above first chart, we can see that we have reduced the variance of time-series data. We can look at y-values of original time-series data and power-transformed time-series data to conclude that the variance of time-series is reduced.

We can check whether we are successful or not by checking individual components of time-series by decomposing it as we had done above.

In [ ]:
```decompose_result = seasonal_decompose(powered_passengers)

decompose_result.plot();
```

### Applying Moving Window Functions¶

We can calculate rolling mean over a period of 12 months and subtract it from original time-series to get de-trended time-series.

In [ ]:
```rolling_mean = air_passengers.rolling(window = 12).mean()
passengers_rolled_detrended = air_passengers - rolling_mean

ax1 = plt.subplot(121)
passengers_rolled_detrended.plot(figsize=(12,4),color="tab:red", title="Differenced With Rolling Mean over 12 month", ax=ax1);
ax2 = plt.subplot(122)
air_passengers.plot(figsize=(12,4), color="tab:red", title="Original Values", ax=ax2);
```

From the above the first chart, we can see that we seem to have removed trend from time-series data.

We can check whether we are successful or not by checking individual components of time-series by decomposing it as we had done above.

In [ ]:
```decompose_result = seasonal_decompose(passengers_rolled_detrended.dropna())

decompose_result.plot();
```

### Applying Moving Window Function on Log Transformed Time-Series¶

We can apply more than one transformation as well. We'll first apply log transformation to time-series, then take a rolling mean over a period of 12 months and then subtract rolled time-series from log-transformed time-series to get final time-series.

In [ ]:
```logged_passengers = pd.DataFrame(air_passengers["#Passengers"].apply(lambda x : np.log(x)))

rolling_mean = logged_passengers.rolling(window = 12).mean()
passengers_log_rolled_detrended = logged_passengers["#Passengers"] - rolling_mean["#Passengers"]

ax1 = plt.subplot(121)
passengers_log_rolled_detrended.plot(figsize=(12,4),color="tab:red", title="Log Transformation & Differenced With Rolling Mean over 12 month", ax=ax1);
ax2 = plt.subplot(122)
air_passengers.plot(figsize=(12,4), color="tab:red", title="Original Values", ax=ax2);
```

From the above the first chart, we can see that we are able to removed the trend from time-series data.

We can check whether we are successful or not by checking individual components of time-series by decomposing it as we had done above.

In [ ]:
```decompose_result = seasonal_decompose(passengers_log_rolled_detrended.dropna())

decompose_result.plot();
```

### Applying Moving Window Function on Power Transformed Time-Series¶

We can apply more than one transformation as well. We'll first apply power transformation to time-series, then take a rolling mean over a period of 12 months and then subtract rolled time-series from power-transformed time-series to get final time-series.

In [ ]:
```powered_passengers = pd.DataFrame(air_passengers["#Passengers"].apply(lambda x : np.log(x)))

rolling_mean = powered_passengers.rolling(window = 12).mean()
passengers_pow_rolled_detrended = powered_passengers["#Passengers"] - rolling_mean["#Passengers"]

ax1 = plt.subplot(121)
passengers_pow_rolled_detrended.plot(figsize=(12,4),color="tab:red", title="Power Transformation & Differenced With Rolling Mean over 12 month", ax=ax1);
ax2 = plt.subplot(122)
air_passengers.plot(figsize=(12,4), color="tab:red", title="Original Values", ax=ax2);
```

From the above the first chart, we can see that we are able to remove the trend from time-series data.

We can check whether we are successful or not by checking individual components of time-series by decomposing it as we had done above.

In [ ]:
```decompose_result = seasonal_decompose(passengers_pow_rolled_detrended.dropna())

decompose_result.plot();
```

### Applying Linear Regression to Remove Trend¶

We can also apply a linear regression model to remove the trend. Below we are fitting a linear regression model to our time-series data. We are then using a fit model to predict time-series values from beginning to end. We are then subtracting predicted values from original time-series to remove the trend.

In [ ]:
```from statsmodels.regression.linear_model import OLS
least_squares = OLS(air_passengers["#Passengers"].values, list(range(air_passengers.shape)))
result = least_squares.fit()

fit = pd.Series(result.predict(list(range(air_passengers.shape))), index = air_passengers.index)

passengers_ols_detrended = air_passengers["#Passengers"] - fit

ax1 = plt.subplot(121)
passengers_ols_detrended.plot(figsize=(12,4), color="tab:red", title="Linear Regression Fit", ax=ax1);
ax2 = plt.subplot(122)
air_passengers.plot(figsize=(12,4), color="tab:red", title="Original Values", ax=ax2);
```

From the above the first chart, we can see that we are able to remove the trend from time-series data.

We can check whether we are successful or not by checking individual components of time-series by decomposing it as we had done above.

In [ ]:
```decompose_result = seasonal_decompose(passengers_ols_detrended.dropna())

decompose_result.plot();
```

After applying the above transformations, we can say that linear regression seems to have done a good job of removing the trend than other methods. We can confirm it further whether it actually did good by removing the seasonal component and checking stationarity of time-series.

## 8. Remove Seasonality ¶

We can remove seasonality by differencing technique. We'll use differencing over various de-trended time-series calculated above.

### Differencing Over Log Transformed Time-Series¶

We have applied differencing to log-transformed time-series by shifting its value by 1 period and subtracting it from original log-transformed time-series

In [ ]:
```logged_passengers_diff = logged_passengers - logged_passengers.shift()

ax1 = plt.subplot(121)
logged_passengers_diff.plot(figsize=(12,4), color="tab:red", title="Log-Transformed & Differenced Time-Series", ax=ax1)
ax2 = plt.subplot(122)
air_passengers.plot(figsize=(12,4), color="tab:red", title="Original Values", ax=ax2);
```

We can now test whether our time-series is stationary of now by applying the dicky-fuller test which we had applied above.

In :
```dftest = adfuller(logged_passengers_diff.dropna()["#Passengers"].values, autolag = 'AIC')

print("2. P-Value : ", dftest)
print("3. Num Of Lags : ", dftest)
print("4. Num Of Observations Used For ADF Regression and Critical Values Calculation :", dftest)
print("5. Critical Values :")
for key, val in dftest.items():
print("\t",key, ": ", val)
```
```1. ADF :  -2.7171305983881595
2. P-Value :  0.07112054815085424
3. Num Of Lags :  14
4. Num Of Observations Used For ADF Regression and Critical Values Calculation : 128
5. Critical Values :
1% :  -3.4825006939887997
5% :  -2.884397984161377
10% :  -2.578960197753906
```

From our dicky-fuller test results, we can confirm that time-series is `NOT STATIONARY` due to the p-value of 0.07 greater than 0.05.

### Differencing Over Power Transformed Time-Series¶

We have applied differencing to power transformed time-series by shifting its value by 1 period and subtracting it from original power transformed time-series

In [ ]:
```powered_passengers_diff = powered_passengers - powered_passengers.shift()

ax1 = plt.subplot(121)
powered_passengers_diff.plot(figsize=(12,4), color="tab:red", title="Power-Transformed & Differenced Time-Series", ax=ax1);
ax2 = plt.subplot(122)
air_passengers.plot(figsize=(12,4), color="tab:red", title="Original Values", ax=ax2);
```

We can now test whether our time-series is stationary of now by applying the dicky-fuller test which we had applied above.

In :
```dftest = adfuller(powered_passengers_diff["#Passengers"].dropna().values, autolag = 'AIC')

print("2. P-Value : ", dftest)
print("3. Num Of Lags : ", dftest)
print("4. Num Of Observations Used For ADF Regression and Critical Values Calculation :", dftest)
print("5. Critical Values :")
for key, val in dftest.items():
print("\t",key, ": ", val)
```
```1. ADF :  -2.7171305983881595
2. P-Value :  0.07112054815085424
3. Num Of Lags :  14
4. Num Of Observations Used For ADF Regression and Critical Values Calculation : 128
5. Critical Values :
1% :  -3.4825006939887997
5% :  -2.884397984161377
10% :  -2.578960197753906
```

From our dicky-fuller test results, we can confirm that time-series is `NOT STATIONARY` due to a p-value of 0.07 greater than 0.05.

### Differencing Over Time-Series with Rolling Mean taken over 12 Months¶

We have applied differencing to mean rolled time-series by shifting its value by 1 period and subtracting it from original mean rolled time-series

In [ ]:
```passengers_rolled_detrended_diff = passengers_rolled_detrended - passengers_rolled_detrended.shift()

ax1 = plt.subplot(121)
passengers_rolled_detrended_diff.plot(figsize=(8,4), color="tab:red", title="Rolled & Differenced Time-Series", ax=ax1);
ax2 = plt.subplot(122)
air_passengers.plot(figsize=(12,4), color="tab:red", title="Original Values", ax=ax2);
```

We can now test whether our time-series is stationary of now by applying the dicky-fuller test which we had applied above.

In :
```dftest = adfuller(passengers_rolled_detrended_diff.dropna()["#Passengers"].values, autolag = 'AIC')

print("2. P-Value : ", dftest)
print("3. Num Of Lags : ", dftest)
print("4. Num Of Observations Used For ADF Regression and Critical Values Calculation :", dftest)
print("5. Critical Values :")
for key, val in dftest.items():
print("\t",key, ": ", val)
```
```1. ADF :  -3.154482634863571
2. P-Value :  0.022775264967859542
3. Num Of Lags :  12
4. Num Of Observations Used For ADF Regression and Critical Values Calculation : 119
5. Critical Values :
1% :  -3.4865346059036564
5% :  -2.8861509858476264
10% :  -2.579896092790057
```

From our dicky-fuller test results, we can confirm that time-series is `STATIONARY` due to a p-value of 0.02 less than 0.05.

### Differencing Over Log Transformed & Mean Rolled Time-Series¶

We have applied differencing to log-transformed & mean rolled transformed time-series by shifting its value by 1 period and subtracting it from original time-series

In [ ]:
```passengers_log_rolled_detrended_diff = passengers_log_rolled_detrended - passengers_log_rolled_detrended.shift()

ax1 = plt.subplot(121)
passengers_log_rolled_detrended_diff.plot(figsize=(8,4), color="tab:red", title="Log-Transformed, Rolled & Differenced Time-Series", ax=ax1);
ax2 = plt.subplot(122)
air_passengers.plot(figsize=(12,4), color="tab:red", title="Original Values", ax=ax2);
```

We can now test whether our time-series is stationary of now by applying the dicky-fuller test which we had applied above.

In :
```dftest = adfuller(passengers_log_rolled_detrended_diff.dropna().values, autolag = 'AIC')

print("2. P-Value : ", dftest)
print("3. Num Of Lags : ", dftest)
print("4. Num Of Observations Used For ADF Regression and Critical Values Calculation :", dftest)
print("5. Critical Values :")
for key, val in dftest.items():
print("\t",key, ": ", val)
```
```1. ADF :  -3.9129812454195445
2. P-Value :  0.0019413623769362614
3. Num Of Lags :  13
4. Num Of Observations Used For ADF Regression and Critical Values Calculation : 118
5. Critical Values :
1% :  -3.4870216863700767
5% :  -2.8863625166643136
10% :  -2.580009026141913
```

From our dicky-fuller test results, we can confirm that time-series is `STATIONARY` due to a p-value of 0.001 less than 0.05.

### Differencing Over Power Transformed & Mean Rolled Time-Series¶

We have applied differencing to power transformed & mean rolled time-series by shifting its value by 1 period and subtracting it from original time-series

In [ ]:
```passengers_pow_rolled_detrended_diff = passengers_pow_rolled_detrended - passengers_pow_rolled_detrended.shift()

ax1 = plt.subplot(121)
passengers_pow_rolled_detrended_diff.plot(figsize=(8,4), color="tab:red", title="Power-Transformed, Rolled & Differenced Time-Series", ax=ax1);
ax2 = plt.subplot(122)
air_passengers.plot(figsize=(12,4), color="tab:red", title="Original Values", ax=ax2);
```

We can now test whether our time-series is stationary of now by applying the dicky-fuller test which we had applied above.

In :
```dftest = adfuller(passengers_pow_rolled_detrended_diff.dropna().values, autolag = 'AIC')

print("2. P-Value : ", dftest)
print("3. Num Of Lags : ", dftest)
print("4. Num Of Observations Used For ADF Regression and Critical Values Calculation :", dftest)
print("5. Critical Values :")
for key, val in dftest.items():
print("\t",key, ": ", val)
```
```1. ADF :  -3.9129812454195445
2. P-Value :  0.0019413623769362614
3. Num Of Lags :  13
4. Num Of Observations Used For ADF Regression and Critical Values Calculation : 118
5. Critical Values :
1% :  -3.4870216863700767
5% :  -2.8863625166643136
10% :  -2.580009026141913
```

From our dicky-fuller test results, we can confirm that time-series is `STATIONARY` due to a p-value of 0.001 less than 0.05.

### Differencing Over Linear Regression Transformed Time-Series¶

We have applied differencing to linear regression transformed time-series by shifting it's value by 1 period and subtracting it from original log-transformed time-series

In [ ]:
```passengers_ols_detrended_diff = passengers_ols_detrended - passengers_ols_detrended.shift()

ax1 = plt.subplot(121)
passengers_ols_detrended_diff.plot(figsize=(8,4), color="tab:red", title="Linear Regression fit & Differenced Time-Series", ax=ax1);
ax2 = plt.subplot(122)
air_passengers.plot(figsize=(12,4), color="tab:red", title="Original Values", ax=ax2);
```

We can now test whether our time-series is stationary of now by applying the dicky-fuller test which we had applied above.

In :
```from statsmodels.tsa.stattools import adfuller

dftest = adfuller(passengers_ols_detrended_diff.dropna().values, autolag = 'AIC')

print("2. P-Value : ", dftest)
print("3. Num Of Lags : ", dftest)
print("4. Num Of Observations Used For ADF Regression and Critical Values Calculation :", dftest)
print("5. Critical Values :")
for key, val in dftest.items():
print("\t",key, ": ", val)
```
```1. ADF :  -2.8292668241700007
2. P-Value :  0.05421329028382537
3. Num Of Lags :  12
4. Num Of Observations Used For ADF Regression and Critical Values Calculation : 130
5. Critical Values :
1% :  -3.4816817173418295
5% :  -2.8840418343195267
10% :  -2.578770059171598
```

From our dicky-fuller test results, we can confirm that time-series is `NOT STATIONARY` due to the p-value of 0.054 greater than 0.05.

This ends our small tutorial on handling the trend and seasonality with time-series data and various ways to remove them. Please feel free to let us know your views in the comments section.

Sunny Solanki