Resampling time series generally refers to:

- Imposing a regular frequency on data that was measured without one (e.g. data collected with varying time deltas between measurements).
- Converting data from its existing frequency to a different one.

We need methods that impose a regular frequency on data so that analysis becomes easier. The Python library `pandas` is commonly used to hold time series data, and it provides several tools for resampling. We'll be exploring ways to resample time series data using pandas.
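As a quick illustration of the first case, here is a minimal sketch that puts irregularly timed readings onto a regular 10-minute grid. The timestamps and values are made up for illustration, and it uses `resample()`, which is covered later in this tutorial.

```python
import pandas as pd

# Hypothetical readings taken at irregular times (illustrative data only).
irregular = pd.Series(
    [1.0, 2.0, 4.0, 8.0],
    index=pd.to_datetime([
        "2020-01-01 00:03",
        "2020-01-01 00:11",
        "2020-01-01 00:14",
        "2020-01-01 00:32",
    ]),
)

# Impose a regular 10-minute frequency by averaging the readings in each bin.
# A bin that received no reading (00:20 here) comes out as NaN.
regular = irregular.resample("10min").mean()
print(regular)
```

The result has one row per 10-minute bin, regardless of how many readings fell into it.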

In [1]:

```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline
```

Resampling is generally performed in two ways:

- **Upsampling:** Converting a time series from a lower frequency to a higher frequency, e.g. from month-based to day-based or from hour-based to minute-based. The number of observations increases, so we need a method to fill the newly created entries. We'll explain the available methods below when going through examples.
- **Downsampling:** Converting a time series from a higher frequency to a lower frequency, e.g. from week-based to month-based or from hour-based to day-based. The number of samples decreases, which also results in the loss of some values. We'll explain it below when going through examples.
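The effect on the number of observations can be seen with a small sketch. The series below is illustrative, and the 12-hour frequency is written as `"720min"` only to avoid frequency-alias differences between pandas versions.

```python
import pandas as pd

# Six daily observations (illustrative data).
daily = pd.Series(range(6), index=pd.date_range("2020-01-01", periods=6, freq="D"))

# Upsampling: daily -> every 12 hours. New timestamps appear and hold NaN until filled.
up = daily.asfreq("720min")

# Downsampling: daily -> every 2 days. Two observations collapse into one per bin.
down = daily.resample("2D").sum()

print(len(daily), len(up), len(down))  # 6 11 3
```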

`asfreq()`

The first method that we'd like to introduce for resampling is `asfreq()`. Both pandas Series and DataFrame objects provide this method.

`asfreq()` accepts important parameters like `freq`, `method`, and `fill_value`.

- `freq` lets us specify a new frequency for the time series object.
- `method` specifies how newly created index entries are filled when we upsample time series data. It accepts `ffill`, `bfill`, `backfill`, and `pad`. Forward fill (`ffill`, or its alias `pad`) fills each newly created entry with the value from the previous index entry, whereas backward fill (`bfill`, or its alias `backfill`) fills it with the value from the next index entry. The default is `None`, which puts NaNs in newly created entries when upsampling.
- `fill_value` lets us replace those NaNs with a specified value. It does not fill NaNs already present in the data, only NaNs generated by `asfreq()` when upsampling/downsampling.
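The parameter behaviour described above can be checked with a short sketch. The series is illustrative and deliberately contains one NaN of its own.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2020-01-01", periods=3, freq="60min")
s = pd.Series([1.0, np.nan, 3.0], index=idx)  # one NaN already present at 01:00

# `pad` behaves like `ffill`, and `backfill` behaves like `bfill`.
same_forward = s.asfreq("30min", method="ffill").equals(s.asfreq("30min", method="pad"))
same_backward = s.asfreq("30min", method="bfill").equals(s.asfreq("30min", method="backfill"))
print(same_forward, same_backward)  # True True

# fill_value only touches the rows created by asfreq(); the original NaN stays.
filled = s.asfreq("30min", fill_value=0.0)
print(filled)
```

In `filled`, the new 00:30 and 01:30 rows hold 0.0, while the 01:00 row keeps its original NaN.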

We'll explore the usage of `asfreq()` below with a few examples.

In [2]:

```
rng = pd.date_range(start = "1-1-2020", periods=5, freq="H")
ts = pd.Series(data=range(5), index=rng)
ts
```

Out[2]:

Below we are trying a few examples to demonstrate upsampling. We'll explore various methods to fill in newly created indexes.

In [3]:

```
ts.asfreq(freq="30min")
```

Out[3]:

We can notice from the above example that `asfreq()` by default puts NaN in all newly created index entries. We can either pass a value to fill these entries by setting the `fill_value` parameter, or we can choose a fill method. We'll explain both below with a few examples.

In [4]:

```
ts.asfreq(freq="30min", fill_value=0.0)
```

Out[4]:

We can see that the above example filled in all NaNs with 0.0.

In [5]:

```
ts.asfreq(freq="30min", method="ffill")
```

Out[5]:

We can notice from the above example that the `ffill` method filled each newly created index entry with the value of the previous entry.

In [6]:

```
ts.asfreq(freq="45min", method="ffill")
```

Out[6]:

In [7]:

```
ts.asfreq(freq="45min", method="bfill")
```

Out[7]:

In [8]:

```
ts.asfreq(freq="45min", method="pad")
```

Out[8]:

In [9]:

```
df = pd.DataFrame({"TimeSeries":ts})
df
```

Out[9]:

In [10]:

```
df.asfreq(freq="45min")
```

Out[10]:

In [11]:

```
df.asfreq(freq="45min", fill_value=0.0)
```

Out[11]:

In [12]:

```
df.asfreq("30min", method="ffill")
```

Out[12]:

In [13]:

```
df.asfreq("30min", method="bfill")
```

Out[13]:

In [14]:

```
df.asfreq("30min", method="pad")
```

Out[14]:

We'll now explain a few examples of downsampling.

In [15]:

```
ts.asfreq(freq="1H30min")
```

Out[15]:

In [16]:

```
ts.asfreq(freq="1H30min", fill_value=0.0)
```

Out[16]:

In [17]:

```
ts.asfreq(freq="1H30min", method="ffill")
```

Out[17]:

In [18]:

```
ts.asfreq(freq="1H30min", method="bfill")
```

Out[18]:

In [19]:

```
ts.asfreq(freq="1H30min", method="pad")
```

Out[19]:

In [20]:

```
df.asfreq(freq="1H30min")
```

Out[20]:

In [21]:

```
df.asfreq(freq="1H30min", fill_value=0.0)
```

Out[21]:

In [22]:

```
df.asfreq(freq="1H30min", method="ffill")
```

Out[22]:

In [23]:

```
df.asfreq(freq="1H30min", method="bfill")
```

Out[23]:

In [24]:

```
df.asfreq(freq="1H30min", method="pad")
```

Out[24]:

We can lose data when downsampling, and `asfreq()` uses a simple approach: it only keeps the values that sit on the new index. It provides only `bfill`, `ffill`, and `pad` for filling in data when upsampling or downsampling. What if we need to apply some other function? We need a more flexible approach to handle downsampling. Pandas provides another method called `resample()` which can help us with that.
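A minimal sketch of the difference, using an illustrative hourly series downsampled to 2 hours (written as `"120min"`):

```python
import pandas as pd

hourly = pd.Series(range(6), index=pd.date_range("2020-01-01", periods=6, freq="60min"))

# asfreq() keeps only the values sitting exactly on the new 2-hour grid.
picked = hourly.asfreq("120min")

# resample() lets every original value contribute to its 2-hour bin.
summed = hourly.resample("120min").sum()

print(picked.tolist())  # [0, 2, 4]
print(summed.tolist())  # [1, 5, 9]
```

The values 1, 3, and 5 are simply dropped by `asfreq()`, whereas `resample().sum()` includes them in the totals.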

`resample()`

`resample()` accepts the new frequency to be applied to the time series data and returns a Resampler object. We can apply various methods other than `bfill`, `ffill`, and `pad` for filling in data when upsampling/downsampling. The Resampler object supports a list of aggregation functions like mean, std, var, count, etc., which are applied to the time-series data when upsampling or downsampling. We'll explain the usage of `resample()` below with a few examples.
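As a small sketch of how the Resampler object is used, the snippet below computes several aggregations at once with `agg()`. The data and variable names are illustrative.

```python
import pandas as pd

hourly = pd.Series(
    [1.0, 3.0, 5.0, 7.0],
    index=pd.date_range("2020-01-01", periods=4, freq="60min"),
)

# resample() on its own returns a Resampler object; nothing is computed yet.
resampler = hourly.resample("120min")

# Several aggregations can be requested together; each becomes a column.
summary = resampler.agg(["mean", "min", "max", "count"])
print(summary)
```

Each row of `summary` describes one 2-hour bin, with one column per aggregation name.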

Below we try various ways to downsample the data.

In [25]:

```
ts.resample("1H30min").mean()
```

Out[25]:

The above example takes the mean of the values whose timestamps fall within each 1 hour and 30-minute window. Our time series is sampled hourly, so one or two values fall into each 1 hour and 30-minute window. The mean of those values becomes the value at the new index when downsampling. We can call functions other than `mean()`, like `std()`, `var()`, `sum()`, `count()`, `interpolate()`, etc.
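To make the window membership concrete, here is a sketch on a copy of the same kind of hourly series (named `hourly_ts` here so it does not overwrite `ts`). The 1 hour and 30-minute frequency is written as `"90min"`.

```python
import pandas as pd

hourly_ts = pd.Series(range(5), index=pd.date_range("2020-01-01", periods=5, freq="60min"))

# Bins are closed on the left: [00:00, 01:30), [01:30, 03:00), [03:00, 04:30).
counts = hourly_ts.resample("90min").count()
means = hourly_ts.resample("90min").mean()

print(counts.tolist())  # [2, 1, 2]
print(means.tolist())   # [0.5, 2.0, 3.5]
```

The middle bin contains only the 02:00 sample, so its mean is just that value.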

In [26]:

```
ts.resample("1H15min").mean()
```

Out[26]:

In [27]:

```
ts.resample("1H15min").std()
```

Out[27]:

In [28]:

```
ts.resample("1H15min").var()
```

Out[28]:

In [29]:

```
ts.resample("1H15min").sum()
```

Out[29]:

In [30]:

```
ts.resample("1H15min").count()
```

Out[30]:

In [31]:

```
ts.resample("1H15min").bfill()
```

Out[31]:

In [32]:

```
ts.resample("1H15min").ffill()
```

Out[32]:

We'll now try a few examples of upsampling the time series.

Please note that we can even apply our own function to a Resampler object by passing it to its **apply()** method.
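A minimal sketch of a user-defined function passed to `apply()` follows. The helper `value_range` is hypothetical and written only for this illustration; it is applied while downsampling so that each bin holds several values.

```python
import pandas as pd

hourly = pd.Series(
    [2, 9, 4, 4, 7, 1],
    index=pd.date_range("2020-01-01", periods=6, freq="60min"),
)

def value_range(window):
    """Spread between the largest and smallest value in one bin (hypothetical helper)."""
    return window.max() - window.min()

# Each 3-hour bin is handed to value_range as a Series.
spread = hourly.resample("180min").apply(value_range)
print(spread.tolist())  # [7, 6]
```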

In [33]:

```
ts.resample("45min").bfill()
```

Out[33]:

In [34]:

```
ts.resample("45min").apply(lambda x: x**2 if x.values.tolist() else np.nan)
```

Out[34]:

In [35]:

```
ts.resample("45min").interpolate()
```

Out[35]:

In [36]:

```
df.resample("45min").mean().fillna(0.0)
```

Out[36]:

The above examples show that `resample()` is a very flexible function that lets us resample time series by applying a variety of functions.

Please note that time series data should be sorted by time before calling **asfreq()** or **resample()**; otherwise the call may fail or give unexpected results. It's also suggested to prefer resample() over asfreq() because of its flexibility.
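One simple way to guarantee the ordering is to call `sort_index()` first. The sketch below uses a deliberately shuffled, illustrative series.

```python
import pandas as pd

shuffled = pd.Series(
    [2, 0, 1],
    index=pd.to_datetime(["2020-01-01 02:00", "2020-01-01 00:00", "2020-01-01 01:00"]),
)
print(shuffled.index.is_monotonic_increasing)  # False

# Sort by timestamp before changing the frequency.
ordered = shuffled.sort_index()
print(ordered.index.is_monotonic_increasing)   # True

result = ordered.asfreq("30min", method="ffill")
print(result.tolist())  # [0, 0, 1, 1, 2]
```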

Moving window functions are functions that can be applied to time-series data by moving a fixed or variable size window over the data and computing descriptive statistics over the window each time. Here, a window generally refers to a number of consecutive samples taken from the time series and represents a particular period of time.

There are 2 kinds of window functions:

- **Rolling Window Functions:** These perform aggregate operations on a window with the same number of samples each time.
- **Expanding Window Functions:** These perform aggregate operations on a window which expands with time.
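The difference between the two kinds of window can be sketched on a tiny illustrative series, using the `rolling()` and `expanding()` functions that are introduced in the following sections.

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Rolling: always the latest 3 samples. The first two rows are NaN
# because a full window is not available yet.
rolling_sum = values.rolling(window=3).sum()

# Expanding: every sample seen so far.
expanding_sum = values.expanding().sum()

print(rolling_sum.tolist())    # [nan, nan, 6.0, 9.0, 12.0]
print(expanding_sum.tolist())  # [1.0, 3.0, 6.0, 10.0, 15.0]
```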

Pandas provides a list of functions for performing window functions. We'll start with the `rolling()` function.

`rolling()`

The `rolling()` function lets us perform rolling window functions on time series data. It can be called on both Series and DataFrame objects in pandas. It accepts a window size as a parameter, groups values by that window size, and returns a `Rolling` object. We can then apply various aggregate functions to this object as per our needs. We'll create a simple dataframe of random data to explain this further.
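Before moving to random data, here is a small deterministic sketch of two window options: an integer window relaxed with `min_periods`, and a time-based window given as an offset string. The series is illustrative.

```python
import pandas as pd

values = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    index=pd.date_range("2020-01-01", periods=5, freq="D"),
)

# Integer window of 3 samples; min_periods=1 allows partial windows at the start.
partial_mean = values.rolling(3, min_periods=1).mean()
print(partial_mean.tolist())  # [1.0, 1.5, 2.0, 3.0, 4.0]

# Time-based window: everything within the 2 days ending at each timestamp.
two_day_sum = values.rolling("2D").sum()
print(two_day_sum.tolist())   # [1.0, 3.0, 5.0, 7.0, 9.0]
```

A time-based window needs a datetime-like index and can hold a varying number of samples when the data is irregularly spaced.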

In [37]:

```
df = pd.DataFrame(np.random.randn(100, 4),
index = pd.date_range('1/1/2020', periods = 100),
columns = ['A', 'B', 'C', 'D'])
df.head()
```

Out[37]:

In [ ]:

```
df.plot(figsize=(8,4));
```

In [39]:

```
r = df.rolling(3)
r
```

Out[39]:

Above, we have created a rolling object with a window size of 3. We can now apply various aggregate functions on this object to get a modified time series. We'll start by applying the mean function to the rolling object and then visualize column B from the original dataframe alongside the rolled output.

In [ ]:

```
df["B"].plot(color="grey", figsize=(8,4));
r.mean()["B"].plot(color="red");
```

There are many other descriptive statistics functions which can be applied to a rolling object, like `count()`, `median()`, `std()`, `var()`, `quantile()`, `skew()`, etc. We'll try a few of them below.

In [ ]:

```
df["B"].plot(color="grey", figsize=(8,4));
r.quantile(0.25)["B"].plot(color="red");
```

In [ ]:

```
df["B"].plot(color="grey", figsize=(8,4));
r.skew()["B"].plot(color="red");
```

In [ ]:

```
df["B"].plot(color="grey", figsize=(8,4));
r.var()["B"].plot(color="red");
```

We can even apply our own function by passing it to the `apply()` function. We explain its usage below with an example.

Please note that the function passed to **apply()** receives the samples of one window at a time. By default (`raw=False`) they arrive as a pandas Series; with `raw=True` they arrive as a numpy array.
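The sketch below records what type the function actually receives under each setting of `raw`. The helper `window_sum` and the list `seen_types` are illustrative names, not part of pandas.

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0])
seen_types = []

def window_sum(window):
    # Remember the type of the object handed to us, then aggregate it.
    seen_types.append(type(window).__name__)
    return window.sum()

as_series = values.rolling(2).apply(window_sum, raw=False)
as_array = values.rolling(2).apply(window_sum, raw=True)

print(sorted(set(seen_types)))  # ['Series', 'ndarray']
print(as_array.tolist())        # [nan, 3.0, 5.0, 7.0]
```

Both calls give the same numbers; `raw=True` is usually the faster choice when the function only needs the values.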

In [ ]:

```
df["B"].plot(color="grey", figsize=(8,4));
r.apply(lambda x: x.sum())["B"].plot(color="red");
```

We can apply more than one aggregate function by passing them to the `agg()` function. We'll explain it below with an example. We can also apply aggregate functions to a single column and ignore the others.
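`agg()` also accepts a dictionary, which allows a different function per column. A minimal sketch with an illustrative two-column frame:

```python
import pandas as pd

frame = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]})

# Rolling mean for column A and rolling sum for column B, both over 2 samples.
per_column = frame.rolling(2).agg({"A": "mean", "B": "sum"})
print(per_column)
```

The first row is NaN in both columns because a window of 2 samples is not yet full.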

In [45]:

```
r.agg(["mean", "std"]).head()
```

Out[45]:

In [46]:

```
r["A"].agg(["mean", "std"]).head()
```

Out[46]:

We can also apply a rolling window function to data at a frequency different from its original one. Below we create hourly data, resample it to daily frequency, and then apply a rolling window function to the daily values.

In [47]:

```
df = pd.DataFrame(np.random.randn(100, 4),
index = pd.date_range('1/1/2020', freq="H", periods = 100),
columns = ['A', 'B', 'C', 'D'])
df.head()
```

Out[47]:

In [48]:

```
df.resample("1D").mean().rolling(3).mean().head()
```

Out[48]:

In [ ]:

```
df.resample("1D").mean().rolling(3).mean().plot();
```

We can notice above that our output has a daily frequency rather than the hourly frequency of the original data.
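The same chain can be followed by hand on deterministic data. The sketch below assumes an illustrative hourly series that simply counts upward, so the daily means are easy to verify.

```python
import numpy as np
import pandas as pd

# Four full days of hourly values 0, 1, ..., 95 (illustrative data).
hourly = pd.Series(
    np.arange(96, dtype=float),
    index=pd.date_range("2020-01-01", periods=96, freq="60min"),
)

# Step 1: downsample to one mean per day.
daily_mean = hourly.resample("1D").mean()
print(daily_mean.tolist())  # [11.5, 35.5, 59.5, 83.5]

# Step 2: smooth the daily values with a 2-day rolling mean.
smoothed = daily_mean.rolling(2).mean()
print(smoothed.tolist())    # [nan, 23.5, 47.5, 71.5]
```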

`expanding()`

Pandas provides a function named `expanding()` to perform expanding window functions on our time series data. It can be called on both Series and DataFrame objects in pandas. As we discussed above, an expanding window function takes all previous values into consideration, unlike a rolling window, which takes a fixed number of samples into consideration. We'll explain its usage below with a few examples.
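One way to see what "all previous values" means is to reproduce the expanding mean by hand as a running sum divided by the number of samples seen so far. The series is illustrative.

```python
import numpy as np
import pandas as pd

values = pd.Series([4.0, 8.0, 6.0, 2.0])

expanding_mean = values.expanding(min_periods=1).mean()

# Running total divided by the number of samples seen so far: 1, 2, 3, 4.
by_hand = values.cumsum() / np.arange(1, len(values) + 1)

print(expanding_mean.tolist())  # [4.0, 6.0, 6.0, 5.0]
print(np.allclose(expanding_mean, by_hand))  # True
```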

In [50]:

```
df.expanding(min_periods=1).mean().head()
```

Out[50]:

In [ ]:

```
df.expanding(min_periods=1).mean().plot();
```

We can notice from the above plot that the output of the expanding window fluctuates at the beginning and then settles as more samples enter the computation. The initial fluctuation is due to the small number of samples taken into consideration at the start. The number of samples keeps increasing as the computation moves forward, until the whole time series has been covered.

We can apply various aggregation functions to an expanding window, like `count()`, `median()`, `std()`, `var()`, `quantile()`, `skew()`, etc. We'll explain them below with a few examples.

In [ ]:

```
df.expanding(min_periods=1).std().plot();
```

In [ ]:

```
df.expanding(min_periods=1).var().plot();
```

We can apply more than one aggregation function by passing their names as a list to the `agg()` function, and we can apply our own function by passing it to the `apply()` function. Both usages are shown below with examples.

In [54]:

```
df.expanding(min_periods=1).agg(["mean", "var"]).head()
```

Out[54]:

In [ ]:

```
df.expanding(min_periods=1).apply(lambda x: x.sum()).plot();
```

In [ ]:

```
df["A"].expanding(min_periods=1).apply(lambda x: x.sum()).plot();
```

We'll generally use `expanding()` window functions when all past samples in the time series matter, even as new samples are added. We'll use `rolling()` window functions when only the past few samples are important and all samples before them can be ignored.

`ewm()`

An exponentially weighted moving average is a weighted moving average of the samples in time-series data in which the weight assigned to each previous sample decreases exponentially the further back it lies. The `ewm()` function can be called on both Series and DataFrame objects in pandas. We'll explain its usage by comparing it with the `rolling()` window function.
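The weighting can be reproduced by hand. The sketch below assumes the default `adjust=True` behaviour of `ewm()` and the documented conversion from `span` to the smoothing factor, alpha = 2 / (span + 1). The series and variable names are illustrative.

```python
import numpy as np
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0])
span = 3
alpha = 2 / (span + 1)  # span-to-alpha conversion; 0.5 for span=3

ewm_mean = values.ewm(span=span).mean()  # adjust=True is the default

by_hand = []
for t in range(len(values)):
    # Weights 1, (1 - alpha), (1 - alpha)**2, ... with the newest sample first.
    weights = (1 - alpha) ** np.arange(t + 1)
    newest_first = values.iloc[: t + 1].to_numpy()[::-1]
    by_hand.append((weights * newest_first).sum() / weights.sum())

print(np.allclose(ewm_mean, by_hand))  # True
```

With alpha = 0.5 each older sample counts half as much as the one after it, so the average follows recent values more closely than an equally weighted rolling mean.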

In [ ]:

```
df["A"].ewm(span=10).mean().plot(color="tab:red");
df["A"].rolling(window=10).mean().plot(color="tab:green");
```

In [ ]:

```
df["A"].ewm(span=10, min_periods=5).mean().plot(color="tab:red");
df["A"].rolling(window=10).mean().plot(color="tab:green");
```

We can apply different kinds of aggregation functions, as we did above with the `rolling()` and `expanding()` functions. We'll try a few examples below.

In [ ]:

```
df.ewm(span=10).std().plot();
```

In [60]:

```
df.ewm(span=10).agg(["mean", "var"]).head()
```

Out[60]:

This concludes our small tutorial on resampling and moving window functions with time-series data using pandas.

Sunny Solanki
