Time-Series - Dates, Times & Time Zone Handling in Python using Pandas

Introduction

Time-series data is common in fields such as finance, geography, earthquake monitoring, and healthcare. Interpreting and handling it correctly requires properly formatted datetime columns. Pandas provides tools to convert data to a proper datetime format, generate ranges of datetimes, and manipulate datetime values in many other ways. This tutorial explores how to handle datetime data properly with these tools.

1. Date Ranges

Pandas provides a very helpful function, date_range(), which generates a range of dates at a fixed frequency. It takes arguments such as start, end, periods, and freq, though not all of them are required. Below we generate date ranges with different frequencies through various examples.

In [1]:
import pandas as pd
import numpy as np

We can create a date range by setting either start, periods and freq, or start, end and freq. If we don't provide freq, it defaults to D, which means 1 day. The function returns the dates as a DatetimeIndex. We'll create date ranges from date strings in various formats to check which formats work with the pandas date_range() function.

In [2]:
pd.date_range(start="2020 Jan 1", periods=5)
Out[2]:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05'],
              dtype='datetime64[ns]', freq='D')
In [3]:
pd.date_range(start="2020 January 1", periods=5)
Out[3]:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05'],
              dtype='datetime64[ns]', freq='D')
In [4]:
pd.date_range(start="1 Jan 2020", periods=5)
Out[4]:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05'],
              dtype='datetime64[ns]', freq='D')
In [5]:
pd.date_range(start="Jan 1, 2020", periods=5)
Out[5]:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05'],
              dtype='datetime64[ns]', freq='D')
In [6]:
pd.date_range(start="2020-7-1", periods=5)
Out[6]:
DatetimeIndex(['2020-07-01', '2020-07-02', '2020-07-03', '2020-07-04',
               '2020-07-05'],
              dtype='datetime64[ns]', freq='D')
In [7]:
pd.date_range(start="2020/7/1", periods=5)
Out[7]:
DatetimeIndex(['2020-07-01', '2020-07-02', '2020-07-03', '2020-07-04',
               '2020-07-05'],
              dtype='datetime64[ns]', freq='D')

All of the above examples generated 5 consecutive days from the given start date, which shows that pandas can handle a variety of date formats.

In [8]:
pd.date_range(start="1-7-2020", periods=5)
Out[8]:
DatetimeIndex(['2020-01-07', '2020-01-08', '2020-01-09', '2020-01-10',
               '2020-01-11'],
              dtype='datetime64[ns]', freq='D')
In [9]:
pd.date_range(start="7-1-2020", periods=5)
Out[9]:
DatetimeIndex(['2020-07-01', '2020-07-02', '2020-07-03', '2020-07-04',
               '2020-07-05'],
              dtype='datetime64[ns]', freq='D')

We can see from the above two examples that the first one may not generate the results we expected. The reason is that when the year comes last in a purely numeric date, pandas assumes that the first value is the month and the second is the day, so "1-7-2020" is read as January 7 rather than July 1.
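When a numeric date is meant as day-month-year, one option is to parse it explicitly before building the range. The sketch below uses the dayfirst argument of pd.to_datetime(); the variable names are only for illustration.

```python
import pandas as pd

# Default parsing treats "1-7-2020" as month-day-year (January 7).
month_first = pd.to_datetime("1-7-2020")

# dayfirst=True asks pandas to read the same string as day-month-year (July 1).
day_first = pd.to_datetime("1-7-2020", dayfirst=True)

# An ISO 8601 string (year-month-day) leaves no room for ambiguity.
iso = pd.Timestamp("2020-07-01")

print(month_first, day_first, iso)
```

The parsed value can then be passed as start to date_range().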

Below are a few more examples where we generate date ranges by setting the start, end and freq parameters. Pandas uses D for a day, H for an hour, S for a second, T or min for a minute, B for a business day, M for month-end, MS for month-start and ms or L for a millisecond. Newer pandas releases (2.2 and later) deprecate H, T, S, L and M in favor of h, min, s, ms and ME.

In [10]:
pd.date_range(start="1-1-2020", end="1-5-2020", freq="D")
Out[10]:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05'],
              dtype='datetime64[ns]', freq='D')
In [11]:
pd.date_range(start="1-1-2020", end="1-10-2020", freq="B")
Out[11]:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-06',
               '2020-01-07', '2020-01-08', '2020-01-09', '2020-01-10'],
              dtype='datetime64[ns]', freq='B')
In [12]:
pd.date_range(start="1-1-2020 00:00", end="1-1-2020 5:00", freq="H")
Out[12]:
DatetimeIndex(['2020-01-01 00:00:00', '2020-01-01 01:00:00',
               '2020-01-01 02:00:00', '2020-01-01 03:00:00',
               '2020-01-01 04:00:00', '2020-01-01 05:00:00'],
              dtype='datetime64[ns]', freq='H')
In [13]:
pd.date_range(start="1-1-2020 00:00", end="1-1-2020 00:05", freq="30S")
Out[13]:
DatetimeIndex(['2020-01-01 00:00:00', '2020-01-01 00:00:30',
               '2020-01-01 00:01:00', '2020-01-01 00:01:30',
               '2020-01-01 00:02:00', '2020-01-01 00:02:30',
               '2020-01-01 00:03:00', '2020-01-01 00:03:30',
               '2020-01-01 00:04:00', '2020-01-01 00:04:30',
               '2020-01-01 00:05:00'],
              dtype='datetime64[ns]', freq='30S')
In [14]:
print(pd.date_range(start="1-1-2020 00:00", end="1-1-2020 00:05", freq="T"))
print(pd.date_range(start="1-1-2020 00:00", end="1-1-2020 00:05", freq="2min"))
DatetimeIndex(['2020-01-01 00:00:00', '2020-01-01 00:01:00',
               '2020-01-01 00:02:00', '2020-01-01 00:03:00',
               '2020-01-01 00:04:00', '2020-01-01 00:05:00'],
              dtype='datetime64[ns]', freq='T')
DatetimeIndex(['2020-01-01 00:00:00', '2020-01-01 00:02:00',
               '2020-01-01 00:04:00'],
              dtype='datetime64[ns]', freq='2T')
In [15]:
pd.date_range(start="1-1-2020", periods=5, freq="M")
Out[15]:
DatetimeIndex(['2020-01-31', '2020-02-29', '2020-03-31', '2020-04-30',
               '2020-05-31'],
              dtype='datetime64[ns]', freq='M')
In [16]:
pd.date_range(start="1-1-2020", periods=5, freq="MS")
Out[16]:
DatetimeIndex(['2020-01-01', '2020-02-01', '2020-03-01', '2020-04-01',
               '2020-05-01'],
              dtype='datetime64[ns]', freq='MS')
In [17]:
print(pd.date_range(start="1-1-2020", periods=5, freq="100ms"))
print(pd.date_range(start="1-1-2020", end="1-1-2020 00:00:00.500000", freq="100L"))
print(pd.date_range(start="1-1-2020", end="1-1-2020 00:00:01", freq="100L"))
DatetimeIndex([       '2020-01-01 00:00:00', '2020-01-01 00:00:00.100000',
               '2020-01-01 00:00:00.200000', '2020-01-01 00:00:00.300000',
               '2020-01-01 00:00:00.400000'],
              dtype='datetime64[ns]', freq='100L')
DatetimeIndex([       '2020-01-01 00:00:00', '2020-01-01 00:00:00.100000',
               '2020-01-01 00:00:00.200000', '2020-01-01 00:00:00.300000',
               '2020-01-01 00:00:00.400000', '2020-01-01 00:00:00.500000'],
              dtype='datetime64[ns]', freq='100L')
DatetimeIndex([       '2020-01-01 00:00:00', '2020-01-01 00:00:00.100000',
               '2020-01-01 00:00:00.200000', '2020-01-01 00:00:00.300000',
               '2020-01-01 00:00:00.400000', '2020-01-01 00:00:00.500000',
               '2020-01-01 00:00:00.600000', '2020-01-01 00:00:00.700000',
               '2020-01-01 00:00:00.800000', '2020-01-01 00:00:00.900000',
                      '2020-01-01 00:00:01'],
              dtype='datetime64[ns]', freq='100L')
In [18]:
pd.date_range(start="1-1-2020", end="1-10-2020", freq="1D4H")
Out[18]:
DatetimeIndex(['2020-01-01 00:00:00', '2020-01-02 04:00:00',
               '2020-01-03 08:00:00', '2020-01-04 12:00:00',
               '2020-01-05 16:00:00', '2020-01-06 20:00:00',
               '2020-01-08 00:00:00', '2020-01-09 04:00:00'],
              dtype='datetime64[ns]', freq='28H')
In [19]:
pd.date_range(start="1-1-2020", end="1-10-2020", freq="1D4H30S")
Out[19]:
DatetimeIndex(['2020-01-01 00:00:00', '2020-01-02 04:00:30',
               '2020-01-03 08:01:00', '2020-01-04 12:01:30',
               '2020-01-05 16:02:00', '2020-01-06 20:02:30',
               '2020-01-08 00:03:00', '2020-01-09 04:03:30'],
              dtype='datetime64[ns]', freq='100830S')
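Besides a fixed freq, date_range() also accepts start, end and periods together, in which case the points are spaced evenly between the two bounds. A minimal sketch (variable names are illustrative):

```python
import pandas as pd

# With start, end and periods (and no freq), pandas spaces the points evenly.
evenly_spaced = pd.date_range(start="2020-01-01", end="2020-01-05", periods=3)
print(evenly_spaced)

# The "min" alias for minutes works with a multiplier, like the other aliases.
half_hourly = pd.date_range(start="2020-01-01", periods=4, freq="30min")
print(half_hourly)
```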

1.1 Date Range Filtering

We can also slice a series that is indexed by a date range. Various date and time formats can be used as slice bounds to filter the series. We'll explain it below with a few examples.

In [20]:
rng = pd.date_range(start="1-1-2020", periods=6, freq="D")
ser = pd.Series(data=range(6), index=rng)
ser
Out[20]:
2020-01-01    0
2020-01-02    1
2020-01-03    2
2020-01-04    3
2020-01-05    4
2020-01-06    5
Freq: D, dtype: int64
In [21]:
ser["1/1/2020":"1/4/2020"]
Out[21]:
2020-01-01    0
2020-01-02    1
2020-01-03    2
2020-01-04    3
Freq: D, dtype: int64
In [22]:
ser["1/5/2020":]
Out[22]:
2020-01-05    4
2020-01-06    5
Freq: D, dtype: int64
In [23]:
from datetime import datetime

ser[datetime(2020,1,3):]
Out[23]:
2020-01-03    2
2020-01-04    3
2020-01-05    4
2020-01-06    5
Freq: D, dtype: int64

We can also index with partial strings, such as only the year and month or only the year, and pandas will select all values matching them. The examples below illustrate this.

In [24]:
ser["2020-1":]
Out[24]:
2020-01-01    0
2020-01-02    1
2020-01-03    2
2020-01-04    3
2020-01-05    4
2020-01-06    5
Freq: D, dtype: int64
In [25]:
ser["2020":]
Out[25]:
2020-01-01    0
2020-01-02    1
2020-01-03    2
2020-01-04    3
2020-01-05    4
2020-01-06    5
Freq: D, dtype: int64
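Partial string indexing is easier to see on a series that spans more than one month. The sketch below builds a hypothetical 90-day series (not part of the examples above) and selects from it with .loc.

```python
import pandas as pd

# Hypothetical series: one value per day for the first 90 days of 2020.
rng = pd.date_range(start="2020-01-01", periods=90, freq="D")
ser = pd.Series(data=range(90), index=rng)

# A "year-month" string selects every row that falls in that month.
february = ser.loc["2020-02"]

# A slice between two partial strings includes both end months in full.
feb_to_mar = ser.loc["2020-02":"2020-03"]

print(len(february), len(feb_to_mar))
```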

2. Timestamp

Timestamp lets us create an object representing a particular point in time. We need it to represent values that change from one point in time to the next. We can create a timestamp from various date formats, as the examples below show. It accepts values with up to nanosecond precision.

In [26]:
pd.Timestamp("Jan 2020")
Out[26]:
Timestamp('2020-01-01 00:00:00')
In [27]:
pd.Timestamp("12 Jan 2020")
Out[27]:
Timestamp('2020-01-12 00:00:00')
In [28]:
pd.Timestamp("12 Jan 2020 20:20")
Out[28]:
Timestamp('2020-01-12 20:20:00')
In [29]:
pd.Timestamp("12 Jan 2020 20:20:20.100")
Out[29]:
Timestamp('2020-01-12 20:20:20.100000')
In [30]:
pd.Timestamp("12 Jan 2020 20:20:20.200000")
Out[30]:
Timestamp('2020-01-12 20:20:20.200000')
In [31]:
pd.Timestamp("12 Jan 2020 20:20:20.000200000")
Out[31]:
Timestamp('2020-01-12 20:20:20.000200')

We can also create timestamps by setting year, month, day, hour, minute, second, microsecond and nanosecond separately. We'll explain it below with a few examples.

In [32]:
pd.Timestamp(year=2020, month=1, day=1, hour=10, minute=10, second=30, microsecond=100)
Out[32]:
Timestamp('2020-01-01 10:10:30.000100')
In [33]:
pd.Timestamp(year=2020, month=1, day=1, hour=10, minute=10, second=30, microsecond=100, nanosecond=100)
Out[33]:
Timestamp('2020-01-01 10:10:30.000100100')
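Once a Timestamp exists, its parts can be read back through attributes. A short sketch using one of the timestamps created above:

```python
import pandas as pd

ts = pd.Timestamp("12 Jan 2020 20:20:20.200000")

# Individual fields are exposed as attributes.
print(ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second, ts.microsecond)

# day_name() returns the weekday name; dayofweek counts from Monday = 0.
print(ts.day_name(), ts.dayofweek)

# normalize() resets the time part to midnight.
midnight = ts.normalize()
print(midnight)
```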

We can subtract one timestamp from another to find out how far apart they are. Adding two timestamps is not supported and raises a TypeError. The result of a timestamp subtraction is a time delta, which we explain in the next section. The examples below make the concept clear.

In [34]:
t1 = pd.Timestamp("12 Jan 2020 12:12:45")
t2 = pd.Timestamp("12 Jan 2020 13:14:20")

(t2 - t1), (t1 - t2)
Out[34]:
(Timedelta('0 days 01:01:35'), Timedelta('-1 days +22:58:25'))
In [35]:
t1 = pd.Timestamp("12 Jan 2020")
t2 = pd.Timestamp("13 Jan 2020")

(t2 - t1),
Out[35]:
(Timedelta('1 days 00:00:00'),)
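Only subtraction is defined between two timestamps. The sketch below reuses the values from the first example, converts the resulting time delta to seconds, and shows that adding two timestamps raises a TypeError.

```python
import pandas as pd

t1 = pd.Timestamp("12 Jan 2020 12:12:45")
t2 = pd.Timestamp("12 Jan 2020 13:14:20")

# Subtraction yields a Timedelta; total_seconds() expresses it as a float.
delta = t2 - t1
elapsed_seconds = delta.total_seconds()
print(delta, elapsed_seconds)

# Adding two points in time has no meaning, so pandas raises TypeError.
try:
    t1 + t2
    addition_supported = True
except TypeError:
    addition_supported = False
print(addition_supported)
```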

3. Timedelta

Timedelta lets us represent the difference between two timestamps. We need it to analyze how far apart two date/time values are. The examples below show how to create time deltas using pandas. A time delta can consist of days alone or of days plus hours, minutes, seconds and fractions of a second down to nanoseconds. If we don't provide a value for a particular part, its default value of zero is assumed.

In [36]:
pd.Timedelta("1 days")
Out[36]:
Timedelta('1 days 00:00:00')
In [37]:
pd.Timedelta("1 days 10:00:00")
Out[37]:
Timedelta('1 days 10:00:00')
In [38]:
pd.Timedelta("1 days 10:10:00")
Out[38]:
Timedelta('1 days 10:10:00')
In [39]:
pd.Timedelta("1 days 10:10:10")
Out[39]:
Timedelta('1 days 10:10:10')
In [40]:
pd.Timedelta("1 days 10:10:10.100000")
Out[40]:
Timedelta('1 days 10:10:10.100000')
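Timedelta also accepts keyword arguments instead of a string, which can be easier to read when the parts come from variables. A minimal sketch comparing both forms:

```python
import pandas as pd

from_string = pd.Timedelta("1 days 10:10:10")
from_keywords = pd.Timedelta(days=1, hours=10, minutes=10, seconds=10)

# Both constructions describe the same duration.
same = from_string == from_keywords

# total_seconds() flattens the duration; components splits it into parts.
total = from_keywords.total_seconds()
parts = from_keywords.components

print(same, total, parts.days, parts.hours, parts.minutes, parts.seconds)
```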

We can add and subtract time deltas to get a combined time delta and the difference between time deltas, respectively. We'll explain it with a few examples.

In [41]:
t1 = pd.Timedelta("1 days 10:10:10.100000")
t2 = pd.Timedelta("2 days 10:10:10.100000")

t2 - t1
Out[41]:
Timedelta('1 days 00:00:00')
In [42]:
t1 + t2
Out[42]:
Timedelta('3 days 20:20:20.200000')

Time deltas are very useful when you want to move timestamps by a particular amount of time. We can add time deltas to a timestamp or subtract them from it to get a shifted date. We have explained it below with a few examples.

In [43]:
pd.Timestamp("12 Jan 2020") + pd.Timedelta("1 days")
Out[43]:
Timestamp('2020-01-13 00:00:00')
In [44]:
pd.Timestamp("Jan 2020") + pd.Timedelta("1 days")
Out[44]:
Timestamp('2020-01-02 00:00:00')
In [45]:
pd.Timestamp("12 Jan 2020") - pd.Timedelta("1 days")
Out[45]:
Timestamp('2020-01-11 00:00:00')
In [46]:
pd.Timestamp("12 Jan 2020") + pd.Timedelta("4H")
Out[46]:
Timestamp('2020-01-12 04:00:00')
In [47]:
pd.Timestamp("12 Jan 2020") + pd.Timedelta("30min")
Out[47]:
Timestamp('2020-01-12 00:30:00')
In [48]:
pd.Timestamp("12 Jan 2020") + pd.Timedelta("30 seconds")
Out[48]:
Timestamp('2020-01-12 00:00:30')

We can add time deltas to date ranges and subtract them as well, which moves every value in the range by that time delta. We'll explain it below with a few examples.

In [49]:
pd.date_range(start="1-1-2020", end="1-5-2020", freq="D") + pd.Timedelta("1 days")
Out[49]:
DatetimeIndex(['2020-01-02', '2020-01-03', '2020-01-04', '2020-01-05',
               '2020-01-06'],
              dtype='datetime64[ns]', freq='D')
In [50]:
pd.date_range(start="1-1-2020", end="1-5-2020", freq="D") - pd.Timedelta("1 days")
Out[50]:
DatetimeIndex(['2019-12-31', '2020-01-01', '2020-01-02', '2020-01-03',
               '2020-01-04'],
              dtype='datetime64[ns]', freq='D')
In [51]:
pd.date_range(start="1-1-2020", end="1-5-2020", freq="D") + pd.Timedelta("4H30T30S")
Out[51]:
DatetimeIndex(['2020-01-01 04:30:30', '2020-01-02 04:30:30',
               '2020-01-03 04:30:30', '2020-01-04 04:30:30',
               '2020-01-05 04:30:30'],
              dtype='datetime64[ns]', freq='D')

4. Period (TimeSpan)

Pandas provides Period to represent a time span. We need periods when we want to represent values that hold for the whole span rather than at a single instant. Period accepts a freq argument; if we don't pass it, pandas infers the frequency from the date format passed. The examples below create various periods.

In [52]:
pd.Period(value="1-1-2020")
Out[52]:
Period('2020-01-01', 'D')
In [53]:
pd.Period("1-2020")
Out[53]:
Period('2020-01', 'M')
In [54]:
pd.Period("2020")
Out[54]:
Period('2020', 'A-DEC')
In [55]:
pd.Period("1-1-2020 10:00:00")
Out[55]:
Period('2020-01-01 10:00:00', 'S')
In [56]:
pd.Period("1-1-2020 10:00")
Out[56]:
Period('2020-01-01 10:00', 'T')
In [57]:
pd.Period("1-1-2020 10")
Out[57]:
Period('2020-01-01 10:00', 'H')
In [58]:
pd.Period("1-1-2020 10:00:00.100000")
Out[58]:
Period('2020-01-01 10:00:00.100', 'L')
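Because a Period is a span rather than an instant, it has both a start and an end. The sketch below inspects them for a monthly period and shifts the period by one unit of its frequency.

```python
import pandas as pd

january = pd.Period("2020-01", freq="M")

# A Period covers a whole span, bounded by start_time and end_time.
print(january.start_time)
print(january.end_time)

# Adding an integer moves the period by that many units of its frequency.
february = january + 1
print(february)
```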

We can add and subtract time deltas from periods as well. We have explained it below with a few examples.

In [59]:
pd.Period("1-1-2020 10:10") + pd.Timedelta("1 days")
Out[59]:
Period('2020-01-02 10:10', 'T')
In [60]:
pd.Period("1-1-2020 10:10:10") - pd.Timedelta("1 days")
Out[60]:
Period('2019-12-31 10:10:10', 'S')
In [61]:
pd.Period("1-1-2020 10") + pd.Timedelta("10H")
Out[61]:
Period('2020-01-01 20:00', 'H')
In [62]:
pd.Period("1-1-2020 10") - pd.Timedelta("10H")
Out[62]:
Period('2020-01-01 00:00', 'H')

5. Period Ranges

Just like the date_range() function, the period_range() function lets us generate a list of periods. It takes almost the same parameters as date_range(). Various ways to create a period range are shown below.

In [63]:
pd.period_range(start="1-1-2020", periods=5)
Out[63]:
PeriodIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
             '2020-01-05'],
            dtype='period[D]', freq='D')
In [64]:
pd.period_range(start="1-1-2020", periods=5, freq="4H")
Out[64]:
PeriodIndex(['2020-01-01 00:00', '2020-01-01 04:00', '2020-01-01 08:00',
             '2020-01-01 12:00', '2020-01-01 16:00'],
            dtype='period[4H]', freq='4H')
In [65]:
pd.period_range(start="1-1-2020", periods=5, freq="M")
Out[65]:
PeriodIndex(['2020-01', '2020-02', '2020-03', '2020-04', '2020-05'], dtype='period[M]', freq='M')
In [66]:
pd.period_range(start="1-1-2020", periods=5, freq="T")
Out[66]:
PeriodIndex(['2020-01-01 00:00', '2020-01-01 00:01', '2020-01-01 00:02',
             '2020-01-01 00:03', '2020-01-01 00:04'],
            dtype='period[T]', freq='T')
In [67]:
pd.period_range(start="1-1-2020", periods=6, freq="12min")
Out[67]:
PeriodIndex(['2020-01-01 00:00', '2020-01-01 00:12', '2020-01-01 00:24',
             '2020-01-01 00:36', '2020-01-01 00:48', '2020-01-01 01:00'],
            dtype='period[12T]', freq='12T')
In [68]:
pd.period_range(start="1-1-2020", periods=4, freq="5S")
Out[68]:
PeriodIndex(['2020-01-01 00:00:00', '2020-01-01 00:00:05',
             '2020-01-01 00:00:10', '2020-01-01 00:00:15'],
            dtype='period[5S]', freq='5S')

We can add a time delta to a list of periods, or subtract one from it, the same way we did with date ranges. Doing so moves every period in the list by that time delta. We have explained it below with a few examples.

In [69]:
pd.period_range(start="1-1-2020", periods=4, freq="5S") + pd.Timedelta("1 days")
Out[69]:
PeriodIndex(['2020-01-02 00:00:00', '2020-01-02 00:00:05',
             '2020-01-02 00:00:10', '2020-01-02 00:00:15'],
            dtype='period[5S]', freq='5S')
In [70]:
pd.period_range(start="1-1-2020", periods=4, freq="5S") + pd.Timedelta("5 seconds")
Out[70]:
PeriodIndex(['2020-01-01 00:00:05', '2020-01-01 00:00:10',
             '2020-01-01 00:00:15', '2020-01-01 00:00:20'],
            dtype='period[5S]', freq='5S')
In [71]:
pd.period_range(start="1-1-2020", periods=4, freq="5S") + pd.Timedelta("1min")
Out[71]:
PeriodIndex(['2020-01-01 00:01:00', '2020-01-01 00:01:05',
             '2020-01-01 00:01:10', '2020-01-01 00:01:15'],
            dtype='period[5S]', freq='5S')
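Period ranges and date ranges can be converted into each other. The sketch below assumes the to_timestamp() and to_period() methods of PeriodIndex and DatetimeIndex; the variable names are illustrative.

```python
import pandas as pd

periods = pd.period_range(start="2020-01", periods=3, freq="M")

# to_timestamp() maps each period to a timestamp (the period start by default).
starts = periods.to_timestamp()

# to_period() goes the other way, from timestamps to the periods containing them.
days = pd.date_range(start="2020-01-30", periods=4, freq="D")
months = days.to_period("M")

print(starts)
print(months)
```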

5.1 Period Range Filtering

We can also slice a series that is indexed by a period range. Various date and time formats can be used as slice bounds to filter the series. We'll explain it below with a few examples.

In [72]:
rng = pd.period_range(start="1-1-2020", periods=6, freq="D")
ser = pd.Series(data=range(6), index=rng)
ser
Out[72]:
2020-01-01    0
2020-01-02    1
2020-01-03    2
2020-01-04    3
2020-01-05    4
2020-01-06    5
Freq: D, dtype: int64
In [73]:
ser["1/1/2020":"1/4/2020"]
Out[73]:
2020-01-01    0
2020-01-02    1
2020-01-03    2
2020-01-04    3
Freq: D, dtype: int64
In [74]:
ser["1/3/2020":]
Out[74]:
2020-01-03    2
2020-01-04    3
2020-01-05    4
2020-01-06    5
Freq: D, dtype: int64
In [75]:
from datetime import datetime

ser[datetime(2020,1,3):]
Out[75]:
2020-01-03    2
2020-01-04    3
2020-01-05    4
2020-01-06    5
Freq: D, dtype: int64

We can also index with partial strings, such as only the year and month or only the year, and pandas will select all values matching them. The examples below illustrate this.

In [76]:
ser["2020-1"]
Out[76]:
2020-01-01    0
2020-01-02    1
2020-01-03    2
2020-01-04    3
2020-01-05    4
2020-01-06    5
Freq: D, dtype: int64
In [77]:
ser["2020"]
Out[77]:
2020-01-01    0
2020-01-02    1
2020-01-03    2
2020-01-04    3
2020-01-05    4
2020-01-06    5
Freq: D, dtype: int64

6. TimeDelta Ranges

Pandas provides a function named timedelta_range(), just like date_range() and period_range(), to create a range of time deltas. It lets us create a list of time deltas using almost the same arguments as date_range() and period_range(). A few examples of timedelta_range() usage follow.

In [78]:
pd.timedelta_range(start="1 day", periods=10)
Out[78]:
TimedeltaIndex([ '1 days',  '2 days',  '3 days',  '4 days',  '5 days',
                 '6 days',  '7 days',  '8 days',  '9 days', '10 days'],
               dtype='timedelta64[ns]', freq='D')
In [79]:
pd.timedelta_range(start="1 day", periods=10, freq="30D")
Out[79]:
TimedeltaIndex([  '1 days',  '31 days',  '61 days',  '91 days', '121 days',
                '151 days', '181 days', '211 days', '241 days', '271 days'],
               dtype='timedelta64[ns]', freq='30D')
In [80]:
pd.timedelta_range(start="1 day", periods=10, freq="10H")
Out[80]:
TimedeltaIndex(['1 days 00:00:00', '1 days 10:00:00', '1 days 20:00:00',
                '2 days 06:00:00', '2 days 16:00:00', '3 days 02:00:00',
                '3 days 12:00:00', '3 days 22:00:00', '4 days 08:00:00',
                '4 days 18:00:00'],
               dtype='timedelta64[ns]', freq='10H')
In [81]:
pd.timedelta_range(start="1 day", end="2 day", freq="4H")
Out[81]:
TimedeltaIndex(['1 days 00:00:00', '1 days 04:00:00', '1 days 08:00:00',
                '1 days 12:00:00', '1 days 16:00:00', '1 days 20:00:00',
                '2 days 00:00:00'],
               dtype='timedelta64[ns]', freq='4H')
In [82]:
pd.timedelta_range(start="1 hour", end="2 hour", freq="10min")
Out[82]:
TimedeltaIndex(['01:00:00', '01:10:00', '01:20:00', '01:30:00', '01:40:00',
                '01:50:00', '02:00:00'],
               dtype='timedelta64[ns]', freq='10T')
In [83]:
pd.timedelta_range(start="1 min", end="5 min", freq="T")
Out[83]:
TimedeltaIndex(['00:01:00', '00:02:00', '00:03:00', '00:04:00', '00:05:00'], dtype='timedelta64[ns]', freq='T')
In [84]:
pd.timedelta_range(start="1 day", periods=10) + pd.Timedelta("1 days")
Out[84]:
TimedeltaIndex([ '2 days',  '3 days',  '4 days',  '5 days',  '6 days',
                 '7 days',  '8 days',  '9 days', '10 days', '11 days'],
               dtype='timedelta64[ns]', freq='D')

We can move time delta ranges by adding a time delta to them or subtracting one from them. We have explained it below with a few examples.

In [85]:
pd.timedelta_range(start="1 day", periods=10) - pd.Timedelta("1 days")
Out[85]:
TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days', '5 days',
                '6 days', '7 days', '8 days', '9 days'],
               dtype='timedelta64[ns]', freq='D')
In [86]:
pd.timedelta_range(start="1 day", periods=10) + pd.Timedelta("2 days")
Out[86]:
TimedeltaIndex([ '3 days',  '4 days',  '5 days',  '6 days',  '7 days',
                 '8 days',  '9 days', '10 days', '11 days', '12 days'],
               dtype='timedelta64[ns]', freq='D')
In [87]:
pd.timedelta_range(start="1 day", periods=10) + pd.Timedelta("2D5H")
Out[87]:
TimedeltaIndex([ '3 days 05:00:00',  '4 days 05:00:00',  '5 days 05:00:00',
                 '6 days 05:00:00',  '7 days 05:00:00',  '8 days 05:00:00',
                 '9 days 05:00:00', '10 days 05:00:00', '11 days 05:00:00',
                '12 days 05:00:00'],
               dtype='timedelta64[ns]', freq='D')
In [88]:
pd.timedelta_range(start="1 day", periods=10) + pd.Timedelta("2D5H30min")
Out[88]:
TimedeltaIndex([ '3 days 05:30:00',  '4 days 05:30:00',  '5 days 05:30:00',
                 '6 days 05:30:00',  '7 days 05:30:00',  '8 days 05:30:00',
                 '9 days 05:30:00', '10 days 05:30:00', '11 days 05:30:00',
                '12 days 05:30:00'],
               dtype='timedelta64[ns]', freq='D')
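For numeric work it often helps to turn a range of time deltas into plain numbers. A minimal sketch, dividing by a unit Timedelta and using total_seconds():

```python
import pandas as pd

deltas = pd.timedelta_range(start="1 hour", end="2 hour", freq="30min")

# Dividing by a unit Timedelta expresses each element as a float count of that unit.
hours = deltas / pd.Timedelta("1 hour")

# total_seconds() gives the same durations in seconds.
seconds = deltas.total_seconds()

print(list(hours), list(seconds))
```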

7. TimeZone

Pandas lets us specify a timezone when creating date ranges, timestamps, etc. Until now, we have created all date ranges and timestamps without any timezone set. We'll now explore ways to set timezones and convert from one timezone to another. We can specify the timezone by passing a string to the tz argument of date_range() and Timestamp(). The Python library pytz maintains a list of all available timezone names.

In [89]:
from pytz import common_timezones, all_timezones

print("Number of Common Timezones : ", len(common_timezones))
print("Number of All Timezones : ", len(all_timezones))
print("Difference between all timezones and common timezones : ", list(set(all_timezones) - set(common_timezones)))
Number of Common Timezones :  440
Number of All Timezones :  592
Difference between all timezones and common timezones :  ['Etc/UCT', 'Asia/Rangoon', 'Atlantic/Faeroe', 'Asia/Ujung_Pandang', 'Etc/GMT-8', 'WET', 'GB', 'Navajo', 'Etc/UTC', 'Asia/Katmandu', 'Canada/Saskatchewan', 'EET', 'Mexico/General', 'America/Montreal', 'UCT', 'Australia/Canberra', 'GMT+0', 'Etc/GMT-3', 'Etc/GMT+8', 'HST', 'Pacific/Ponape', 'Australia/ACT', 'Pacific/Truk', 'Chile/Continental', 'Portugal', 'America/Knox_IN', 'Etc/GMT-13', 'Etc/GMT-10', 'Atlantic/Jan_Mayen', 'US/Samoa', 'Pacific/Yap', 'Brazil/East', 'Kwajalein', 'Asia/Thimbu', 'Asia/Tel_Aviv', 'America/Shiprock', 'Etc/GMT-5', 'Etc/GMT-12', 'America/Jujuy', 'Etc/GMT+10', 'Europe/Belfast', 'Australia/South', 'Etc/GMT-9', 'Australia/Yancowinna', 'Eire', 'CET', 'Mexico/BajaNorte', 'Etc/GMT+0', 'Universal', 'GMT0', 'Hongkong', 'Etc/GMT0', 'EST5EDT', 'Turkey', 'EST', 'America/Argentina/ComodRivadavia', 'Etc/GMT-1', 'Etc/Zulu', 'Jamaica', 'Asia/Saigon', 'Etc/GMT-6', 'Australia/West', 'Etc/GMT+9', 'Australia/Queensland', 'PRC', 'NZ-CHAT', 'America/Louisville', 'US/Aleutian', 'MET', 'Japan', 'NZ', 'America/Santa_Isabel', 'Etc/GMT+1', 'Etc/GMT+4', 'Poland', 'Israel', 'Etc/GMT+5', 'America/Ensenada', 'Australia/NSW', 'Antarctica/South_Pole', 'America/Porto_Acre', 'Egypt', 'Brazil/DeNoronha', 'Chile/EasterIsland', 'Etc/GMT+6', 'Africa/Asmera', 'MST', 'Etc/GMT-2', 'America/Indianapolis', 'CST6CDT', 'Asia/Chongqing', 'GMT-0', 'Cuba', 'Etc/GMT+7', 'Singapore', 'US/Michigan', 'Brazil/Acre', 'Asia/Kashgar', 'Europe/Nicosia', 'W-SU', 'Etc/GMT-14', 'Iran', 'Etc/GMT+3', 'Etc/GMT', 'Etc/GMT+11', 'Pacific/Samoa', 'Australia/North', 'Africa/Timbuktu', 'Zulu', 'Asia/Macao', 'Asia/Chungking', 'Asia/Ulan_Bator', 'MST7MDT', 'US/Indiana-Starke', 'America/Catamarca', 'America/Buenos_Aires', 'Etc/GMT-7', 'Brazil/West', 'Europe/Tiraspol', 'Mexico/BajaSur', 'Etc/GMT-4', 'Canada/Yukon', 'US/East-Indiana', 'ROC', 'America/Rosario', 'Pacific/Johnston', 'America/Fort_Wayne', 'Asia/Calcutta', 'Etc/GMT+2', 'America/Coral_Harbour', 
'Australia/Victoria', 'Libya', 'Asia/Istanbul', 'Asia/Harbin', 'Asia/Ashkhabad', 'GB-Eire', 'Australia/LHI', 'Greenwich', 'Etc/GMT+12', 'Australia/Tasmania', 'America/Virgin', 'PST8PDT', 'America/Mendoza', 'ROK', 'Etc/GMT-0', 'America/Atka', 'Iceland', 'Asia/Dacca', 'Etc/GMT-11', 'America/Cordoba', 'Etc/Universal', 'Etc/Greenwich']
In [90]:
rng = pd.date_range(start="1-1-2020", periods=5, freq="M")
rng.tz
In [91]:
rng = pd.date_range(start="1-1-2020", periods=5, freq="M", tz="US/Eastern")
rng.tz
Out[91]:
<DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>
In [92]:
ts = pd.Timestamp("1-1-2020")
print(ts)
ts.tz
2020-01-01 00:00:00
In [93]:
ts = pd.Timestamp("1-1-2020", tz="Asia/Calcutta")
print(ts)
ts.tz
2020-01-01 00:00:00+05:30
Out[93]:
<DstTzInfo 'Asia/Calcutta' IST+5:30:00 STD>

7.1 tz_localize()

Pandas provides the tz_localize() method to set a timezone on date ranges and timestamps that do not have one yet. It returns a new date range or timestamp with the timezone passed to tz_localize() set.

In [94]:
rng = pd.date_range(start="1-1-2020", periods=5, freq="M")
print(rng)
print("Timezone : ", rng.tz)
DatetimeIndex(['2020-01-31', '2020-02-29', '2020-03-31', '2020-04-30',
               '2020-05-31'],
              dtype='datetime64[ns]', freq='M')
Timezone :  None
In [95]:
rng = rng.tz_localize("US/Eastern")
print(rng)
print("Timezone : ", rng.tz)
DatetimeIndex(['2020-01-31 00:00:00-05:00', '2020-02-29 00:00:00-05:00',
               '2020-03-31 00:00:00-04:00', '2020-04-30 00:00:00-04:00',
               '2020-05-31 00:00:00-04:00'],
              dtype='datetime64[ns, US/Eastern]', freq='M')
Timezone :  US/Eastern
In [96]:
ts = pd.Timestamp("1-1-2020")
print(ts)
print("Timezone : ", ts.tz)
2020-01-01 00:00:00
Timezone :  None
In [97]:
ts = ts.tz_localize("US/Central")
print(ts)
print("Timezone : ", ts.tz)
2020-01-01 00:00:00-06:00
Timezone :  US/Central
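Localizing can fail for wall-clock times that never occurred, such as the hour skipped when daylight saving time starts. The sketch below assumes the nonexistent argument of tz_localize() and uses the America/New_York zone purely as an example.

```python
import pandas as pd

# 02:30 on 8 March 2020 was skipped in New York when clocks moved forward.
naive = pd.Timestamp("2020-03-08 02:30")

# nonexistent="shift_forward" moves the value to the first valid instant
# instead of raising an error.
localized = naive.tz_localize("America/New_York", nonexistent="shift_forward")
print(localized)
```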

7.2 tz_convert()

We can convert date ranges and timestamps from one timezone to another using the tz_convert() method. We pass the new timezone to tz_convert() and it returns a date range or timestamp with the time adjusted to that timezone. We'll explain its usage with a few examples below.

In [98]:
ts = pd.Timestamp("1-1-2020", tz="US/Central")
print(ts)
print("Timezone : ", ts.tz)
2020-01-01 00:00:00-06:00
Timezone :  US/Central
In [99]:
ts = ts.tz_convert("US/Eastern")
print(ts)
print("Timezone : ", ts.tz)
2020-01-01 01:00:00-05:00
Timezone :  US/Eastern
In [100]:
rng = pd.date_range(start="1-1-2020", periods=5, freq="D", tz="US/Eastern")
print(rng)
print("Timezone : ", rng.tz)
DatetimeIndex(['2020-01-01 00:00:00-05:00', '2020-01-02 00:00:00-05:00',
               '2020-01-03 00:00:00-05:00', '2020-01-04 00:00:00-05:00',
               '2020-01-05 00:00:00-05:00'],
              dtype='datetime64[ns, US/Eastern]', freq='D')
Timezone :  US/Eastern
In [101]:
rng = rng.tz_convert("US/Central")
print(rng)
print("Timezone : ", rng.tz)
DatetimeIndex(['2019-12-31 23:00:00-06:00', '2020-01-01 23:00:00-06:00',
               '2020-01-02 23:00:00-06:00', '2020-01-03 23:00:00-06:00',
               '2020-01-04 23:00:00-06:00'],
              dtype='datetime64[ns, US/Central]', freq='D')
Timezone :  US/Central
In [102]:
rng = rng.tz_convert("Asia/Calcutta")
print(rng)
print("Timezone : ", rng.tz)
DatetimeIndex(['2020-01-01 10:30:00+05:30', '2020-01-02 10:30:00+05:30',
               '2020-01-03 10:30:00+05:30', '2020-01-04 10:30:00+05:30',
               '2020-01-05 10:30:00+05:30'],
              dtype='datetime64[ns, Asia/Calcutta]', freq='D')
Timezone :  Asia/Calcutta
In [103]:
rng = rng.tz_convert("Asia/Istanbul")
print(rng)
print("Timezone : ", rng.tz)
DatetimeIndex(['2020-01-01 08:00:00+03:00', '2020-01-02 08:00:00+03:00',
               '2020-01-03 08:00:00+03:00', '2020-01-04 08:00:00+03:00',
               '2020-01-05 08:00:00+03:00'],
              dtype='datetime64[ns, Asia/Istanbul]', freq='D')
Timezone :  Asia/Istanbul
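tz_convert() changes only how an instant is displayed, not the instant itself. The sketch below checks this with America/Chicago and America/New_York, the canonical names for the US/Central and US/Eastern zones used above.

```python
import pandas as pd

chicago = pd.Timestamp("2020-01-01 00:00", tz="America/Chicago")
new_york = chicago.tz_convert("America/New_York")

# The wall-clock reading changes, but both values name the same instant.
print(chicago, new_york)
print(chicago == new_york)

# Converting to UTC is a common way to store timestamps unambiguously.
utc = chicago.tz_convert("UTC")
print(utc)
```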

We can notice above that the time moves when we change from one timezone to another. The conversion takes care of daylight saving time as well. Pandas also provides functions like to_datetime(), which converts a list of strings to a pandas DatetimeIndex. It also accepts a format argument, which you can use to specify the date-time format if pandas fails to recognize it by itself.

In [104]:
pd.to_datetime(["1-1-2020","1-2-2020", "1-3-2020"])
Out[104]:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03'], dtype='datetime64[ns]', freq=None)
In [105]:
pd.to_datetime(["1 Jan 2020","2 Jan 2020", "3 Jan 2020"], )
Out[105]:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03'], dtype='datetime64[ns]', freq=None)
In [106]:
pd.to_datetime(["2020/1/1","2020/1/2", "2020/1/3"])
Out[106]:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03'], dtype='datetime64[ns]', freq=None)

The examples below show when you'll want to use the format parameter of to_datetime(): the default parsing reads these strings as year/month/day, while the explicit format reads them as year/day/month.

In [107]:
pd.to_datetime(["2020/1/1","2020/2/1", "2020/3/1"])
Out[107]:
DatetimeIndex(['2020-01-01', '2020-02-01', '2020-03-01'], dtype='datetime64[ns]', freq=None)
In [108]:
pd.to_datetime(["2020/1/1","2020/2/1", "2020/3/1"], format="%Y/%d/%m")
Out[108]:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03'], dtype='datetime64[ns]', freq=None)
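Real data often contains entries that do not match the expected format. A minimal sketch, assuming the errors argument of to_datetime() and a made-up input list:

```python
import pandas as pd

# Hypothetical raw input with one unparseable entry.
raw = ["2020/01/15", "not a date", "2020/03/01"]

# errors="coerce" turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(raw, format="%Y/%m/%d", errors="coerce")

print(parsed)
print(parsed.isna())
```

The NaT entries can then be located with isna() and dropped or corrected.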

This ends our small tutorial on the functionality pandas provides for creating dates, times and periods. Please feel free to let us know your views in the comments section.

Sunny Solanki