Many real-world datasets in areas such as finance, geography, earthquakes, and healthcare are time series.
Interpreting and handling time-series data correctly requires knowing how to generate properly formatted datetime columns.
The Python library pandas provides tools for converting data to a proper datetime format, generating new ranges of datetimes, and performing many other datetime manipulations.
In this article, we explain various pandas functions for working with dates, timestamps, time deltas, periods, and time zones. We show how to create date ranges, timestamps, time deltas, time delta ranges, periods, and period ranges with simple examples, and how to modify them by adding or subtracting time deltas. We also cover in detail how to set and change the time zone of time series data.
Pandas provides a very helpful function, date_range(), which lets us generate a range of dates at a fixed frequency. It takes arguments such as start, end, periods, and freq, though not all of them are required. Below, we generate date ranges with different frequencies.
import pandas as pd
import numpy as np
We can create a date range by setting either the start, periods, and freq parameters or the start, end, and freq parameters. If we don't provide freq, it defaults to D, which means 1 day. The function returns the dates as a DatetimeIndex. We'll create date ranges from date strings in various formats to check which formats work with the pandas date_range() function.
pd.date_range(start="2020 Jan 1", periods=5)
pd.date_range(start="2020 January 1", periods=5)
pd.date_range(start="1 Jan 2020", periods=5)
pd.date_range(start="Jan 1, 2020", periods=5)
pd.date_range(start="2020-7-1", periods=5)
pd.date_range(start="2020/7/1", periods=5)
We can see that all of the above examples generated 5 days from the start date given. We can see that pandas can handle various date formats as well.
pd.date_range(start="1-7-2020", periods=5)
pd.date_range(start="7-1-2020", periods=5)
The two examples above show that the first one did not generate the results we expected: "1-7-2020" was read as January 7, not July 1.
Please note that when the year comes last, pandas assumes the first value is the month and the second is the day, because it follows the US date style by default. Keep an eye on this when generating date ranges.
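One way to avoid this ambiguity, sketched below, is to tell pandas the intended order explicitly. The dayfirst and format arguments belong to to_datetime(); the variable names are only illustrative.

```python
import pandas as pd

# Default parsing treats "1-7-2020" as month-day-year (US style): January 7.
us_style = pd.Timestamp("1-7-2020")

# dayfirst=True asks pandas to read the first number as the day: July 1.
day_first = pd.to_datetime("1-7-2020", dayfirst=True)

# An explicit format string removes the guesswork entirely.
explicit = pd.to_datetime("1-7-2020", format="%d-%m-%Y")

# The parsed timestamp can be passed straight to date_range().
rng = pd.date_range(start=explicit, periods=5)
print(us_style, day_first, rng[0], rng[-1])
```

An explicit format is usually the safer choice when the input style is known in advance, since it does not depend on how pandas guesses.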
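Below are a few more examples where we generate date ranges by setting the start, end, and freq parameters. Pandas uses D for a day, H for an hour, S for seconds, T/min for minutes, B for business days, M for month-end, MS for month start, and ms/L for milliseconds. Note that pandas 2.2 and later deprecate several of these aliases in favor of h, s, min, ME, and ms, so the older spellings used below may emit a FutureWarning on recent versions.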
pd.date_range(start="1-1-2020", end="1-5-2020", freq="D")
pd.date_range(start="1-1-2020", end="1-10-2020", freq="B")
pd.date_range(start="1-1-2020 00:00", end="1-1-2020 5:00", freq="H")
pd.date_range(start="1-1-2020 00:00", end="1-1-2020 00:05", freq="30S")
print(pd.date_range(start="1-1-2020 00:00", end="1-1-2020 00:05", freq="T"))
print(pd.date_range(start="1-1-2020 00:00", end="1-1-2020 00:05", freq="2min"))
pd.date_range(start="1-1-2020", periods=5, freq="M")
pd.date_range(start="1-1-2020", periods=5, freq="MS")
print(pd.date_range(start="1-1-2020", periods=5, freq="100ms"))
print(pd.date_range(start="1-1-2020", end="1-1-2020 00:00:00.500000", freq="100L"))
print(pd.date_range(start="1-1-2020", end="1-1-2020 00:00:01", freq="100L"))
Please note that we can also combine more than one frequency type to create more complex frequencies, as the examples below show.
pd.date_range(start="1-1-2020", end="1-10-2020", freq="1D4H")
pd.date_range(start="1-1-2020", end="1-10-2020", freq="1D4H30S")
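As a small sketch of what a combined frequency means, the same step can also be written as a Timedelta object and passed to freq, which avoids string aliases altogether. The names step and gaps are illustrative.

```python
import pandas as pd

# A step of 1 day and 4 hours, written as a Timedelta instead of a string alias.
step = pd.Timedelta(days=1, hours=4)
rng = pd.date_range(start="2020-01-01", end="2020-01-10", freq=step)

# Consecutive entries are exactly one step apart.
gaps = rng[1:] - rng[:-1]
print(rng)
print(gaps.unique())
```

The range stops at the last value that does not go past end, so the end date itself is only included when it falls exactly on a step.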
We can also subscript a series indexed by a date range. We can pass date strings in various formats, as well as datetime objects, to slice it. We'll explain this below with a few examples.
rng = pd.date_range(start="1-1-2020", periods=6, freq="D")
ser = pd.Series(data=range(6), index=rng)
ser
ser["1/1/2020":"1/4/2020"]
ser["1/5/2020":]
from datetime import datetime
ser[datetime(2020,1,3):]
We can also use partial indexing, such as only a year and month or only a year, and pandas will select all values matching that combination. The examples below show this.
ser["2020-1":]
ser["2020":]
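Partial strings are easier to appreciate on a series that spans more than one month. The sketch below uses a small hypothetical series crossing from January into February and selects each month with .loc.

```python
import pandas as pd

# Four consecutive days: Jan 30, Jan 31, Feb 1, Feb 2.
rng = pd.date_range(start="2020-01-30", periods=4, freq="D")
ser = pd.Series(range(4), index=rng)

# A year-month string selects every row that falls inside that month.
january = ser.loc["2020-01"]
february = ser.loc["2020-02"]

# A slice with partial strings includes the whole of both endpoint periods.
both = ser.loc["2020-01":"2020-02"]
print(january, february, sep="\n")
```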
Pandas also provides the to_datetime() function, which converts a list of strings to a pandas DatetimeIndex. It accepts a format parameter that you can use to specify the date-time format when pandas fails to recognize the format by itself.
pd.to_datetime(["1-1-2020","1-2-2020", "1-3-2020"])
pd.to_datetime(["1 Jan 2020","2 Jan 2020", "3 Jan 2020"])
pd.to_datetime(["2020/1/1","2020/1/2", "2020/1/3"])
The examples below show when you will want to use the format parameter of to_datetime(): without it, the strings are read as year/month/day, while the format string makes pandas read them as year/day/month.
pd.to_datetime(["2020/1/1","2020/2/1", "2020/3/1"])
pd.to_datetime(["2020/1/1","2020/2/1", "2020/3/1"], format="%Y/%d/%m")
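Real data often contains entries that do not match the expected format. A minimal sketch, assuming you would rather get a missing value than an exception, combines format with errors="coerce"; the raw list is made up for illustration.

```python
import pandas as pd

# The last entry is deliberately invalid.
raw = ["2020/1/1", "2020/2/1", "not a date"]

# Read the strings as year/day/month; unparseable entries become NaT.
parsed = pd.to_datetime(raw, format="%Y/%d/%m", errors="coerce")

# NaT entries can then be counted or dropped like any other missing value.
n_missing = parsed.isna().sum()
print(parsed)
print("Unparseable entries:", n_missing)
```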
The Timestamp class lets us create an object representing a particular point in time. We need it to represent values that are tied to specific instants. We can create a timestamp from various date formats, as the examples below show. It accepts values down to nanosecond precision.
pd.Timestamp("Jan 2020")
pd.Timestamp("12 Jan 2020")
pd.Timestamp("12 Jan 2020 20:20")
pd.Timestamp("12 Jan 2020 20:20:20.100")
pd.Timestamp("12 Jan 2020 20:20:20.200000")
pd.Timestamp("12 Jan 2020 20:20:20.000200000")
We can also create timestamps by setting year, month, day, hour, minute, second, microsecond, and nanosecond separately. We'll explain this below with a few examples.
pd.Timestamp(year=2020, month=1, day=1, hour=10, minute=10, second=30, microsecond=100)
pd.Timestamp(year=2020, month=1, day=1, hour=10, minute=10, second=30, microsecond=100, nanosecond=100)
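Once a timestamp exists, its parts can be read back through attributes. The short sketch below rebuilds the timestamp from the last example and inspects a few of them.

```python
import pandas as pd

ts = pd.Timestamp(year=2020, month=1, day=1, hour=10, minute=10,
                  second=30, microsecond=100, nanosecond=100)

# Each component passed to the constructor is available as an attribute.
print(ts.year, ts.month, ts.day)
print(ts.hour, ts.minute, ts.second)
print(ts.microsecond, ts.nanosecond)

# Calendar helpers are available as well.
weekday = ts.day_name()
print(weekday)
```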
We can subtract one timestamp from another to find out how far apart they are. Adding two timestamps together is not supported and raises a TypeError. The result of timestamp subtraction is a time delta, which we explain in the next section.
t1 = pd.Timestamp("12 Jan 2020 12:12:45")
t2 = pd.Timestamp("12 Jan 2020 13:14:20")
(t2 -t1), (t1 - t2)
t1 = pd.Timestamp("12 Jan 2020")
t2 = pd.Timestamp("13 Jan 2020")
(t2 - t1),
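The sketch below repeats the first subtraction and inspects the resulting time deltas. It also shows that trying to add two timestamps fails; the flag name addition_failed is only illustrative.

```python
import pandas as pd

t1 = pd.Timestamp("12 Jan 2020 12:12:45")
t2 = pd.Timestamp("12 Jan 2020 13:14:20")

forward = t2 - t1    # 0 days 01:01:35
backward = t1 - t2   # negative: -1 days +22:58:25

# total_seconds() gives a signed duration that is easy to compare.
print(forward.total_seconds(), backward.total_seconds())

# Adding two points in time has no meaning, so pandas raises TypeError.
addition_failed = False
try:
    t1 + t2
except TypeError:
    addition_failed = True
print("Addition failed:", addition_failed)
```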
Timedelta lets us represent the difference between two timestamps. We might need it to analyze how far two date/time values are from each other. Below, we show a few examples of how to create time deltas using pandas. We can create time deltas consisting of days only, or of days with hours:minutes:seconds and fractional seconds down to nanoseconds. If we don't provide a value for a particular part, it defaults to zero.
pd.Timedelta("1 days")
pd.Timedelta("1 days 10:00:00")
pd.Timedelta("1 days 10:10:00")
pd.Timedelta("1 days 10:10:10")
pd.Timedelta("1 days 10:10:10.100000")
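Besides strings, Timedelta also accepts keyword arguments, which can be easier to read when the parts come from variables. The sketch below builds the same value both ways and reads its parts back.

```python
import pandas as pd

# Keyword form of "1 days 10:10:10.100000".
td = pd.Timedelta(days=1, hours=10, minutes=10, seconds=10, milliseconds=100)
same = pd.Timedelta("1 days 10:10:10.100000")
print(td == same)

# components splits the duration into days, hours, minutes, and so on.
parts = td.components
print(parts.days, parts.hours, parts.minutes, parts.seconds, parts.milliseconds)
```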
We can add and subtract time deltas to get a combined time delta and the difference between time deltas, respectively. We'll explain this with a few examples.
t1 = pd.Timedelta("1 days 10:10:10.100000")
t2 = pd.Timedelta("2 days 10:10:10.100000")
t2 - t1
t1 + t2
Time deltas are very useful when you want to move timestamps by a particular amount of time. We can add time deltas to and subtract them from a timestamp to get shifted dates. The examples below show this.
pd.Timestamp("12 Jan 2020") + pd.Timedelta("1 days")
pd.Timestamp("Jan 2020") + pd.Timedelta("1 days")
pd.Timestamp("12 Jan 2020") - pd.Timedelta("1 days")
pd.Timestamp("12 Jan 2020") + pd.Timedelta("4H")
pd.Timestamp("12 Jan 2020") + pd.Timedelta("30min")
pd.Timestamp("12 Jan 2020") + pd.Timedelta("30 seconds")
We can add and subtract time deltas from date ranges as well and it'll move all values of date ranges by that much time delta. We'll explain it below with few examples.
pd.date_range(start="1-1-2020", end="1-5-2020", freq="D") + pd.Timedelta("1 days")
pd.date_range(start="1-1-2020", end="1-5-2020", freq="D") - pd.Timedelta("1 days")
pd.date_range(start="1-1-2020", end="1-5-2020", freq="D") + pd.Timedelta("4H30T30S")
Pandas provides a function named timedelta_range(), similar to date_range() and period_range(), for creating a range of time deltas. It takes almost the same arguments as date_range() and period_range(). Below are a few examples of timedelta_range() usage.
pd.timedelta_range(start="1 day", periods=10)
pd.timedelta_range(start="1 day", periods=10, freq="30D")
pd.timedelta_range(start="1 day", periods=10, freq="10H")
pd.timedelta_range(start="1 day", end="2 day", freq="4H")
pd.timedelta_range(start="1 hour", end="2 hour", freq="10min")
pd.timedelta_range(start="1 min", end="5 min", freq="T")
We can move time delta ranges by adding time deltas to them or subtracting time deltas from them. The examples below show this.
pd.timedelta_range(start="1 day", periods=10) + pd.Timedelta("1 days")
pd.timedelta_range(start="1 day", periods=10) - pd.Timedelta("1 days")
pd.timedelta_range(start="1 day", periods=10) + pd.Timedelta("2 days")
pd.timedelta_range(start="1 day", periods=10) + pd.Timedelta("2D5H")
pd.timedelta_range(start="1 day", periods=10) + pd.Timedelta("2D5H30min")
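To see the shift more concretely, here is a shorter sketch with a six-hour step. Both the step and the shift are passed as Timedelta objects built from keywords rather than strings; the names tdr and shifted are illustrative.

```python
import pandas as pd

# Four time deltas starting at 1 day, spaced 6 hours apart.
tdr = pd.timedelta_range(start="1 day", periods=4, freq=pd.Timedelta(hours=6))

# Adding a time delta moves every element by the same amount.
shift = pd.Timedelta(days=2, hours=5)
shifted = tdr + shift

print(tdr)
print(shifted)
```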
Pandas also provides a function named to_timedelta(), which converts a list of strings to time deltas. We can modify these time deltas by adding and subtracting other time deltas.
Below are a few examples of converting lists of strings to time deltas using the pandas to_timedelta() function.
pd.to_timedelta(["1 day","2 day", "3 day"])
pd.to_timedelta(["1 day 6 hour","2 day 6 hour", "3 day 6 hour"])
pd.to_timedelta(["1 hour","2 hour", "3 hour"])
pd.to_timedelta(["1 hour 20 min","2 hour 20 min", "3 hour 20 min"])
pd.to_timedelta(["10 min","20 min", "30 min"])
pd.to_timedelta(["1 second","2 second", "3 second"])
pd.to_timedelta(["1 millisecond","2 millisecond", "3 millisecond"])
pd.to_timedelta(["1 microsecond","2 microsecond", "3 microsecond"])
pd.to_timedelta(["1 nanosecond","2 nanosecond", "3 nanosecond"])
pd.to_timedelta(["1 day","2 day", "3 day"]) + pd.to_timedelta(["1 day","2 day", "3 day"])
pd.to_timedelta(["1 hour","2 hour", "3 hour"]) + pd.to_timedelta(["10 min","20 min", "30 min"])
pd.to_timedelta(["1 day","2 day", "3 day"]) +\
pd.to_timedelta(["1 hour","2 hour", "3 hour"]) +\
pd.to_timedelta(["10 min","20 min", "30 min"])
pd.to_timedelta(["1 hour","2 hour", "3 hour"]) - pd.to_timedelta(["10 min","20 min", "30 min"])
pd.to_timedelta(["1 day","2 day", "3 day"]) - pd.to_timedelta(["23 hour","22 hour", "21 hour"])
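When durations arrive as plain numbers rather than strings, to_timedelta() can interpret them through its unit argument. The sketch below builds durations from numbers and then converts the result back to plain numbers; the variable names are illustrative.

```python
import pandas as pd

# Numbers plus a unit instead of strings such as "1 hour".
hours = pd.to_timedelta([1, 2, 3], unit="h")
minutes = pd.to_timedelta([10, 20, 30], unit="min")
durations = hours + minutes

# Dividing by a Timedelta expresses each duration in that unit.
in_minutes = durations / pd.Timedelta(minutes=1)

# total_seconds() does the same for seconds.
in_seconds = durations.total_seconds()
print(durations)
print(list(in_minutes), list(in_seconds))
```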
Pandas provides the Period class to represent a time span. We need periods when a value applies to a whole span of time, such as a day or a month, rather than to a single instant. Period accepts a freq argument, and if we don't pass it, the frequency is inferred from the format of the date string. The examples below create various periods.
pd.Period(value="1-1-2020")
pd.Period("1-2020")
pd.Period("2020")
pd.Period("1-1-2020 10:00:00")
pd.Period("1-1-2020 10:00")
pd.Period("1-1-2020 10")
pd.Period("1-1-2020 10:00:00.100000")
We can add and subtract time deltas from periods as well. We have explained it below with few examples.
pd.Period("1-1-2020 10:10") + pd.Timedelta("1 days")
pd.Period("1-1-2020 10:10:10") - pd.Timedelta("1 days")
pd.Period("1-1-2020 10") + pd.Timedelta("10H")
pd.Period("1-1-2020 10") - pd.Timedelta("10H")
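Because a period is a span rather than an instant, it has a start and an end. The sketch below uses a monthly period to show this; it also uses integer arithmetic, which moves a period by whole multiples of its own frequency.

```python
import pandas as pd

month = pd.Period("2020-01", freq="M")

# A period covers everything between start_time and end_time.
print(month.start_time, month.end_time)

# Adding an integer steps forward by that many periods.
next_month = month + 1
print(next_month)

# A daily period that lies inside the monthly one.
day = pd.Period("2020-01-15", freq="D")
inside = month.start_time <= day.start_time and day.end_time <= month.end_time
print("Day inside month:", inside)
```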
Just like the date_range() function, the period_range() function lets us generate a list of periods. It takes almost the same parameters as date_range(). Below are various ways to create a period range.
pd.period_range(start="1-1-2020", periods=5)
pd.period_range(start="1-1-2020", periods=5, freq="4H")
pd.period_range(start="1-1-2020", periods=5, freq="M")
pd.period_range(start="1-1-2020", periods=5, freq="T")
pd.period_range(start="1-1-2020", periods=6, freq="12min")
pd.period_range(start="1-1-2020", periods=4, freq="5S")
We can add and subtract time delta from a list of periods the same way we did with date ranges. It'll move the list of periods by that much time delta amount. We have explained it below with few examples.
pd.period_range(start="1-1-2020", periods=4, freq="5S") + pd.Timedelta("1 days")
pd.period_range(start="1-1-2020", periods=4, freq="5S") + pd.Timedelta("5 seconds")
pd.period_range(start="1-1-2020", periods=4, freq="5S") + pd.Timedelta("1min")
We can subscript a series indexed by a period range in the same way. We can pass date strings in various formats, as well as datetime objects, to slice it. We'll explain this below with a few examples.
rng = pd.period_range(start="1-1-2020", periods=6, freq="D")
ser = pd.Series(data=range(6), index=rng)
ser
ser["1/1/2020":"1/4/2020"]
ser["1/3/2020":]
from datetime import datetime
ser[datetime(2020,1,3):]
We can also pass partial indexing like only year and month or only year and it'll filter all values matching those combinations. We have explained it as well below with few examples.
ser["2020-1"]
ser["2020"]
data = np.random.randn(5, 5)
date_range = pd.date_range(start="2020 Jan 1", periods=5)
df = pd.DataFrame(data, index=date_range)
df
df.index
df.index.to_period()
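The lines above convert a DatetimeIndex into a PeriodIndex with to_period(). As a brief sketch, the conversion can also target a coarser frequency, and to_timestamp() goes back the other way; the variable names are illustrative.

```python
import pandas as pd

idx = pd.date_range(start="2020-01-01", periods=5, freq="D")

# Without an argument, to_period() reuses the index's own daily frequency.
daily = idx.to_period()

# With a coarser frequency, every date maps to the month that contains it.
monthly = idx.to_period("M")

# to_timestamp() converts periods back to timestamps at each period's start.
back = daily.to_timestamp()
print(daily)
print(monthly.unique())
```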
Pandas lets us specify a timezone when creating date ranges, timestamps, etc. Until now, we have created all date ranges and timestamps without any timezone set. We'll now explore ways to set timezones and to convert from one timezone to another. We can specify a timezone by passing a string to the tz argument of date_range() and Timestamp(). The Python library pytz maintains a list of all available timezone names.
from pytz import common_timezones, all_timezones
print("Number of Common Timezones : ", len(common_timezones))
print("Number of All Timezones : ", len(all_timezones))
print("Difference between all timezones and common timezones : ", list(set(all_timezones) - set(common_timezones)))
rng = pd.date_range(start="1-1-2020", periods=5, freq="M")
rng.tz
rng = pd.date_range(start="1-1-2020", periods=5, freq="M", tz="US/Eastern")
rng.tz
ts = pd.Timestamp("1-1-2020")
print(ts)
ts.tz
ts = pd.Timestamp("1-1-2020", tz="Asia/Calcutta")
print(ts)
ts.tz
Pandas provides a method named tz_localize() to set a timezone on date ranges and timestamps that do not already have one. It returns a new date range or timestamp with the timezone passed to tz_localize().
rng = pd.date_range(start="1-1-2020", periods=5, freq="M")
print(rng)
print("Timezone : ", rng.tz)
rng = rng.tz_localize("US/Eastern")
print(rng)
print("Timezone : ", rng.tz)
ts = pd.Timestamp("1-1-2020")
print(ts)
print("Timezone : ", ts.tz)
ts = ts.tz_localize("US/Central")
print(ts)
print("Timezone : ", ts.tz)
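A point worth checking is that tz_localize() attaches a timezone without changing the wall-clock time. The sketch below verifies this and shows what happens when the timestamp already has a timezone; the flag name already_aware_error is illustrative.

```python
from datetime import timedelta

import pandas as pd

naive = pd.Timestamp("2020-01-01 09:00")
central = naive.tz_localize("US/Central")

# The wall-clock hour is unchanged; only the UTC offset is now known.
print(central.hour, central.utcoffset())

# Localizing a timestamp that already has a timezone raises TypeError.
already_aware_error = False
try:
    central.tz_localize("US/Eastern")
except TypeError:
    already_aware_error = True

# Passing None removes the timezone again, keeping the local wall time.
stripped = central.tz_localize(None)
print(stripped, stripped.tz)
```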
We can convert date ranges and timestamps from one timezone to another using the tz_convert() method. We pass the new timezone to tz_convert(), and it returns a new date range or timestamp with the time adjusted to that timezone. We'll explain its usage with a few examples below.
ts = pd.Timestamp("1-1-2020", tz="US/Central")
print(ts)
print("Timezone : ", ts.tz)
ts = ts.tz_convert("US/Eastern")
print(ts)
print("Timezone : ", ts.tz)
rng = pd.date_range(start="1-1-2020", periods=5, freq="D", tz="US/Eastern")
print(rng)
print("Timezone : ", rng.tz)
rng = rng.tz_convert("US/Central")
print(rng)
print("Timezone : ", rng.tz)
rng = rng.tz_convert("Asia/Calcutta")
print(rng)
print("Timezone : ", rng.tz)
rng = rng.tz_convert("Asia/Istanbul")
print(rng)
print("Timezone : ", rng.tz)
Notice above that the displayed time shifts when converting from one timezone to another, while the underlying instant stays the same. Pandas takes care of daylight saving time as well.
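Both points can be checked with a few comparisons. The sketch below confirms that a converted timestamp still equals the original, and that the UTC offset of US/Eastern differs between a winter and a summer date; the dates are arbitrary examples.

```python
from datetime import timedelta

import pandas as pd

ts = pd.Timestamp("2020-01-01 00:00", tz="US/Central")
eastern = ts.tz_convert("US/Eastern")

# The clock reads one hour later, but it is the same moment in time.
same_instant = eastern == ts
print(eastern, same_instant)

# Noon on a winter day and on a summer day in the same timezone.
winter = pd.Timestamp("2020-01-15 12:00", tz="US/Eastern")
summer = pd.Timestamp("2020-07-15 12:00", tz="US/Eastern")

# Daylight saving time shows up as a different UTC offset.
print(winter.utcoffset(), summer.utcoffset())
print(winter.tz_convert("UTC").hour, summer.tz_convert("UTC").hour)
```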
This ends our small tutorial on various dates, timestamps, time deltas and periods creation functionalities available with pandas.