Updated On : Dec-10,2021 Time Investment : ~20 mins

Simple Guide to Understand Pandas Multi-Level / Hierarchical Index¶

Pandas is the go-to library when for data analysis when working with tabular datasets. It is the best solution available for working with tabular datasets which fit in the main memory of the computer. Pandas provide two data structures for working with data (1. Series 2. DataFrame). Pandas let us give labels to each row by providing index values and name columns of data as well. These labels can then be used to filter pandas’ dataframe/series in a particular dimension. We can use index values to filter rows of a data frame as well as perform operations on them. If we don't provide index values then pandas will internally assign integers from 0 till the end of the data. It creates RangeIndex objects by default for both row labels and column labels. Internally pandas use various kinds of Index objects based on the label's data type to store labels for axes. We can access Index object for rows (axis=0) by calling '.index' property of dataframe and for columns (axis=1) by calling '.columns' property of dataframe. Pandas internally represent labels of both rows and columns using Index objects of various types based on the data type of labels.

The majority of uses case of using Pandas dataframe/series requires a single value per label. We generally have a single label for entry in a particular axis (single label for a particular row / single label for a column). But there can be situations where we want more than one value to be labels of row or column of data. This lets us represent high-dimensional data in our 2D data structure dataframe or 1D data structure series. This kind of indexing is generally referred to as Multi-Level Indexing where we have more than one label value per row/column. We have more than one level in indexing. The values inside a single label can be unrelated or they can be related to each other where the value at a lower level is a sub-value of higher-level value (E.g - In multi-level index,first-level values can be a year, second-level values can be months and third-level values can be the day. Here, lower-level values are sub-values of higher level ). When values in the index at lower levels are sub-values of higher-level values then the index is generally referred to as Hierarchical Index.

As a part of this tutorial, we'll be concentrating on how to create Multi-Level Index using pandas and how to index pandas dataframe if multi-level indexing is present for both rows and columns. Below we have listed important sections of our tutorial to give an overview of the material covered.

Important Sections of Tutorial¶

Ways to Create MultiIndex / Hierarchical Index
- MultiIndex
- Hierarchical Index
Creating Sample DataFrames with MultiIndex
- DataFrame with MultiIndex for Rows (axis=0)
- DataFrame with Hierarchical Index for Rows (axis=0)
- DataFrame with Hierarchical Index for Rows (axis=0) and Columns (axis=1)
- DataFrame with Hierarchical Index for Rows (axis=0) and Columns (axis=1)
Indexing DataFrame
- Normal Indexing using '.loc' Property
- Using 'slice()' Function
- Range Indexing
- Using 'xs()' for Indexing

Below we have imported pandas and printed the version of them that we have used in this tutorial. We have also imported numpy as we'll be using it to create data for our pandas dataframe.

import pandas as pd

print("Pandas Version : {}".format(pd.__version__))

Pandas Version : 1.3.4

import numpy as np

1. Ways to Create Multi-Level / Hierarchical Index

In this section, we'll explain how we can create MultiIndex object which is used by pandas to represent an index that has more than one value per label of data. We can use MultiIndex object to represent row labels as well as columns labels. Pandas provide 4 different methods which are available as factory methods from MultiIndex class to create MultiIndex objects.

from_tuples() - This method takes a list of tuples as input and creates a MultiIndex object from it. Each tuple represents one label that will uniquely identify one row/column.
from_arrays() - This method takes as input a list of arrays and creates a MultiIndex object from it. Each array should have the same length as a total number of labels. The first values from each input array will create the first label, the second values will create the second label, and so on.
from_product() - This method takes as input more than one iterator and performs cartesian product on input iterators to create labels.
from_frame() - This method takes as input another pandas dataframe and uses its values to create MultiIndex object.

MultiIndex¶

Below we have created two lists with a list of labels. We have then combined both lists using zip() python method. We'll be creating MultiIndex object using this list of tuples.

index1 = ["A", "B", "C", "D", "E"]
index2 = [1,2,3,4,5]

combined_idx = list(zip(index1, index2))

combined_idx

[('A', 1), ('B', 2), ('C', 3), ('D', 4), ('E', 5)]

Below we have created MultiIndex object using from_tuples() method by providing a list of tuples that we created in the previous cell as input. We can also provide names to individual levels of MultiIndex object using names parameter.

combined_idx_obj = pd.MultiIndex.from_tuples(combined_idx, names=["Index1", "Index2"])

combined_idx_obj

MultiIndex([('A', 1),
            ('B', 2),
            ('C', 3),
            ('D', 4),
            ('E', 5)],
           names=['Index1', 'Index2'])

Below, we have created MultiIndex object using from_arrays() method by providing two lists we had declared earlier as input. We can notice that the results are exactly the same as from_tuples() method.

combined_idx_obj = pd.MultiIndex.from_arrays([index1, index2], names=["Index1", "Index2"])

combined_idx_obj

MultiIndex([('A', 1),
            ('B', 2),
            ('C', 3),
            ('D', 4),
            ('E', 5)],
           names=['Index1', 'Index2'])

Below, we have created a MultiIndex object using from_product() method by providing two lists we had defined earlier. We can notice that from_product() method multiplied all values of the first list with all values of the second list (cartesian product).

combined_idx_obj = pd.MultiIndex.from_product([index1, index2], names=["Index1", "Index2"])

combined_idx_obj

MultiIndex([('A', 1),
            ('A', 2),
            ('A', 3),
            ('A', 4),
            ('A', 5),
            ('B', 1),
            ('B', 2),
            ('B', 3),
            ('B', 4),
            ('B', 5),
            ('C', 1),
            ('C', 2),
            ('C', 3),
            ('C', 4),
            ('C', 5),
            ('D', 1),
            ('D', 2),
            ('D', 3),
            ('D', 4),
            ('D', 5),
            ('E', 1),
            ('E', 2),
            ('E', 3),
            ('E', 4),
            ('E', 5)],
           names=['Index1', 'Index2'])

Below, we have created a dataframe using two lists that we had defined earlier. We'll be using this dataframe to create MultiIndex object next.

df = pd.DataFrame(zip(index1,index2), columns=["Index1", "index2"])

df

	Index1	index2
0	A	1
1	B	2
2	C	3
3	D	4
4	E	5

Below, we have created a MultiIndex object using from_frame() method by providing the dataframe we created in the previous cell.

combined_idx_obj = pd.MultiIndex.from_frame(df)

combined_idx_obj

MultiIndex([('A', 1),
            ('B', 2),
            ('C', 3),
            ('D', 4),
            ('E', 5)],
           names=['Index1', 'index2'])

Hierarchical Indexing¶

In this section, we have created a MultiIndex object which represents a hierarchical indexing example. We have created MultiIndex object with 3 levels. We have given three lists as input to from_product() method. The first list is a single value list of the year, the second list is a 2-value list of months and the third list is integers in the range 1-31 representing days.

year = [2021,]
months = [1,2]
days = list(range(1,32))

combined_idx_obj = pd.MultiIndex.from_product([year,months, days], names=["Year", "Month","Day"])

combined_idx_obj[:5]

MultiIndex([(2021, 1, 1),
            (2021, 1, 2),
            (2021, 1, 3),
            (2021, 1, 4),
            (2021, 1, 5)],
           names=['Year', 'Month', 'Day'])

We can retrieve unique values per level of MultiIndex object using levels property of MultiIndex object.

combined_idx_obj.levels

FrozenList([[2021], [1, 2], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]])

We can retrieve a list of values per level using get_level_values() method by providing integer or level name as input.

combined_idx_obj.get_level_values(0)

Int64Index([2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021,
            2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021,
            2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021,
            2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021,
            2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021,
            2021, 2021, 2021, 2021, 2021, 2021, 2021],
           dtype='int64', name='Year')

combined_idx_obj.get_level_values("Month")

Int64Index([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
            1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
            2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
           dtype='int64', name='Month')

2. Creating Sample DataFrames with MultiIndex ¶

In this section, we'll create pandas DataFrames which will have either index or column labels or both represented with MultiIndex object. We'll be using various dataframes created in this section, later on, to explain indexing dataframe with multi-level indexing.

np.random.seed(123)

DataFrame with MultiIndex for Rows (axis=0)¶

Below we have created aMultiIndex object using from_arrays() method by providing two lists to it. Then in the next cell, we have created a pandas dataframe with random numbers whose index is set as MultiIndex object we created.

index1 = ["A", "B", "C", "D", "E"]
index2 = [1,2,3,4,5]

combined_idx_obj = pd.MultiIndex.from_arrays([index1, index2], names=["Index1", "Index2"])

combined_idx_obj

MultiIndex([('A', 1),
            ('B', 2),
            ('C', 3),
            ('D', 4),
            ('E', 5)],
           names=['Index1', 'Index2'])

data= np.random.rand(len(combined_idx_obj), 5)

df1 = pd.DataFrame(data=data, columns=["A1", "A2", "A3", "A4", "A5"], index=combined_idx_obj)

df1

		A1	A2	A3	A4	A5
Index1	Index2
A	1	0.696469	0.286139	0.226851	0.551315	0.719469
B	2	0.423106	0.980764	0.684830	0.480932	0.392118
C	3	0.343178	0.729050	0.438572	0.059678	0.398044
D	4	0.737995	0.182492	0.175452	0.531551	0.531828
E	5	0.634401	0.849432	0.724455	0.611024	0.722443

DataFrame with Hierarchical Index for Rows (axis=0)¶

In this section, we have created a hierarchical index using from_product() method by providing three lists to it. We have then created a pandas dataframe whose index is set as MultiIndex object representing hierarchical index.

year = [2021,]
months = [1,2]
days = list(range(1,32))

combined_idx_obj = pd.MultiIndex.from_product([year,months, days], names=["Year", "Month","Day"])

combined_idx_obj[:5]

MultiIndex([(2021, 1, 1),
            (2021, 1, 2),
            (2021, 1, 3),
            (2021, 1, 4),
            (2021, 1, 5)],
           names=['Year', 'Month', 'Day'])

data = np.random.rand(len(combined_idx_obj), 5)

df2 = pd.DataFrame(data=data, index=combined_idx_obj, columns=["C1", "C2", "C3", "C4", "C5"])

df2

			C1	C2	C3	C4	C5
Year	Month	Day
2021	1	1	0.322959	0.361789	0.228263	0.293714	0.630976
		2	0.092105	0.433701	0.430863	0.493685	0.425830
		3	0.312261	0.426351	0.893389	0.944160	0.501837
		4	0.623953	0.115618	0.317285	0.414826	0.866309
		5	0.250455	0.483034	0.985560	0.519485	0.612895
	...	...	...	...	...	...	...
	2	27	0.572147	0.660952	0.298245	0.418627	0.453089
		28	0.932351	0.587494	0.948252	0.556035	0.500561
		29	0.003532	0.480889	0.927455	0.198366	0.052091
		30	0.406779	0.372396	0.857153	0.026611	0.920149
		31	0.680903	0.904226	0.607529	0.811953	0.335544

62 rows × 5 columns

DataFrame with Hierarchical Index for Rows (axis=0) and Columns (axis=1)¶

In this section, we have created examples where both row and column headings will be represented using MultiIndex object.

First, we have created hierarchical MultiIndex just like our previous example to represent rows of dataframe. Then, we have created another hierarchical MultiIndex object using from_product() method which will be used to represent column labels.

year = [2021,]
months = [1,2]
days = list(range(1,32))

combined_idx_obj = pd.MultiIndex.from_product([year,months, days], names=["Year", "Month","Day"])

columns_idx = pd.MultiIndex.from_product([["Location1","Location2"],
                                          ["Item1", "Item2", "Item3", "Item4", "Item5"]],
                                         names=["Location", "Item"]
                                        )

columns_idx

MultiIndex([('Location1', 'Item1'),
            ('Location1', 'Item2'),
            ('Location1', 'Item3'),
            ('Location1', 'Item4'),
            ('Location1', 'Item5'),
            ('Location2', 'Item1'),
            ('Location2', 'Item2'),
            ('Location2', 'Item3'),
            ('Location2', 'Item4'),
            ('Location2', 'Item5')],
           names=['Location', 'Item'])

Below, we have created a pandas dataframe of random numbers whose index and columns are represented using MultiIndex objects.

data = np.random.randint(1, 1000, size=(len(combined_idx_obj), len(columns_idx)))

df3 = pd.DataFrame(data=data, index=combined_idx_obj, columns=columns_idx)

df3

		Location	Location1					Location2
		Item	Item1	Item2	Item3	Item4	Item5	Item1	Item2	Item3	Item4	Item5
Year	Month	Day
2021	1	1	24	531	61	112	444	636	469	921	532	117
		2	154	127	519	451	312	72	501	94	523	456
		3	249	485	401	584	175	940	192	414	555	235
		4	528	119	260	341	50	546	766	744	291	64
		5	269	875	539	456	937	810	925	254	656	235
	...	...	...	...	...	...	...	...	...	...	...	...
	2	27	921	145	123	115	878	931	209	568	284	706
		28	546	65	623	896	643	443	824	22	426	388
		29	463	100	743	630	780	293	752	553	58	63
		30	327	479	901	91	345	529	924	887	923	437
		31	794	689	93	316	539	593	637	38	336	905

62 rows × 10 columns

DataFrame with Hierarchical Index for Rows (axis=0) and Columns (axis=1)¶

In this section, we have created another example where we are using MultiIndex object to represent labels of rows and columns. We have this time used a list of dates as a second-level index for rows.

locations = ["Location1", "Location2"]
rng = pd.date_range(start="2021-1-1", end="2021-2-28", freq="D")



rows_idx = pd.MultiIndex.from_product([locations,rng], names=["Location", "Date"])

columns_idx = pd.MultiIndex.from_product([["Category1","Category2"],
                                          ["Item1", "Item2", "Item3", "Item4", "Item5"]],
                                         names=["Category", "Item"])

columns_idx

MultiIndex([('Category1', 'Item1'),
            ('Category1', 'Item2'),
            ('Category1', 'Item3'),
            ('Category1', 'Item4'),
            ('Category1', 'Item5'),
            ('Category2', 'Item1'),
            ('Category2', 'Item2'),
            ('Category2', 'Item3'),
            ('Category2', 'Item4'),
            ('Category2', 'Item5')],
           names=['Category', 'Item'])

data = np.random.randint(1, 1000, size=(len(rows_idx), len(columns_idx)))

df4 = pd.DataFrame(data=data, index=rows_idx, columns=columns_idx)

df4

	Category	Category1					Category2
	Item	Item1	Item2	Item3	Item4	Item5	Item1	Item2	Item3	Item4	Item5
Location	Date
Location1	2021-01-01	800	769	521	306	455	748	933	744	755	877
	2021-01-02	508	511	134	165	126	807	644	945	421	202
	2021-01-03	906	113	867	520	535	113	244	839	373	191
	2021-01-04	578	294	660	927	670	176	729	10	700	81
	2021-01-05	905	875	826	928	88	400	110	268	419	245
...	...	...	...	...	...	...	...	...	...	...	...
Location2	2021-02-24	60	204	313	832	600	763	24	865	658	360
	2021-02-25	21	726	979	578	721	488	115	55	703	964
	2021-02-26	74	299	910	78	922	779	601	479	159	927
	2021-02-27	811	612	506	345	159	87	682	654	562	273
	2021-02-28	326	261	851	102	939	332	192	931	783	312

118 rows × 10 columns

Below, we have retrieved single level from MultiIndex object using get_level_values() method and we can notice that it's DatetimeIndex object. Internally, MultiIndex object is composed of more than one Index object.

df4.index.get_level_values(1)

DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
               '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08',
               '2021-01-09', '2021-01-10',
               ...
               '2021-02-19', '2021-02-20', '2021-02-21', '2021-02-22',
               '2021-02-23', '2021-02-24', '2021-02-25', '2021-02-26',
               '2021-02-27', '2021-02-28'],
              dtype='datetime64[ns]', name='Date', length=118, freq=None)

3. Indexing DataFrame ¶

In this section, we'll explain how we can perform indexing on a dataframe whose index or column labels are represented using MultiIndex objects. We'll explain various ways of indexing multi-level index in this section. We'll be using '.loc' property of pandas dataframe to perform indexing as it accepts actual index values which can be of any type to perform the indexing. We won't be covering indexing using '.iloc' property which accepts integers for indexing. We'll also explain indexing using xs() function.

We can provide labels of 1st levels of MultiIndex object directly to '.loc' property and it'll retrieve all rows/columns that satisfy that 1st level.

Below we have retrieved rows of pandas dataframe whose 1st level value for MultiIndex object is 'A'.

df1.loc["A"]

	A1	A2	A3	A4	A5
Index2
1	0.696469	0.286139	0.226851	0.551315	0.719469

Below, we have provided a list of 1st level values to '.loc' property to retrieve rows where these three values are 1st level values of the index.

df1.loc[["A","B","D"]]

		A1	A2	A3	A4	A5
Index1	Index2
A	1	0.696469	0.286139	0.226851	0.551315	0.719469
B	2	0.423106	0.980764	0.684830	0.480932	0.392118
D	4	0.737995	0.182492	0.175452	0.531551	0.531828

Below, we have provided indexing values for both rows and columns to '.loc' property. The values provided for index is the same as our previous example.

df1.loc[["A","B","D"], ["A1","A3","A5"]]

		A1	A3	A5
Index1	Index2
A	1	0.696469	0.226851	0.719469
B	2	0.423106	0.684830	0.392118
D	4	0.737995	0.175452	0.531828

Below, we have created an example to explain how we can give a tuple of values to an index dataframe. Each tuple has two values representing values for both levels of MultiIndex object. This kind of indexing will retrieve rows/columns which exactly match the combinations of values provided.

df1.loc[[("A",1),("B",2),("D",4)]]

		A1	A2	A3	A4	A5
Index1	Index2
A	1	0.696469	0.286139	0.226851	0.551315	0.719469
B	2	0.423106	0.980764	0.684830	0.480932	0.392118
D	4	0.737995	0.182492	0.175452	0.531551	0.531828

Below we have again used only the 1st level value of MultiIndex object to the index dataframe.

df2.loc[2021]

		C1	C2	C3	C4	C5
Month	Day
1	1	0.322959	0.361789	0.228263	0.293714	0.630976
	2	0.092105	0.433701	0.430863	0.493685	0.425830
	3	0.312261	0.426351	0.893389	0.944160	0.501837
	4	0.623953	0.115618	0.317285	0.414826	0.866309
	5	0.250455	0.483034	0.985560	0.519485	0.612895
...	...	...	...	...	...	...
2	27	0.572147	0.660952	0.298245	0.418627	0.453089
	28	0.932351	0.587494	0.948252	0.556035	0.500561
	29	0.003532	0.480889	0.927455	0.198366	0.052091
	30	0.406779	0.372396	0.857153	0.026611	0.920149
	31	0.680903	0.904226	0.607529	0.811953	0.335544

62 rows × 5 columns

Below, we have provided a two-value tuple to index dataframe which has 3-level index.

df2.loc[(2021,1),]

	C1	C2	C3	C4	C5
Day
1	0.322959	0.361789	0.228263	0.293714	0.630976
2	0.092105	0.433701	0.430863	0.493685	0.425830
3	0.312261	0.426351	0.893389	0.944160	0.501837
4	0.623953	0.115618	0.317285	0.414826	0.866309
5	0.250455	0.483034	0.985560	0.519485	0.612895
6	0.120629	0.826341	0.603060	0.545068	0.342764
7	0.304121	0.417022	0.681301	0.875457	0.510422
8	0.669314	0.585937	0.624904	0.674689	0.842342
9	0.083195	0.763683	0.243666	0.194223	0.572457
10	0.095713	0.885327	0.627249	0.723416	0.016129
11	0.594432	0.556785	0.158960	0.153071	0.695530
12	0.318766	0.691970	0.554383	0.388951	0.925132
13	0.841670	0.357398	0.043591	0.304768	0.398186
14	0.704959	0.995358	0.355915	0.762548	0.593177
15	0.691702	0.151127	0.398876	0.240856	0.343456
16	0.513128	0.666625	0.105908	0.130895	0.321981
17	0.661564	0.846506	0.553257	0.854452	0.384838
18	0.316788	0.354265	0.171082	0.829113	0.338671
19	0.552370	0.578551	0.521533	0.002688	0.988345
20	0.905342	0.207636	0.292489	0.520010	0.901911
21	0.983631	0.257542	0.564359	0.806969	0.394370
22	0.731073	0.161069	0.600699	0.865864	0.983522
23	0.079366	0.428347	0.204543	0.450636	0.547764
24	0.093327	0.296861	0.927584	0.569004	0.457412
25	0.753526	0.741862	0.048579	0.708697	0.839243
26	0.165938	0.780998	0.286537	0.306470	0.665261
27	0.111392	0.664872	0.887857	0.696311	0.440328
28	0.438214	0.765096	0.565642	0.084904	0.582671
29	0.814844	0.337066	0.927577	0.750717	0.574064
30	0.751644	0.079149	0.859389	0.821504	0.909872
31	0.128631	0.081780	0.138416	0.399379	0.424307

Below we have provided indexing values for all three levels to index the dataframe.

df2.loc[(2021,1, 1),]

C1    0.322959
C2    0.361789
C3    0.228263
C4    0.293714
C5    0.630976
Name: (2021, 1, 1), dtype: float64

In the below cell, we have explained an example where the second level of the index has date-time values. We can provide date-time values as either string or as date-time objects.

df4.loc[("Location1","2021-1-1"),]

	Category	Category1					Category2
	Item	Item1	Item2	Item3	Item4	Item5	Item1	Item2	Item3	Item4	Item5
Location	Date
Location1	2021-01-01	800	769	521	306	455	748	933	744	755	877

Using slice() Function¶

The slice(start,end[,step]) function is normally used to create range of values. It has the same format as integer indexing on numpy arrays but slice() can also accept any data type as input and is used to represent a range of values for that data type based on values present in an index of dataframe/series.

We can call slice() function with just None value and it will represent all values in a particular level of MultiIndex object.

Below we have filtered our dataframe with a 2-level index. We have provided a tuple of two values where the first value is slice(None) and the second value is a list of three integers. This will inform the dataframe to take all entries where second-level values belong to one of the provided three integers. It'll consider all first-level values which have specified second-level values.

df1.loc[(slice(None), [1,2,3]),]

		A1	A2	A3	A4	A5
Index1	Index2
A	1	0.696469	0.286139	0.226851	0.551315	0.719469
B	2	0.423106	0.980764	0.684830	0.480932	0.392118
C	3	0.343178	0.729050	0.438572	0.059678	0.398044

In the below cell, we have filtered our data set at row level using the same logic as our previous example. This time, we have provided filters for columns as well.

row_filter = (slice(None), [1,2,3])
column_filter = ["A1","A3","A5"]

df1.loc[row_filter, column_filter]

		A1	A3	A5
Index1	Index2
A	1	0.696469	0.226851	0.719469
B	2	0.423106	0.684830	0.392118
C	3	0.343178	0.438572	0.398044

In the below cell, we have filtered rows of our dataframe which has a three-level index. For the first level, we have provided a single value. For the second level, we have provided slice(None) which will consider all values based on the values of other levels. For the third level, we have provided five values. This will consider all second-level values where the first-level value is 2021 and the third-level value is one of the five input integers.

df3.loc[(2021, slice(None), [1,2,3,4,5]),]

		Location	Location1					Location2
		Item	Item1	Item2	Item3	Item4	Item5	Item1	Item2	Item3	Item4	Item5
Year	Month	Day
2021	1	1	24	531	61	112	444	636	469	921	532	117
		2	154	127	519	451	312	72	501	94	523	456
		3	249	485	401	584	175	940	192	414	555	235
		4	528	119	260	341	50	546	766	744	291	64
		5	269	875	539	456	937	810	925	254	656	235
	2	1	480	18	354	30	620	23	48	439	212	494
		2	639	506	23	552	576	72	398	380	471	878
		3	78	2	378	4	728	243	868	265	528	864
		4	160	27	15	324	228	793	15	764	12	71
		5	740	408	582	112	169	572	493	287	405	60

Below, we have created another example demonstrating usage of slice() function.

df3.loc[(2021, [2,], slice(None)),]

		Location	Location1					Location2
		Item	Item1	Item2	Item3	Item4	Item5	Item1	Item2	Item3	Item4	Item5
Year	Month	Day
2021	2	1	480	18	354	30	620	23	48	439	212	494
		2	639	506	23	552	576	72	398	380	471	878
		3	78	2	378	4	728	243	868	265	528	864
		4	160	27	15	324	228	793	15	764	12	71
		5	740	408	582	112	169	572	493	287	405	60
		6	75	976	462	985	270	358	338	270	408	499
		7	712	237	537	994	877	108	668	88	18	57
		8	866	682	835	140	420	325	83	76	859	273
		9	171	92	40	278	704	394	361	646	17	332
		10	447	710	491	470	946	80	196	921	33	761
		11	1	591	128	145	427	442	147	566	370	339
		12	596	769	215	656	664	724	975	192	698	664
		13	954	101	960	169	161	224	442	893	696	863
		14	783	833	881	396	163	549	240	542	730	11
		15	956	800	770	394	124	174	534	841	118	55
		16	443	286	693	857	789	850	788	232	464	369
		17	167	803	541	366	568	990	372	951	949	176
		18	579	951	288	826	334	218	325	28	840	621
		19	752	481	740	828	781	468	274	947	50	980
		20	299	737	853	302	24	988	5	926	299	821
		21	920	615	398	740	420	154	759	781	232	789
		22	346	354	552	206	981	811	571	561	175	754
		23	147	908	292	378	862	876	33	620	383	343
		24	830	937	461	220	220	787	101	543	562	682
		25	871	17	732	708	774	316	787	698	215	552
		26	380	813	42	838	152	662	53	725	334	152
		27	921	145	123	115	878	931	209	568	284	706
		28	546	65	623	896	643	443	824	22	426	388
		29	463	100	743	630	780	293	752	553	58	63
		30	327	479	901	91	345	529	924	887	923	437
		31	794	689	93	316	539	593	637	38	336	905

Below, we have created an example, where we are indexing a dataframe whose both rows and columns labels are MultiIndex objects. For filtering rows, we have used the same tuple of three values that we had used in one of our earlier examples. For filtering columns, we have used a tuple of two values where the first value is a single string and the second value is a list of three strings.

df3.loc[(2021, slice(None), [1,2,3,4,5]), (["Location1"], ["Item1","Item2", "Item3"])]

		Location	Location1
		Item	Item1	Item2	Item3
Year	Month	Day
2021	1	1	24	531	61
		2	154	127	519
		3	249	485	401
		4	528	119	260
		5	269	875	539
	2	1	480	18	354
		2	639	506	23
		3	78	2	378
		4	160	27	15
		5	740	408	582

In the below cell, we have tried to reproduce our example from the previous cell but this time instead of providing 5 integers for the last level of filtering rows, we have provided slice(1, 5) function. It'll have the same effect as our previous example.

df3.loc[(2021,slice(None), slice(1,5)), (["Location1"], ["Item1","Item2", "Item3"])]

		Location	Location1
		Item	Item1	Item2	Item3
Year	Month	Day
2021	1	1	24	531	61
		2	154	127	519
		3	249	485	401
		4	528	119	260
		5	269	875	539
	2	1	480	18	354
		2	639	506	23
		3	78	2	378
		4	160	27	15
		5	740	408	582

In the below cell, we have filtered rows of our dataframe where we had datetime index as a part of MultiIndex object.

df4.loc[(slice(None), ["2021-1-1", "2021-1-2", "2021-1-3"]),]

	Category	Category1					Category2
	Item	Item1	Item2	Item3	Item4	Item5	Item1	Item2	Item3	Item4	Item5
Location	Date
Location1	2021-01-01	800	769	521	306	455	748	933	744	755	877
	2021-01-02	508	511	134	165	126	807	644	945	421	202
	2021-01-03	906	113	867	520	535	113	244	839	373	191
Location2	2021-01-01	417	625	322	117	488	993	386	429	910	771
	2021-01-02	218	728	381	524	691	585	4	645	53	581
	2021-01-03	588	173	651	860	638	539	502	726	771	533

Range Indexing¶

In this section, we'll explain how we can specify a range of values using ':' operator for filtering rows/columns of our dataframe.

Below we have filtered our first dataframe by providing value '"A":"C"' for indexing. This informs the '.loc' property that keep rows of dataframe where first level index values ranges from 'A' to 'C' which is three values ['A','B','C'].

df1.loc["A":"C",]

		A1	A2	A3	A4	A5
Index1	Index2
A	1	0.696469	0.286139	0.226851	0.551315	0.719469
B	2	0.423106	0.980764	0.684830	0.480932	0.392118
C	3	0.343178	0.729050	0.438572	0.059678	0.398044

Below we have created another example where we have provided tuples to filter rows of dataframe. This will keep all entries in the dataframe starting from the first tuple index values till the last tuple index values.

df1.loc[("A",1): ("C",3),]

		A1	A2	A3	A4	A5
Index1	Index2
A	1	0.696469	0.286139	0.226851	0.551315	0.719469
B	2	0.423106	0.980764	0.684830	0.480932	0.392118
C	3	0.343178	0.729050	0.438572	0.059678	0.398044

In the below cell, we have created another example demonstrating the usage of range filtering where we have filtered rows of dataframe as well as columns of dataframe using range operator (':').

df2.loc[(2021,1,1):(2021,1,15), "C3":"C5"]

			C3	C4	C5
Year	Month	Day
2021	1	1	0.228263	0.293714	0.630976
		2	0.430863	0.493685	0.425830
		3	0.893389	0.944160	0.501837
		4	0.317285	0.414826	0.866309
		5	0.985560	0.519485	0.612895
		6	0.603060	0.545068	0.342764
		7	0.681301	0.875457	0.510422
		8	0.624904	0.674689	0.842342
		9	0.243666	0.194223	0.572457
		10	0.627249	0.723416	0.016129
		11	0.158960	0.153071	0.695530
		12	0.554383	0.388951	0.925132
		13	0.043591	0.304768	0.398186
		14	0.355915	0.762548	0.593177
		15	0.398876	0.240856	0.343456

In the below cell, we have created another example demonstrating the usage of the range operator for filtering rows and columns of the dataframe. This time we have used tuples for columns as well.

df3.loc[(2021,1,1):(2021,1,15), ("Location1", "Item4"): ("Location2","Item1")]

		Location	Location1		Location2
		Item	Item4	Item5	Item1
Year	Month	Day
2021	1	1	112	444	636
		2	451	312	72
		3	584	175	940
		4	341	50	546
		5	456	937	810
		6	805	360	744
		7	726	9	666
		8	95	890	789
		9	692	448	654
		10	755	597	703
		11	606	560	359
		12	535	193	661
		13	582	942	7
		14	138	721	872
		15	184	288	160

In the below cell, we have used range indexing on our dataframe where values of one level are of datetime data type.

df4.loc["Location1", "2021-1-1":"2021-1-5", :]

	Category	Category1					Category2
	Item	Item1	Item2	Item3	Item4	Item5	Item1	Item2	Item3	Item4	Item5
Location	Date
Location1	2021-01-01	800	769	521	306	455	748	933	744	755	877
	2021-01-02	508	511	134	165	126	807	644	945	421	202
	2021-01-03	906	113	867	520	535	113	244	839	373	191
	2021-01-04	578	294	660	927	670	176	729	10	700	81
	2021-01-05	905	875	826	928	88	400	110	268	419	245

In the below cell, we have created one more example demonstrating usage of range operator to filter rows of dataframe where one level values are of datetime data type. We have provided datetime values as strings.

df4.loc["Location1":"Location2", "2021-1-1":"2021-1-5", :]

	Category	Category1					Category2
	Item	Item1	Item2	Item3	Item4	Item5	Item1	Item2	Item3	Item4	Item5
Location	Date
Location1	2021-01-01	800	769	521	306	455	748	933	744	755	877
	2021-01-02	508	511	134	165	126	807	644	945	421	202
	2021-01-03	906	113	867	520	535	113	244	839	373	191
	2021-01-04	578	294	660	927	670	176	729	10	700	81
	2021-01-05	905	875	826	928	88	400	110	268	419	245
Location2	2021-01-01	417	625	322	117	488	993	386	429	910	771
	2021-01-02	218	728	381	524	691	585	4	645	53	581
	2021-01-03	588	173	651	860	638	539	502	726	771	533
	2021-01-04	651	28	925	593	458	975	625	128	556	413
	2021-01-05	580	999	408	621	152	654	632	794	203	259

Using xs() for Indexing¶

In this section, we'll explain how we can use xs() function available from the pandas dataframe to filter rows/columns of the dataframe.

xs(key,axis=0,level=None) - This method takes as input labels of rows/columns along with axis value (0-rows,1-columns) and level names of MultiIndex object and returns dataframe with specified labels present in it.
- The key parameter takes as input single label or tuple of labels from our multi-level indexes. The rows/columns which have these label values will be kept and all others will be dropped.
- The axis method accepts either 0 (rows) or 1 (columns) as value. The value of 0 will consider label values provided to key parameter across index of dataframe and value of '1' will consider values provided to key parameter across columns of dataframe.
- The level accepts a single value or tuple of values representing level names for multi-level index.

The benefit of using this method is that we can provide information about individual levels and all other level labels will be handled internally without we need to explicitly handle them.

Below we have retrieved rows from our dataframe where values for level 'Day' is 1. There are two rows in the dataframe where the value of level 'Day' is 1.

df2.xs(1, level="Day")

		C1	C2	C3	C4	C5
Year	Month
2021	1	0.322959	0.361789	0.228263	0.293714	0.630976
2021	2	0.562218	0.122244	0.201400	0.811644	0.467988

We can also provide slice() function call to xs() method.

Below we have retrieved rows from our dataframe where values for level 'Day' are in the range [1-3]. There are six values that satisfies this condition because we have 'Day' values [1,2,3] present for month values 1 and 2 both.

df2.xs(slice(1,3), level="Day")

		C1	C2	C3	C4	C5
Year	Month
2021	1	0.322959	0.361789	0.228263	0.293714	0.630976
	1	0.092105	0.433701	0.430863	0.493685	0.425830
	1	0.312261	0.426351	0.893389	0.944160	0.501837
	2	0.562218	0.122244	0.201400	0.811644	0.467988
	2	0.807938	0.007426	0.551593	0.931932	0.582175
	2	0.206096	0.717758	0.378986	0.668384	0.029320

In the below cell, we have explained how we can use xs() method with columns of our dataframe.

We have filtered columns of our dataframe where level named 'Location' has values 'Location1'. There are 5 columns that satisfy this condition.

df3.xs("Location1", level="Location", axis=1)

		Item	Item1	Item2	Item3	Item4	Item5
Year	Month	Day
2021	1	1	24	531	61	112	444
		2	154	127	519	451	312
		3	249	485	401	584	175
		4	528	119	260	341	50
		5	269	875	539	456	937
	...	...	...	...	...	...	...
	2	27	921	145	123	115	878
		28	546	65	623	896	643
		29	463	100	743	630	780
		30	327	479	901	91	345
		31	794	689	93	316	539

62 rows × 5 columns

In the below cell, we have filtered columns of our dataframe using xs() function where label values are provided by calling slice() function. We have filtered columns of the dataframe where labels of level 'Item' are one of ['Item1','Item2','Item3].

df3.xs(slice("Item1", "Item3"), level="Item", axis=1)

		Location	Location1	Location1	Location1	Location2	Location2	Location2
Year	Month	Day
2021	1	1	24	531	61	636	469	921
		2	154	127	519	72	501	94
		3	249	485	401	940	192	414
		4	528	119	260	546	766	744
		5	269	875	539	810	925	254
	...	...	...	...	...	...	...	...
	2	27	921	145	123	931	209	568
		28	546	65	623	443	824	22
		29	463	100	743	293	752	553
		30	327	479	901	529	924	887
		31	794	689	93	593	637	38

62 rows × 6 columns

In the below cell, we have created another example demonstrating usage of xs() function. This time we have explained how we can give labels for more than one level. We have provided values for three levels of our dataframe and only one row satisfies those labels.

df3.xs((2021,1,1), level=("Year", "Month","Day"))

		Location	Location1					Location2
		Item	Item1	Item2	Item3	Item4	Item5	Item1	Item2	Item3	Item4	Item5
Year	Month	Day
2021	1	1	24	531	61	112	444	636	469	921	532	117

In the below cell, we have created another example demonstrating how we can provide labels for multiple levels to xs() method.

df4.xs(("Location1","2021-1-1"), level=("Location", "Date"))

	Category	Category1					Category2
	Item	Item1	Item2	Item3	Item4	Item5	Item1	Item2	Item3	Item4	Item5
Location	Date
Location1	2021-01-01	800	769	521	306	455	748	933	744	755	877

In the below cell, we have again created an example demonstrating how to give labels for multiple levels to xs() method but this time we have used slice() call as one of the level's labels.

df3.xs((2021,slice(None),1), level=("Year", "Month","Day"))

		Location	Location1					Location2
		Item	Item1	Item2	Item3	Item4	Item5	Item1	Item2	Item3	Item4	Item5
Year	Month	Day
2021	1	1	24	531	61	112	444	636	469	921	532	117
2021	2	1	480	18	354	30	620	23	48	439	212	494

In the next cell, we have created another example demonstrating usage of xs() method. This time we have used two slice() function call. One of the calls refers to all labels at that level and another call refers to the range of values.

df3.xs((2021,slice(None),slice(1,5)), level=("Year", "Month","Day"))

		Location	Location1					Location2
		Item	Item1	Item2	Item3	Item4	Item5	Item1	Item2	Item3	Item4	Item5
Year	Month	Day
2021	1	1	24	531	61	112	444	636	469	921	532	117
		2	154	127	519	451	312	72	501	94	523	456
		3	249	485	401	584	175	940	192	414	555	235
		4	528	119	260	341	50	546	766	744	291	64
		5	269	875	539	456	937	810	925	254	656	235
	2	1	480	18	354	30	620	23	48	439	212	494
		2	639	506	23	552	576	72	398	380	471	878
		3	78	2	378	4	728	243	868	265	528	864
		4	160	27	15	324	228	793	15	764	12	71
		5	740	408	582	112	169	572	493	287	405	60

In our last example, we have explained how we can use xs() function when one of the level values is of datetime data type.

df4.xs(("Location1",slice("2021-1-1", "2021-1-5")), level=("Location", "Date"))

	Category	Category1					Category2
	Item	Item1	Item2	Item3	Item4	Item5	Item1	Item2	Item3	Item4	Item5
Location	Date
Location1	2021-01-01	800	769	521	306	455	748	933	744	755	877
	2021-01-02	508	511	134	165	126	807	644	945	421	202
	2021-01-03	906	113	867	520	535	113	244	839	373	191
	2021-01-04	578	294	660	927	670	176	729	10	700	81
	2021-01-05	905	875	826	928	88	400	110	268	419	245

This ends our small tutorial explaining how we can create, index, and work with multi-level /hierarchical indexing (MultiIndex) with pandas dataframe. Please feel free to let us know your views in the comments section.

References¶

MultiIndex/Hierarchical Indexing

Sunny Solanki

Comfortable Learning through Video Tutorials?

If you are more comfortable learning through video tutorials then we would recommend that you subscribe to our YouTube channel.

Stuck Somewhere? Need Help with Coding? Have Doubts About the Topic/Code?

When going through coding examples, it's quite common to have doubts and errors.

If you have doubts about some code examples or are stuck somewhere when trying our code, send us an email at coderzcolumn07@gmail.com. We'll help you or point you in the direction where you can find a solution to your problem.

You can even send us a mail if you are trying something new and need guidance regarding coding. We'll try to respond as soon as possible.

Want to Share Your Views? Have Any Suggestions?

If you want to

provide some suggestions on topic
share your views
include some details in tutorial
suggest some new topics on which we should create tutorials/blogs

Please feel free to contact us at coderzcolumn07@gmail.com. We appreciate and value your feedbacks. You can also support us with a small contribution by clicking DONATE.

pandas-dataframe, multiindex, hierarchical-index

Sunny Solanki

Software Developer | Youtuber | Bonsai Enthusiast

Subscribe to Our YouTube Channel

Tutorial Categories

Artificial Intelligence (83)
Data Science (84)
Digital Marketing (8)
Machine Learning (38)
Python (131)

Simple Guide to Understand Pandas Multi-Level / Hierarchical Index¶

Important Sections of Tutorial¶

1. Ways to Create Multi-Level / Hierarchical Index

MultiIndex¶

Hierarchical Indexing¶

2. Creating Sample DataFrames with MultiIndex ¶

DataFrame with MultiIndex for Rows (axis=0)¶

DataFrame with Hierarchical Index for Rows (axis=0)¶

DataFrame with Hierarchical Index for Rows (axis=0) and Columns (axis=1)¶

DataFrame with Hierarchical Index for Rows (axis=0) and Columns (axis=1)¶

3. Indexing DataFrame ¶

Using slice() Function¶

Range Indexing¶

Using xs() for Indexing¶

References¶

Sunny Solanki

Comfortable Learning through Video Tutorials?

Stuck Somewhere? Need Help with Coding? Have Doubts About the Topic/Code?

Want to Share Your Views? Have Any Suggestions?

Sunny Solanki

Subscribe to Our YouTube Channel

Tutorial Categories

Newsletter Subscription