Updated On : Dec-10,2021 Time Investment : ~20 mins

Simple Guide to Understand Pandas Multi-Level / Hierarchical Index

Pandas is the go-to library when for data analysis when working with tabular datasets. It is the best solution available for working with tabular datasets which fit in the main memory of the computer. Pandas provide two data structures for working with data (1. Series 2. DataFrame). Pandas let us give labels to each row by providing index values and name columns of data as well. These labels can then be used to filter pandas’ dataframe/series in a particular dimension. We can use index values to filter rows of a data frame as well as perform operations on them. If we don't provide index values then pandas will internally assign integers from 0 till the end of the data. It creates RangeIndex objects by default for both row labels and column labels. Internally pandas use various kinds of Index objects based on the label's data type to store labels for axes. We can access Index object for rows (axis=0) by calling '.index' property of dataframe and for columns (axis=1) by calling '.columns' property of dataframe. Pandas internally represent labels of both rows and columns using Index objects of various types based on the data type of labels.

The majority of uses case of using Pandas dataframe/series requires a single value per label. We generally have a single label for entry in a particular axis (single label for a particular row / single label for a column). But there can be situations where we want more than one value to be labels of row or column of data. This lets us represent high-dimensional data in our 2D data structure dataframe or 1D data structure series. This kind of indexing is generally referred to as Multi-Level Indexing where we have more than one label value per row/column. We have more than one level in indexing. The values inside a single label can be unrelated or they can be related to each other where the value at a lower level is a sub-value of higher-level value (E.g - In multi-level index,first-level values can be a year, second-level values can be months and third-level values can be the day. Here, lower-level values are sub-values of higher level ). When values in the index at lower levels are sub-values of higher-level values then the index is generally referred to as Hierarchical Index.

As a part of this tutorial, we'll be concentrating on how to create Multi-Level Index using pandas and how to index pandas dataframe if multi-level indexing is present for both rows and columns. Below we have listed important sections of our tutorial to give an overview of the material covered.

Important Sections of Tutorial

  1. Ways to Create MultiIndex / Hierarchical Index
    • MultiIndex
    • Hierarchical Index
  2. Creating Sample DataFrames with MultiIndex
    • DataFrame with MultiIndex for Rows (axis=0)
    • DataFrame with Hierarchical Index for Rows (axis=0)
    • DataFrame with Hierarchical Index for Rows (axis=0) and Columns (axis=1)
    • DataFrame with Hierarchical Index for Rows (axis=0) and Columns (axis=1)
  3. Indexing DataFrame
    • Normal Indexing using '.loc' Property
    • Using 'slice()' Function
    • Range Indexing
    • Using 'xs()' for Indexing

Below we have imported pandas and printed the version of them that we have used in this tutorial. We have also imported numpy as we'll be using it to create data for our pandas dataframe.

import pandas as pd

print("Pandas Version : {}".format(pd.__version__))
Pandas Version : 1.3.4
import numpy as np

1. Ways to Create Multi-Level / Hierarchical Index

In this section, we'll explain how we can create MultiIndex object which is used by pandas to represent an index that has more than one value per label of data. We can use MultiIndex object to represent row labels as well as columns labels. Pandas provide 4 different methods which are available as factory methods from MultiIndex class to create MultiIndex objects.

  1. from_tuples() - This method takes a list of tuples as input and creates a MultiIndex object from it. Each tuple represents one label that will uniquely identify one row/column.
  2. from_arrays() - This method takes as input a list of arrays and creates a MultiIndex object from it. Each array should have the same length as a total number of labels. The first values from each input array will create the first label, the second values will create the second label, and so on.
  3. from_product() - This method takes as input more than one iterator and performs cartesian product on input iterators to create labels.
  4. from_frame() - This method takes as input another pandas dataframe and uses its values to create MultiIndex object.

MultiIndex

Below we have created two lists with a list of labels. We have then combined both lists using zip() python method. We'll be creating MultiIndex object using this list of tuples.

index1 = ["A", "B", "C", "D", "E"]
index2 = [1,2,3,4,5]

combined_idx = list(zip(index1, index2))

combined_idx
[('A', 1), ('B', 2), ('C', 3), ('D', 4), ('E', 5)]

Below we have created MultiIndex object using from_tuples() method by providing a list of tuples that we created in the previous cell as input. We can also provide names to individual levels of MultiIndex object using names parameter.

combined_idx_obj = pd.MultiIndex.from_tuples(combined_idx, names=["Index1", "Index2"])

combined_idx_obj
MultiIndex([('A', 1),
            ('B', 2),
            ('C', 3),
            ('D', 4),
            ('E', 5)],
           names=['Index1', 'Index2'])

Below, we have created MultiIndex object using from_arrays() method by providing two lists we had declared earlier as input. We can notice that the results are exactly the same as from_tuples() method.

combined_idx_obj = pd.MultiIndex.from_arrays([index1, index2], names=["Index1", "Index2"])

combined_idx_obj
MultiIndex([('A', 1),
            ('B', 2),
            ('C', 3),
            ('D', 4),
            ('E', 5)],
           names=['Index1', 'Index2'])

Below, we have created a MultiIndex object using from_product() method by providing two lists we had defined earlier. We can notice that from_product() method multiplied all values of the first list with all values of the second list (cartesian product).

combined_idx_obj = pd.MultiIndex.from_product([index1, index2], names=["Index1", "Index2"])

combined_idx_obj
MultiIndex([('A', 1),
            ('A', 2),
            ('A', 3),
            ('A', 4),
            ('A', 5),
            ('B', 1),
            ('B', 2),
            ('B', 3),
            ('B', 4),
            ('B', 5),
            ('C', 1),
            ('C', 2),
            ('C', 3),
            ('C', 4),
            ('C', 5),
            ('D', 1),
            ('D', 2),
            ('D', 3),
            ('D', 4),
            ('D', 5),
            ('E', 1),
            ('E', 2),
            ('E', 3),
            ('E', 4),
            ('E', 5)],
           names=['Index1', 'Index2'])

Below, we have created a dataframe using two lists that we had defined earlier. We'll be using this dataframe to create MultiIndex object next.

df = pd.DataFrame(zip(index1,index2), columns=["Index1", "index2"])

df
Index1 index2
0 A 1
1 B 2
2 C 3
3 D 4
4 E 5

Below, we have created a MultiIndex object using from_frame() method by providing the dataframe we created in the previous cell.

combined_idx_obj = pd.MultiIndex.from_frame(df)

combined_idx_obj
MultiIndex([('A', 1),
            ('B', 2),
            ('C', 3),
            ('D', 4),
            ('E', 5)],
           names=['Index1', 'index2'])

Hierarchical Indexing

In this section, we have created a MultiIndex object which represents a hierarchical indexing example. We have created MultiIndex object with 3 levels. We have given three lists as input to from_product() method. The first list is a single value list of the year, the second list is a 2-value list of months and the third list is integers in the range 1-31 representing days.

year = [2021,]
months = [1,2]
days = list(range(1,32))

combined_idx_obj = pd.MultiIndex.from_product([year,months, days], names=["Year", "Month","Day"])

combined_idx_obj[:5]
MultiIndex([(2021, 1, 1),
            (2021, 1, 2),
            (2021, 1, 3),
            (2021, 1, 4),
            (2021, 1, 5)],
           names=['Year', 'Month', 'Day'])

We can retrieve unique values per level of MultiIndex object using levels property of MultiIndex object.

combined_idx_obj.levels
FrozenList([[2021], [1, 2], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]])

We can retrieve a list of values per level using get_level_values() method by providing integer or level name as input.

combined_idx_obj.get_level_values(0)
Int64Index([2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021,
            2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021,
            2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021,
            2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021,
            2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021,
            2021, 2021, 2021, 2021, 2021, 2021, 2021],
           dtype='int64', name='Year')
combined_idx_obj.get_level_values("Month")
Int64Index([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
            1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
            2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
           dtype='int64', name='Month')

2. Creating Sample DataFrames with MultiIndex

In this section, we'll create pandas DataFrames which will have either index or column labels or both represented with MultiIndex object. We'll be using various dataframes created in this section, later on, to explain indexing dataframe with multi-level indexing.

np.random.seed(123)

DataFrame with MultiIndex for Rows (axis=0)

Below we have created aMultiIndex object using from_arrays() method by providing two lists to it. Then in the next cell, we have created a pandas dataframe with random numbers whose index is set as MultiIndex object we created.

index1 = ["A", "B", "C", "D", "E"]
index2 = [1,2,3,4,5]

combined_idx_obj = pd.MultiIndex.from_arrays([index1, index2], names=["Index1", "Index2"])

combined_idx_obj
MultiIndex([('A', 1),
            ('B', 2),
            ('C', 3),
            ('D', 4),
            ('E', 5)],
           names=['Index1', 'Index2'])
data= np.random.rand(len(combined_idx_obj), 5)

df1 = pd.DataFrame(data=data, columns=["A1", "A2", "A3", "A4", "A5"], index=combined_idx_obj)

df1
A1 A2 A3 A4 A5
Index1 Index2
A 1 0.696469 0.286139 0.226851 0.551315 0.719469
B 2 0.423106 0.980764 0.684830 0.480932 0.392118
C 3 0.343178 0.729050 0.438572 0.059678 0.398044
D 4 0.737995 0.182492 0.175452 0.531551 0.531828
E 5 0.634401 0.849432 0.724455 0.611024 0.722443

DataFrame with Hierarchical Index for Rows (axis=0)

In this section, we have created a hierarchical index using from_product() method by providing three lists to it. We have then created a pandas dataframe whose index is set as MultiIndex object representing hierarchical index.

year = [2021,]
months = [1,2]
days = list(range(1,32))

combined_idx_obj = pd.MultiIndex.from_product([year,months, days], names=["Year", "Month","Day"])

combined_idx_obj[:5]
MultiIndex([(2021, 1, 1),
            (2021, 1, 2),
            (2021, 1, 3),
            (2021, 1, 4),
            (2021, 1, 5)],
           names=['Year', 'Month', 'Day'])
data = np.random.rand(len(combined_idx_obj), 5)

df2 = pd.DataFrame(data=data, index=combined_idx_obj, columns=["C1", "C2", "C3", "C4", "C5"])

df2
C1 C2 C3 C4 C5
Year Month Day
2021 1 1 0.322959 0.361789 0.228263 0.293714 0.630976
2 0.092105 0.433701 0.430863 0.493685 0.425830
3 0.312261 0.426351 0.893389 0.944160 0.501837
4 0.623953 0.115618 0.317285 0.414826 0.866309
5 0.250455 0.483034 0.985560 0.519485 0.612895
... ... ... ... ... ... ...
2 27 0.572147 0.660952 0.298245 0.418627 0.453089
28 0.932351 0.587494 0.948252 0.556035 0.500561
29 0.003532 0.480889 0.927455 0.198366 0.052091
30 0.406779 0.372396 0.857153 0.026611 0.920149
31 0.680903 0.904226 0.607529 0.811953 0.335544

62 rows × 5 columns

DataFrame with Hierarchical Index for Rows (axis=0) and Columns (axis=1)

In this section, we have created examples where both row and column headings will be represented using MultiIndex object.

First, we have created hierarchical MultiIndex just like our previous example to represent rows of dataframe. Then, we have created another hierarchical MultiIndex object using from_product() method which will be used to represent column labels.

year = [2021,]
months = [1,2]
days = list(range(1,32))

combined_idx_obj = pd.MultiIndex.from_product([year,months, days], names=["Year", "Month","Day"])

columns_idx = pd.MultiIndex.from_product([["Location1","Location2"],
                                          ["Item1", "Item2", "Item3", "Item4", "Item5"]],
                                         names=["Location", "Item"]
                                        )

columns_idx
MultiIndex([('Location1', 'Item1'),
            ('Location1', 'Item2'),
            ('Location1', 'Item3'),
            ('Location1', 'Item4'),
            ('Location1', 'Item5'),
            ('Location2', 'Item1'),
            ('Location2', 'Item2'),
            ('Location2', 'Item3'),
            ('Location2', 'Item4'),
            ('Location2', 'Item5')],
           names=['Location', 'Item'])

Below, we have created a pandas dataframe of random numbers whose index and columns are represented using MultiIndex objects.

data = np.random.randint(1, 1000, size=(len(combined_idx_obj), len(columns_idx)))

df3 = pd.DataFrame(data=data, index=combined_idx_obj, columns=columns_idx)

df3
Location Location1 Location2
Item Item1 Item2 Item3 Item4 Item5 Item1 Item2 Item3 Item4 Item5
Year Month Day
2021 1 1 24 531 61 112 444 636 469 921 532 117
2 154 127 519 451 312 72 501 94 523 456
3 249 485 401 584 175 940 192 414 555 235
4 528 119 260 341 50 546 766 744 291 64
5 269 875 539 456 937 810 925 254 656 235
... ... ... ... ... ... ... ... ... ... ... ...
2 27 921 145 123 115 878 931 209 568 284 706
28 546 65 623 896 643 443 824 22 426 388
29 463 100 743 630 780 293 752 553 58 63
30 327 479 901 91 345 529 924 887 923 437
31 794 689 93 316 539 593 637 38 336 905

62 rows × 10 columns

DataFrame with Hierarchical Index for Rows (axis=0) and Columns (axis=1)

In this section, we have created another example where we are using MultiIndex object to represent labels of rows and columns. We have this time used a list of dates as a second-level index for rows.

locations = ["Location1", "Location2"]
rng = pd.date_range(start="2021-1-1", end="2021-2-28", freq="D")



rows_idx = pd.MultiIndex.from_product([locations,rng], names=["Location", "Date"])

columns_idx = pd.MultiIndex.from_product([["Category1","Category2"],
                                          ["Item1", "Item2", "Item3", "Item4", "Item5"]],
                                         names=["Category", "Item"])

columns_idx
MultiIndex([('Category1', 'Item1'),
            ('Category1', 'Item2'),
            ('Category1', 'Item3'),
            ('Category1', 'Item4'),
            ('Category1', 'Item5'),
            ('Category2', 'Item1'),
            ('Category2', 'Item2'),
            ('Category2', 'Item3'),
            ('Category2', 'Item4'),
            ('Category2', 'Item5')],
           names=['Category', 'Item'])
data = np.random.randint(1, 1000, size=(len(rows_idx), len(columns_idx)))

df4 = pd.DataFrame(data=data, index=rows_idx, columns=columns_idx)

df4
Category Category1 Category2
Item Item1 Item2 Item3 Item4 Item5 Item1 Item2 Item3 Item4 Item5
Location Date
Location1 2021-01-01 800 769 521 306 455 748 933 744 755 877
2021-01-02 508 511 134 165 126 807 644 945 421 202
2021-01-03 906 113 867 520 535 113 244 839 373 191
2021-01-04 578 294 660 927 670 176 729 10 700 81
2021-01-05 905 875 826 928 88 400 110 268 419 245
... ... ... ... ... ... ... ... ... ... ... ...
Location2 2021-02-24 60 204 313 832 600 763 24 865 658 360
2021-02-25 21 726 979 578 721 488 115 55 703 964
2021-02-26 74 299 910 78 922 779 601 479 159 927
2021-02-27 811 612 506 345 159 87 682 654 562 273
2021-02-28 326 261 851 102 939 332 192 931 783 312

118 rows × 10 columns

Below, we have retrieved single level from MultiIndex object using get_level_values() method and we can notice that it's DatetimeIndex object. Internally, MultiIndex object is composed of more than one Index object.

df4.index.get_level_values(1)
DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
               '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08',
               '2021-01-09', '2021-01-10',
               ...
               '2021-02-19', '2021-02-20', '2021-02-21', '2021-02-22',
               '2021-02-23', '2021-02-24', '2021-02-25', '2021-02-26',
               '2021-02-27', '2021-02-28'],
              dtype='datetime64[ns]', name='Date', length=118, freq=None)

3. Indexing DataFrame

In this section, we'll explain how we can perform indexing on a dataframe whose index or column labels are represented using MultiIndex objects. We'll explain various ways of indexing multi-level index in this section. We'll be using '.loc' property of pandas dataframe to perform indexing as it accepts actual index values which can be of any type to perform the indexing. We won't be covering indexing using '.iloc' property which accepts integers for indexing. We'll also explain indexing using xs() function.

We can provide labels of 1st levels of MultiIndex object directly to '.loc' property and it'll retrieve all rows/columns that satisfy that 1st level.

Below we have retrieved rows of pandas dataframe whose 1st level value for MultiIndex object is 'A'.

df1.loc["A"]
A1 A2 A3 A4 A5
Index2
1 0.696469 0.286139 0.226851 0.551315 0.719469

Below, we have provided a list of 1st level values to '.loc' property to retrieve rows where these three values are 1st level values of the index.

df1.loc[["A","B","D"]]
A1 A2 A3 A4 A5
Index1 Index2
A 1 0.696469 0.286139 0.226851 0.551315 0.719469
B 2 0.423106 0.980764 0.684830 0.480932 0.392118
D 4 0.737995 0.182492 0.175452 0.531551 0.531828

Below, we have provided indexing values for both rows and columns to '.loc' property. The values provided for index is the same as our previous example.

df1.loc[["A","B","D"], ["A1","A3","A5"]]
A1 A3 A5
Index1 Index2
A 1 0.696469 0.226851 0.719469
B 2 0.423106 0.684830 0.392118
D 4 0.737995 0.175452 0.531828

Below, we have created an example to explain how we can give a tuple of values to an index dataframe. Each tuple has two values representing values for both levels of MultiIndex object. This kind of indexing will retrieve rows/columns which exactly match the combinations of values provided.

df1.loc[[("A",1),("B",2),("D",4)]]
A1 A2 A3 A4 A5
Index1 Index2
A 1 0.696469 0.286139 0.226851 0.551315 0.719469
B 2 0.423106 0.980764 0.684830 0.480932 0.392118
D 4 0.737995 0.182492 0.175452 0.531551 0.531828

Below we have again used only the 1st level value of MultiIndex object to the index dataframe.

df2.loc[2021]
C1 C2 C3 C4 C5
Month Day
1 1 0.322959 0.361789 0.228263 0.293714 0.630976
2 0.092105 0.433701 0.430863 0.493685 0.425830
3 0.312261 0.426351 0.893389 0.944160 0.501837
4 0.623953 0.115618 0.317285 0.414826 0.866309
5 0.250455 0.483034 0.985560 0.519485 0.612895
... ... ... ... ... ... ...
2 27 0.572147 0.660952 0.298245 0.418627 0.453089
28 0.932351 0.587494 0.948252 0.556035 0.500561
29 0.003532 0.480889 0.927455 0.198366 0.052091
30 0.406779 0.372396 0.857153 0.026611 0.920149
31 0.680903 0.904226 0.607529 0.811953 0.335544

62 rows × 5 columns

Below, we have provided a two-value tuple to index dataframe which has 3-level index.

df2.loc[(2021,1),]
C1 C2 C3 C4 C5
Day
1 0.322959 0.361789 0.228263 0.293714 0.630976
2 0.092105 0.433701 0.430863 0.493685 0.425830
3 0.312261 0.426351 0.893389 0.944160 0.501837
4 0.623953 0.115618 0.317285 0.414826 0.866309
5 0.250455 0.483034 0.985560 0.519485 0.612895
6 0.120629 0.826341 0.603060 0.545068 0.342764
7 0.304121 0.417022 0.681301 0.875457 0.510422
8 0.669314 0.585937 0.624904 0.674689 0.842342
9 0.083195 0.763683 0.243666 0.194223 0.572457
10 0.095713 0.885327 0.627249 0.723416 0.016129
11 0.594432 0.556785 0.158960 0.153071 0.695530
12 0.318766 0.691970 0.554383 0.388951 0.925132
13 0.841670 0.357398 0.043591 0.304768 0.398186
14 0.704959 0.995358 0.355915 0.762548 0.593177
15 0.691702 0.151127 0.398876 0.240856 0.343456
16 0.513128 0.666625 0.105908 0.130895 0.321981
17 0.661564 0.846506 0.553257 0.854452 0.384838
18 0.316788 0.354265 0.171082 0.829113 0.338671
19 0.552370 0.578551 0.521533 0.002688 0.988345
20 0.905342 0.207636 0.292489 0.520010 0.901911
21 0.983631 0.257542 0.564359 0.806969 0.394370
22 0.731073 0.161069 0.600699 0.865864 0.983522
23 0.079366 0.428347 0.204543 0.450636 0.547764
24 0.093327 0.296861 0.927584 0.569004 0.457412
25 0.753526 0.741862 0.048579 0.708697 0.839243
26 0.165938 0.780998 0.286537 0.306470 0.665261
27 0.111392 0.664872 0.887857 0.696311 0.440328
28 0.438214 0.765096 0.565642 0.084904 0.582671
29 0.814844 0.337066 0.927577 0.750717 0.574064
30 0.751644 0.079149 0.859389 0.821504 0.909872
31 0.128631 0.081780 0.138416 0.399379 0.424307

Below we have provided indexing values for all three levels to index the dataframe.

df2.loc[(2021,1, 1),]
C1    0.322959
C2    0.361789
C3    0.228263
C4    0.293714
C5    0.630976
Name: (2021, 1, 1), dtype: float64

In the below cell, we have explained an example where the second level of the index has date-time values. We can provide date-time values as either string or as date-time objects.

df4.loc[("Location1","2021-1-1"),]
Category Category1 Category2
Item Item1 Item2 Item3 Item4 Item5 Item1 Item2 Item3 Item4 Item5
Location Date
Location1 2021-01-01 800 769 521 306 455 748 933 744 755 877

Using slice() Function

The slice(start,end[,step]) function is normally used to create range of values. It has the same format as integer indexing on numpy arrays but slice() can also accept any data type as input and is used to represent a range of values for that data type based on values present in an index of dataframe/series.

We can call slice() function with just None value and it will represent all values in a particular level of MultiIndex object.

Below we have filtered our dataframe with a 2-level index. We have provided a tuple of two values where the first value is slice(None) and the second value is a list of three integers. This will inform the dataframe to take all entries where second-level values belong to one of the provided three integers. It'll consider all first-level values which have specified second-level values.

df1.loc[(slice(None), [1,2,3]),]
A1 A2 A3 A4 A5
Index1 Index2
A 1 0.696469 0.286139 0.226851 0.551315 0.719469
B 2 0.423106 0.980764 0.684830 0.480932 0.392118
C 3 0.343178 0.729050 0.438572 0.059678 0.398044

In the below cell, we have filtered our data set at row level using the same logic as our previous example. This time, we have provided filters for columns as well.

row_filter = (slice(None), [1,2,3])
column_filter = ["A1","A3","A5"]

df1.loc[row_filter, column_filter]
A1 A3 A5
Index1 Index2
A 1 0.696469 0.226851 0.719469
B 2 0.423106 0.684830 0.392118
C 3 0.343178 0.438572 0.398044

In the below cell, we have filtered rows of our dataframe which has a three-level index. For the first level, we have provided a single value. For the second level, we have provided slice(None) which will consider all values based on the values of other levels. For the third level, we have provided five values. This will consider all second-level values where the first-level value is 2021 and the third-level value is one of the five input integers.

df3.loc[(2021, slice(None), [1,2,3,4,5]),]
Location Location1 Location2
Item Item1 Item2 Item3 Item4 Item5 Item1 Item2 Item3 Item4 Item5
Year Month Day
2021 1 1 24 531 61 112 444 636 469 921 532 117
2 154 127 519 451 312 72 501 94 523 456
3 249 485 401 584 175 940 192 414 555 235
4 528 119 260 341 50 546 766 744 291 64
5 269 875 539 456 937 810 925 254 656 235
2 1 480 18 354 30 620 23 48 439 212 494
2 639 506 23 552 576 72 398 380 471 878
3 78 2 378 4 728 243 868 265 528 864
4 160 27 15 324 228 793 15 764 12 71
5 740 408 582 112 169 572 493 287 405 60

Below, we have created another example demonstrating usage of slice() function.

df3.loc[(2021, [2,], slice(None)),]
Location Location1 Location2
Item Item1 Item2 Item3 Item4 Item5 Item1 Item2 Item3 Item4 Item5
Year Month Day
2021 2 1 480 18 354 30 620 23 48 439 212 494
2 639 506 23 552 576 72 398 380 471 878
3 78 2 378 4 728 243 868 265 528 864
4 160 27 15 324 228 793 15 764 12 71
5 740 408 582 112 169 572 493 287 405 60
6 75 976 462 985 270 358 338 270 408 499
7 712 237 537 994 877 108 668 88 18 57
8 866 682 835 140 420 325 83 76 859 273
9 171 92 40 278 704 394 361 646 17 332
10 447 710 491 470 946 80 196 921 33 761
11 1 591 128 145 427 442 147 566 370 339
12 596 769 215 656 664 724 975 192 698 664
13 954 101 960 169 161 224 442 893 696 863
14 783 833 881 396 163 549 240 542 730 11
15 956 800 770 394 124 174 534 841 118 55
16 443 286 693 857 789 850 788 232 464 369
17 167 803 541 366 568 990 372 951 949 176
18 579 951 288 826 334 218 325 28 840 621
19 752 481 740 828 781 468 274 947 50 980
20 299 737 853 302 24 988 5 926 299 821
21 920 615 398 740 420 154 759 781 232 789
22 346 354 552 206 981 811 571 561 175 754
23 147 908 292 378 862 876 33 620 383 343
24 830 937 461 220 220 787 101 543 562 682
25 871 17 732 708 774 316 787 698 215 552
26 380 813 42 838 152 662 53 725 334 152
27 921 145 123 115 878 931 209 568 284 706
28 546 65 623 896 643 443 824 22 426 388
29 463 100 743 630 780 293 752 553 58 63
30 327 479 901 91 345 529 924 887 923 437
31 794 689 93 316 539 593 637 38 336 905

Below, we have created an example, where we are indexing a dataframe whose both rows and columns labels are MultiIndex objects. For filtering rows, we have used the same tuple of three values that we had used in one of our earlier examples. For filtering columns, we have used a tuple of two values where the first value is a single string and the second value is a list of three strings.

df3.loc[(2021, slice(None), [1,2,3,4,5]), (["Location1"], ["Item1","Item2", "Item3"])]
Location Location1
Item Item1 Item2 Item3
Year Month Day
2021 1 1 24 531 61
2 154 127 519
3 249 485 401
4 528 119 260
5 269 875 539
2 1 480 18 354
2 639 506 23
3 78 2 378
4 160 27 15
5 740 408 582

In the below cell, we have tried to reproduce our example from the previous cell but this time instead of providing 5 integers for the last level of filtering rows, we have provided slice(1, 5) function. It'll have the same effect as our previous example.

df3.loc[(2021,slice(None), slice(1,5)), (["Location1"], ["Item1","Item2", "Item3"])]
Location Location1
Item Item1 Item2 Item3
Year Month Day
2021 1 1 24 531 61
2 154 127 519
3 249 485 401
4 528 119 260
5 269 875 539
2 1 480 18 354
2 639 506 23
3 78 2 378
4 160 27 15
5 740 408 582

In the below cell, we have filtered rows of our dataframe where we had datetime index as a part of MultiIndex object.

df4.loc[(slice(None), ["2021-1-1", "2021-1-2", "2021-1-3"]),]
Category Category1 Category2
Item Item1 Item2 Item3 Item4 Item5 Item1 Item2 Item3 Item4 Item5
Location Date
Location1 2021-01-01 800 769 521 306 455 748 933 744 755 877
2021-01-02 508 511 134 165 126 807 644 945 421 202
2021-01-03 906 113 867 520 535 113 244 839 373 191
Location2 2021-01-01 417 625 322 117 488 993 386 429 910 771
2021-01-02 218 728 381 524 691 585 4 645 53 581
2021-01-03 588 173 651 860 638 539 502 726 771 533

Range Indexing

In this section, we'll explain how we can specify a range of values using ':' operator for filtering rows/columns of our dataframe.

Below we have filtered our first dataframe by providing value '"A":"C"' for indexing. This informs the '.loc' property that keep rows of dataframe where first level index values ranges from 'A' to 'C' which is three values ['A','B','C'].

df1.loc["A":"C",]
A1 A2 A3 A4 A5
Index1 Index2
A 1 0.696469 0.286139 0.226851 0.551315 0.719469
B 2 0.423106 0.980764 0.684830 0.480932 0.392118
C 3 0.343178 0.729050 0.438572 0.059678 0.398044

Below we have created another example where we have provided tuples to filter rows of dataframe. This will keep all entries in the dataframe starting from the first tuple index values till the last tuple index values.

df1.loc[("A",1): ("C",3),]
A1 A2 A3 A4 A5
Index1 Index2
A 1 0.696469 0.286139 0.226851 0.551315 0.719469
B 2 0.423106 0.980764 0.684830 0.480932 0.392118
C 3 0.343178 0.729050 0.438572 0.059678 0.398044

In the below cell, we have created another example demonstrating the usage of range filtering where we have filtered rows of dataframe as well as columns of dataframe using range operator (':').

df2.loc[(2021,1,1):(2021,1,15), "C3":"C5"]
C3 C4 C5
Year Month Day
2021 1 1 0.228263 0.293714 0.630976
2 0.430863 0.493685 0.425830
3 0.893389 0.944160 0.501837
4 0.317285 0.414826 0.866309
5 0.985560 0.519485 0.612895
6 0.603060 0.545068 0.342764
7 0.681301 0.875457 0.510422
8 0.624904 0.674689 0.842342
9 0.243666 0.194223 0.572457
10 0.627249 0.723416 0.016129
11 0.158960 0.153071 0.695530
12 0.554383 0.388951 0.925132
13 0.043591 0.304768 0.398186
14 0.355915 0.762548 0.593177
15 0.398876 0.240856 0.343456

In the below cell, we have created another example demonstrating the usage of the range operator for filtering rows and columns of the dataframe. This time we have used tuples for columns as well.

df3.loc[(2021,1,1):(2021,1,15), ("Location1", "Item4"): ("Location2","Item1")]
Location Location1 Location2
Item Item4 Item5 Item1
Year Month Day
2021 1 1 112 444 636
2 451 312 72
3 584 175 940
4 341 50 546
5 456 937 810
6 805 360 744
7 726 9 666
8 95 890 789
9 692 448 654
10 755 597 703
11 606 560 359
12 535 193 661
13 582 942 7
14 138 721 872
15 184 288 160

In the below cell, we have used range indexing on our dataframe where values of one level are of datetime data type.

df4.loc["Location1", "2021-1-1":"2021-1-5", :]
Category Category1 Category2
Item Item1 Item2 Item3 Item4 Item5 Item1 Item2 Item3 Item4 Item5
Location Date
Location1 2021-01-01 800 769 521 306 455 748 933 744 755 877
2021-01-02 508 511 134 165 126 807 644 945 421 202
2021-01-03 906 113 867 520 535 113 244 839 373 191
2021-01-04 578 294 660 927 670 176 729 10 700 81
2021-01-05 905 875 826 928 88 400 110 268 419 245

In the below cell, we have created one more example demonstrating usage of range operator to filter rows of dataframe where one level values are of datetime data type. We have provided datetime values as strings.

df4.loc["Location1":"Location2", "2021-1-1":"2021-1-5", :]
Category Category1 Category2
Item Item1 Item2 Item3 Item4 Item5 Item1 Item2 Item3 Item4 Item5
Location Date
Location1 2021-01-01 800 769 521 306 455 748 933 744 755 877
2021-01-02 508 511 134 165 126 807 644 945 421 202
2021-01-03 906 113 867 520 535 113 244 839 373 191
2021-01-04 578 294 660 927 670 176 729 10 700 81
2021-01-05 905 875 826 928 88 400 110 268 419 245
Location2 2021-01-01 417 625 322 117 488 993 386 429 910 771
2021-01-02 218 728 381 524 691 585 4 645 53 581
2021-01-03 588 173 651 860 638 539 502 726 771 533
2021-01-04 651 28 925 593 458 975 625 128 556 413
2021-01-05 580 999 408 621 152 654 632 794 203 259

Using xs() for Indexing

In this section, we'll explain how we can use xs() function available from the pandas dataframe to filter rows/columns of the dataframe.


  • xs(key,axis=0,level=None) - This method takes as input labels of rows/columns along with axis value (0-rows,1-columns) and level names of MultiIndex object and returns dataframe with specified labels present in it.
    • The key parameter takes as input single label or tuple of labels from our multi-level indexes. The rows/columns which have these label values will be kept and all others will be dropped.
    • The axis method accepts either 0 (rows) or 1 (columns) as value. The value of 0 will consider label values provided to key parameter across index of dataframe and value of '1' will consider values provided to key parameter across columns of dataframe.
    • The level accepts a single value or tuple of values representing level names for multi-level index.

The benefit of using this method is that we can provide information about individual levels and all other level labels will be handled internally without we need to explicitly handle them.

Below we have retrieved rows from our dataframe where values for level 'Day' is 1. There are two rows in the dataframe where the value of level 'Day' is 1.

df2.xs(1, level="Day")
C1 C2 C3 C4 C5
Year Month
2021 1 0.322959 0.361789 0.228263 0.293714 0.630976
2 0.562218 0.122244 0.201400 0.811644 0.467988

We can also provide slice() function call to xs() method.

Below we have retrieved rows from our dataframe where values for level 'Day' are in the range [1-3]. There are six values that satisfies this condition because we have 'Day' values [1,2,3] present for month values 1 and 2 both.

df2.xs(slice(1,3), level="Day")
C1 C2 C3 C4 C5
Year Month
2021 1 0.322959 0.361789 0.228263 0.293714 0.630976
1 0.092105 0.433701 0.430863 0.493685 0.425830
1 0.312261 0.426351 0.893389 0.944160 0.501837
2 0.562218 0.122244 0.201400 0.811644 0.467988
2 0.807938 0.007426 0.551593 0.931932 0.582175
2 0.206096 0.717758 0.378986 0.668384 0.029320

In the below cell, we have explained how we can use xs() method with columns of our dataframe.

We have filtered columns of our dataframe where level named 'Location' has values 'Location1'. There are 5 columns that satisfy this condition.

df3.xs("Location1", level="Location", axis=1)
Item Item1 Item2 Item3 Item4 Item5
Year Month Day
2021 1 1 24 531 61 112 444
2 154 127 519 451 312
3 249 485 401 584 175
4 528 119 260 341 50
5 269 875 539 456 937
... ... ... ... ... ... ...
2 27 921 145 123 115 878
28 546 65 623 896 643
29 463 100 743 630 780
30 327 479 901 91 345
31 794 689 93 316 539

62 rows × 5 columns

In the below cell, we have filtered columns of our dataframe using xs() function where label values are provided by calling slice() function. We have filtered columns of the dataframe where labels of level 'Item' are one of ['Item1','Item2','Item3].

df3.xs(slice("Item1", "Item3"), level="Item", axis=1)
Location Location1 Location1 Location1 Location2 Location2 Location2
Year Month Day
2021 1 1 24 531 61 636 469 921
2 154 127 519 72 501 94
3 249 485 401 940 192 414
4 528 119 260 546 766 744
5 269 875 539 810 925 254
... ... ... ... ... ... ... ...
2 27 921 145 123 931 209 568
28 546 65 623 443 824 22
29 463 100 743 293 752 553
30 327 479 901 529 924 887
31 794 689 93 593 637 38

62 rows × 6 columns

In the below cell, we have created another example demonstrating usage of xs() function. This time we have explained how we can give labels for more than one level. We have provided values for three levels of our dataframe and only one row satisfies those labels.

df3.xs((2021,1,1), level=("Year", "Month","Day"))
Location Location1 Location2
Item Item1 Item2 Item3 Item4 Item5 Item1 Item2 Item3 Item4 Item5
Year Month Day
2021 1 1 24 531 61 112 444 636 469 921 532 117

In the below cell, we have created another example demonstrating how we can provide labels for multiple levels to xs() method.

df4.xs(("Location1","2021-1-1"), level=("Location", "Date"))
Category Category1 Category2
Item Item1 Item2 Item3 Item4 Item5 Item1 Item2 Item3 Item4 Item5
Location Date
Location1 2021-01-01 800 769 521 306 455 748 933 744 755 877

In the below cell, we have again created an example demonstrating how to give labels for multiple levels to xs() method but this time we have used slice() call as one of the level's labels.

df3.xs((2021,slice(None),1), level=("Year", "Month","Day"))
Location Location1 Location2
Item Item1 Item2 Item3 Item4 Item5 Item1 Item2 Item3 Item4 Item5
Year Month Day
2021 1 1 24 531 61 112 444 636 469 921 532 117
2 1 480 18 354 30 620 23 48 439 212 494

In the next cell, we have created another example demonstrating usage of xs() method. This time we have used two slice() function call. One of the calls refers to all labels at that level and another call refers to the range of values.

df3.xs((2021,slice(None),slice(1,5)), level=("Year", "Month","Day"))
Location Location1 Location2
Item Item1 Item2 Item3 Item4 Item5 Item1 Item2 Item3 Item4 Item5
Year Month Day
2021 1 1 24 531 61 112 444 636 469 921 532 117
2 154 127 519 451 312 72 501 94 523 456
3 249 485 401 584 175 940 192 414 555 235
4 528 119 260 341 50 546 766 744 291 64
5 269 875 539 456 937 810 925 254 656 235
2 1 480 18 354 30 620 23 48 439 212 494
2 639 506 23 552 576 72 398 380 471 878
3 78 2 378 4 728 243 868 265 528 864
4 160 27 15 324 228 793 15 764 12 71
5 740 408 582 112 169 572 493 287 405 60

In our last example, we have explained how we can use xs() function when one of the level values is of datetime data type.

df4.xs(("Location1",slice("2021-1-1", "2021-1-5")), level=("Location", "Date"))
Category Category1 Category2
Item Item1 Item2 Item3 Item4 Item5 Item1 Item2 Item3 Item4 Item5
Location Date
Location1 2021-01-01 800 769 521 306 455 748 933 744 755 877
2021-01-02 508 511 134 165 126 807 644 945 421 202
2021-01-03 906 113 867 520 535 113 244 839 373 191
2021-01-04 578 294 660 927 670 176 729 10 700 81
2021-01-05 905 875 826 928 88 400 110 268 419 245

This ends our small tutorial explaining how we can create, index, and work with multi-level /hierarchical indexing (MultiIndex) with pandas dataframe. Please feel free to let us know your views in the comments section.

References

Sunny Solanki  Sunny Solanki

Share Views Stuck Somewhere? Need Help with Coding? Have Doubts About the Topic/Code?

When going through coding examples, it's quite common to have doubts and errors.

If you have doubts about some code examples or are stuck somewhere when trying our code, send us an email at coderzcolumn07@gmail.com. We'll help you or point you in the direction where you can find a solution to your problem.

You can even send us a mail if you are trying something new and need guidance regarding coding. We'll try to respond as soon as possible.

Share Views Want to Share Your Views? Have Any Suggestions?

If you want to

  • provide some suggestions on topic
  • share your views
  • include some details in tutorial
  • suggest some new topics on which we should create tutorials/blogs
Please feel free to contact us at coderzcolumn07@gmail.com. We appreciate and value your feedbacks. You can also support us with a small contribution by clicking DONATE.