Updated On : Dec-14,2021 Time Investment : ~20 mins

Guide to Create Pivot Tables from Pandas DataFrame

Pivot table is a table where we have grouped values of our extensive table based on values of some columns (generally categorical columns) of data. We can then perform some aggregate operations on these aggregate values as well. Pivot tables rearrange data and perform statistics on them which can help us find meaningful insights which won't be otherwise visible hence it has become an integral part of data analysis. Pandas is the most commonly used library to load and work with tabular data in Python. It also provides two methods to create pivot tables from pandas dataframe.

  1. pivot() - This method simply rearranges the dataframe based on the given input index, columns, and values.
  2. pivot_table() - This method rearranges dataframe and can also let us perform statistics on values.

As a part of this tutorial, we'll explain both methods with simple examples to make their usage clear.

Below, we have imported the necessary libraries that we'll use for our tutorial. We have printed the version of pandas as well that we'll be using in our tutorial.

Then, in the next cell, we have created a pandas dataframe which we'll use to explain the usage of pivot() and pivot_table() methods. Dataframe is created using random numbers from numpy. We have also kept two categorical columns in the dataframe to perform various aggregate stats on them.

import pandas as pd

print("Pandas Version : {}".format(pd.__version__))
Pandas Version : 1.3.4
import numpy as np
np.random.seed(1234)

data = np.random.rand(15,5)

df = pd.DataFrame(data=data, columns=list("ABCDE"))

df["Type"] = np.random.choice(["Class1","Class2","Class3"], size=len(df))
df["Category"] = np.random.choice(["Category1","Category2"], size=len(df))
df["Item"] = ["Item%d"%(i+1) for i in range(df.shape[0])]

df
A B C D E Type Category Item
0 0.191519 0.622109 0.437728 0.785359 0.779976 Class3 Category1 Item1
1 0.272593 0.276464 0.801872 0.958139 0.875933 Class2 Category1 Item2
2 0.357817 0.500995 0.683463 0.712702 0.370251 Class2 Category1 Item3
3 0.561196 0.503083 0.013768 0.772827 0.882641 Class3 Category1 Item4
4 0.364886 0.615396 0.075381 0.368824 0.933140 Class3 Category2 Item5
5 0.651378 0.397203 0.788730 0.316836 0.568099 Class1 Category1 Item6
6 0.869127 0.436173 0.802148 0.143767 0.704261 Class2 Category1 Item7
7 0.704581 0.218792 0.924868 0.442141 0.909316 Class2 Category2 Item8
8 0.059809 0.184287 0.047355 0.674881 0.594625 Class2 Category2 Item9
9 0.533310 0.043324 0.561433 0.329668 0.502967 Class3 Category1 Item10
10 0.111894 0.607194 0.565945 0.006764 0.617442 Class1 Category1 Item11
11 0.912123 0.790524 0.992081 0.958802 0.791964 Class1 Category1 Item12
12 0.285251 0.624917 0.478094 0.195675 0.382317 Class1 Category2 Item13
13 0.053874 0.451648 0.982005 0.123943 0.119381 Class1 Category1 Item14
14 0.738523 0.587304 0.471633 0.107127 0.229219 Class3 Category1 Item15

1. pivot()

In this section, we'll explain various examples explaining how to use pivot() method to pivot values of the dataframe.


  • pivot(data,index,columns,values) - This method takes dataframe and columns names as input to create pivot table from it. The index and values are optional arguments.
    • The columns parameter accepts one or more column names as input which will be columns of our pivot table. These are generally categorical columns of data.
    • The index parameter accepts one or more columns names as input that will be a new index of our pivot table. These columns can be categorical or not. If we don't provide index parameter then a simple number-based index will be included in the resulting pivot table.
    • The values parameter accepts one or more column names as input whose values will be included in the pivot table. If we don't provide values parameter then all columns except provided to columns and values parameter will be included.

Below we have pivoted our dataframe by asking it to use a column named 'Category' for columns parameter of pivot() method. The 'Category' column has two unique values ('Category1' and 'Category2'). This will create column names based on combination of all remaining columns (['A','B','C','D','E','Type']) and unique values of 'Category' column. The index of the resulting dataframe will be simple integer range-based indexing. The values of the dataframe will be based on the column name and unique value of 'Category' column. If there is a value for a particular column and a unique value from 'Category' column then the resulting dataframe will have value else there will be NaN.

pd.pivot(df, columns="Category")
A B C D E Type Item
Category Category1 Category2 Category1 Category2 Category1 Category2 Category1 Category2 Category1 Category2 Category1 Category2 Category1 Category2
0 0.191519 NaN 0.622109 NaN 0.437728 NaN 0.785359 NaN 0.779976 NaN Class3 NaN Item1 NaN
1 0.272593 NaN 0.276464 NaN 0.801872 NaN 0.958139 NaN 0.875933 NaN Class2 NaN Item2 NaN
2 0.357817 NaN 0.500995 NaN 0.683463 NaN 0.712702 NaN 0.370251 NaN Class2 NaN Item3 NaN
3 0.561196 NaN 0.503083 NaN 0.013768 NaN 0.772827 NaN 0.882641 NaN Class3 NaN Item4 NaN
4 NaN 0.364886 NaN 0.615396 NaN 0.075381 NaN 0.368824 NaN 0.933140 NaN Class3 NaN Item5
5 0.651378 NaN 0.397203 NaN 0.788730 NaN 0.316836 NaN 0.568099 NaN Class1 NaN Item6 NaN
6 0.869127 NaN 0.436173 NaN 0.802148 NaN 0.143767 NaN 0.704261 NaN Class2 NaN Item7 NaN
7 NaN 0.704581 NaN 0.218792 NaN 0.924868 NaN 0.442141 NaN 0.909316 NaN Class2 NaN Item8
8 NaN 0.059809 NaN 0.184287 NaN 0.047355 NaN 0.674881 NaN 0.594625 NaN Class2 NaN Item9
9 0.533310 NaN 0.043324 NaN 0.561433 NaN 0.329668 NaN 0.502967 NaN Class3 NaN Item10 NaN
10 0.111894 NaN 0.607194 NaN 0.565945 NaN 0.006764 NaN 0.617442 NaN Class1 NaN Item11 NaN
11 0.912123 NaN 0.790524 NaN 0.992081 NaN 0.958802 NaN 0.791964 NaN Class1 NaN Item12 NaN
12 NaN 0.285251 NaN 0.624917 NaN 0.478094 NaN 0.195675 NaN 0.382317 NaN Class1 NaN Item13
13 0.053874 NaN 0.451648 NaN 0.982005 NaN 0.123943 NaN 0.119381 NaN Class1 NaN Item14 NaN
14 0.738523 NaN 0.587304 NaN 0.471633 NaN 0.107127 NaN 0.229219 NaN Class3 NaN Item15 NaN

Below we have created another example demonstrating usage of pivot() method where we have again used column 'Category' for columns parameter. This time we have informed the method to use column named 'Item' for index parameter. We can notice that the resulting dataframe has the same column names as the previous dataframe except that 'Item' column has moved to the index of the dataframe.

pd.pivot(df, index="Item", columns="Category")
A B C D E Type
Category Category1 Category2 Category1 Category2 Category1 Category2 Category1 Category2 Category1 Category2 Category1 Category2
Item
Item1 0.191519 NaN 0.622109 NaN 0.437728 NaN 0.785359 NaN 0.779976 NaN Class3 NaN
Item10 0.533310 NaN 0.043324 NaN 0.561433 NaN 0.329668 NaN 0.502967 NaN Class3 NaN
Item11 0.111894 NaN 0.607194 NaN 0.565945 NaN 0.006764 NaN 0.617442 NaN Class1 NaN
Item12 0.912123 NaN 0.790524 NaN 0.992081 NaN 0.958802 NaN 0.791964 NaN Class1 NaN
Item13 NaN 0.285251 NaN 0.624917 NaN 0.478094 NaN 0.195675 NaN 0.382317 NaN Class1
Item14 0.053874 NaN 0.451648 NaN 0.982005 NaN 0.123943 NaN 0.119381 NaN Class1 NaN
Item15 0.738523 NaN 0.587304 NaN 0.471633 NaN 0.107127 NaN 0.229219 NaN Class3 NaN
Item2 0.272593 NaN 0.276464 NaN 0.801872 NaN 0.958139 NaN 0.875933 NaN Class2 NaN
Item3 0.357817 NaN 0.500995 NaN 0.683463 NaN 0.712702 NaN 0.370251 NaN Class2 NaN
Item4 0.561196 NaN 0.503083 NaN 0.013768 NaN 0.772827 NaN 0.882641 NaN Class3 NaN
Item5 NaN 0.364886 NaN 0.615396 NaN 0.075381 NaN 0.368824 NaN 0.933140 NaN Class3
Item6 0.651378 NaN 0.397203 NaN 0.788730 NaN 0.316836 NaN 0.568099 NaN Class1 NaN
Item7 0.869127 NaN 0.436173 NaN 0.802148 NaN 0.143767 NaN 0.704261 NaN Class2 NaN
Item8 NaN 0.704581 NaN 0.218792 NaN 0.924868 NaN 0.442141 NaN 0.909316 NaN Class2
Item9 NaN 0.059809 NaN 0.184287 NaN 0.047355 NaN 0.674881 NaN 0.594625 NaN Class2

In the below example, we have provided all three parameters index, columns and values to pivot() method. The values of parameters index and columns are the same as the previous cell. The values parameter is asked to keep only two columns (['A','B']). The resulting dataframe has the same values as the previous dataframe but only columns 'A' and 'B' are kept.

pd.pivot(df, index="Item", columns="Category", values=["A", "B"])
A B
Category Category1 Category2 Category1 Category2
Item
Item1 0.191519 NaN 0.622109 NaN
Item10 0.533310 NaN 0.043324 NaN
Item11 0.111894 NaN 0.607194 NaN
Item12 0.912123 NaN 0.790524 NaN
Item13 NaN 0.285251 NaN 0.624917
Item14 0.053874 NaN 0.451648 NaN
Item15 0.738523 NaN 0.587304 NaN
Item2 0.272593 NaN 0.276464 NaN
Item3 0.357817 NaN 0.500995 NaN
Item4 0.561196 NaN 0.503083 NaN
Item5 NaN 0.364886 NaN 0.615396
Item6 0.651378 NaN 0.397203 NaN
Item7 0.869127 NaN 0.436173 NaN
Item8 NaN 0.704581 NaN 0.218792
Item9 NaN 0.059809 NaN 0.184287

In the below cell, we have created pivot table by providing columns and values parameter to pivot() method. The result is exactly the same as our previous cell with the only difference that the index in this example is a range of integers.

pd.pivot(df, columns="Category", values=["A", "B"])
A B
Category Category1 Category2 Category1 Category2
0 0.191519 NaN 0.622109 NaN
1 0.272593 NaN 0.276464 NaN
2 0.357817 NaN 0.500995 NaN
3 0.561196 NaN 0.503083 NaN
4 NaN 0.364886 NaN 0.615396
5 0.651378 NaN 0.397203 NaN
6 0.869127 NaN 0.436173 NaN
7 NaN 0.704581 NaN 0.218792
8 NaN 0.059809 NaN 0.184287
9 0.533310 NaN 0.043324 NaN
10 0.111894 NaN 0.607194 NaN
11 0.912123 NaN 0.790524 NaN
12 NaN 0.285251 NaN 0.624917
13 0.053874 NaN 0.451648 NaN
14 0.738523 NaN 0.587304 NaN

In the below cell, we have created an example where we have provided more than one column name to index parameter. We have asked to use columns 'Item' and 'Type' as indexes of the resulting dataframe. The resulting dataframe has a combination of values from columns 'Item' and 'Type'.

pd.pivot(df, index=["Item", "Type"], columns="Category", values=["A", "B"])
A B
Category Category1 Category2 Category1 Category2
Item Type
Item1 Class3 0.191519 NaN 0.622109 NaN
Item10 Class3 0.533310 NaN 0.043324 NaN
Item11 Class1 0.111894 NaN 0.607194 NaN
Item12 Class1 0.912123 NaN 0.790524 NaN
Item13 Class1 NaN 0.285251 NaN 0.624917
Item14 Class1 0.053874 NaN 0.451648 NaN
Item15 Class3 0.738523 NaN 0.587304 NaN
Item2 Class2 0.272593 NaN 0.276464 NaN
Item3 Class2 0.357817 NaN 0.500995 NaN
Item4 Class3 0.561196 NaN 0.503083 NaN
Item5 Class3 NaN 0.364886 NaN 0.615396
Item6 Class1 0.651378 NaN 0.397203 NaN
Item7 Class2 0.869127 NaN 0.436173 NaN
Item8 Class2 NaN 0.704581 NaN 0.218792
Item9 Class2 NaN 0.059809 NaN 0.184287

In the below cell, we have created an example where we have provided more than one column name to columns parameter of pivot() method. The resulting dataframe will have values for columns 'A' and 'B' for each combination of unique values from columns 'Category' and 'Type'.

pvt_tbl = pd.pivot(df, index="Item", columns=["Category","Type"], values=["A", "B"])

pvt_tbl
A B
Category Category1 Category2 Category1 Category2 Category1 Category2 Category1 Category2
Type Class3 Class2 Class3 Class1 Class2 Class1 Class3 Class2 Class3 Class1 Class2 Class1
Item
Item1 0.191519 NaN NaN NaN NaN NaN 0.622109 NaN NaN NaN NaN NaN
Item10 0.533310 NaN NaN NaN NaN NaN 0.043324 NaN NaN NaN NaN NaN
Item11 NaN NaN NaN 0.111894 NaN NaN NaN NaN NaN 0.607194 NaN NaN
Item12 NaN NaN NaN 0.912123 NaN NaN NaN NaN NaN 0.790524 NaN NaN
Item13 NaN NaN NaN NaN NaN 0.285251 NaN NaN NaN NaN NaN 0.624917
Item14 NaN NaN NaN 0.053874 NaN NaN NaN NaN NaN 0.451648 NaN NaN
Item15 0.738523 NaN NaN NaN NaN NaN 0.587304 NaN NaN NaN NaN NaN
Item2 NaN 0.272593 NaN NaN NaN NaN NaN 0.276464 NaN NaN NaN NaN
Item3 NaN 0.357817 NaN NaN NaN NaN NaN 0.500995 NaN NaN NaN NaN
Item4 0.561196 NaN NaN NaN NaN NaN 0.503083 NaN NaN NaN NaN NaN
Item5 NaN NaN 0.364886 NaN NaN NaN NaN NaN 0.615396 NaN NaN NaN
Item6 NaN NaN NaN 0.651378 NaN NaN NaN NaN NaN 0.397203 NaN NaN
Item7 NaN 0.869127 NaN NaN NaN NaN NaN 0.436173 NaN NaN NaN NaN
Item8 NaN NaN NaN NaN 0.704581 NaN NaN NaN NaN NaN 0.218792 NaN
Item9 NaN NaN NaN NaN 0.059809 NaN NaN NaN NaN NaN 0.184287 NaN
pvt_tbl.sort_values(by=["Category","Type"], axis=1)
A B A B A B A B A B A B
Category Category1 Category1 Category1 Category1 Category1 Category1 Category2 Category2 Category2 Category2 Category2 Category2
Type Class1 Class1 Class2 Class2 Class3 Class3 Class1 Class1 Class2 Class2 Class3 Class3
Item
Item1 NaN NaN NaN NaN 0.191519 0.622109 NaN NaN NaN NaN NaN NaN
Item10 NaN NaN NaN NaN 0.533310 0.043324 NaN NaN NaN NaN NaN NaN
Item11 0.111894 0.607194 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Item12 0.912123 0.790524 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Item13 NaN NaN NaN NaN NaN NaN 0.285251 0.624917 NaN NaN NaN NaN
Item14 0.053874 0.451648 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Item15 NaN NaN NaN NaN 0.738523 0.587304 NaN NaN NaN NaN NaN NaN
Item2 NaN NaN 0.272593 0.276464 NaN NaN NaN NaN NaN NaN NaN NaN
Item3 NaN NaN 0.357817 0.500995 NaN NaN NaN NaN NaN NaN NaN NaN
Item4 NaN NaN NaN NaN 0.561196 0.503083 NaN NaN NaN NaN NaN NaN
Item5 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.364886 0.615396
Item6 0.651378 0.397203 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Item7 NaN NaN 0.869127 0.436173 NaN NaN NaN NaN NaN NaN NaN NaN
Item8 NaN NaN NaN NaN NaN NaN NaN NaN 0.704581 0.218792 NaN NaN
Item9 NaN NaN NaN NaN NaN NaN NaN NaN 0.059809 0.184287 NaN NaN

2. pivot_table()

In this section, we'll explain how we can use pivot_table() method with simple examples.


  • pivot_table(data,values,index,columns,aggfunc='mean',fill_value=None,dropna=True,margins=False,margins_name='All',sort=True) - This method takes data frame and index column as input and returns pivoted table. It needs column name at which values will be grouped and index will be created. All other parameters are optional. By default, it applies mean aggregate function on grouped values.
    • The values parameter accepts column names whose values will be grouped and kept in the pivot table after performing aggregate statistics. If we don't provide values then all columns except provided to index and columns parameter will be used.
    • The index parameter accepts columns names at which values of the main dataframe will be grouped and later one some statistics will be performed on these groups of values. We generally provide categorical columns to this parameter.
    • The columns dataframe includes columns whose value will become columns of our pivot table. The values provided to values parameter will be grouped according to columns provided to index and columns parameter. This is an optional parameter.
    • The aggfunc accepts single callable, list of callable or dictionary as input. We can provide a numpy stat function or our designed stats function. It should work on a list of values. The default is 'mean'.
      • If we provide a single callable then it'll be applied to all selected columns of data.
      • If we provide a list of callable then it'll apply all of them one by one to selected columns and the resulting pivot table will have column names according to aggregate function and selected columns.
      • The dictionary has a key with column name and value as single or list of callable. It applies aggregate function specified in value to the column specified in the key of the dictionary
    • The fill_value parameter accepts value to fill for NaN entries.
    • The dropna parameter drops rows where all entries are NaN.
    • The margins parameter if set to True will include the addition of all rows and columns at the end of the dataframe.
    • The margins_name parameter accepts string specifying the name of totals (column name and index label) that will be created when margins is set to True.

Below we have created a pivot table using pivot_table() method asking it to use a column named 'Item' as an index of the resulting dataframe. It'll treat all other columns of the dataframe as columns of the resulting dataframe. The Pviot_table() method only keeps columns that has float/integer values in resulting dataframe.

pd.pivot_table(df, index="Item")
A B C D E
Item
Item1 0.191519 0.622109 0.437728 0.785359 0.779976
Item10 0.533310 0.043324 0.561433 0.329668 0.502967
Item11 0.111894 0.607194 0.565945 0.006764 0.617442
Item12 0.912123 0.790524 0.992081 0.958802 0.791964
Item13 0.285251 0.624917 0.478094 0.195675 0.382317
Item14 0.053874 0.451648 0.982005 0.123943 0.119381
Item15 0.738523 0.587304 0.471633 0.107127 0.229219
Item2 0.272593 0.276464 0.801872 0.958139 0.875933
Item3 0.357817 0.500995 0.683463 0.712702 0.370251
Item4 0.561196 0.503083 0.013768 0.772827 0.882641
Item5 0.364886 0.615396 0.075381 0.368824 0.933140
Item6 0.651378 0.397203 0.788730 0.316836 0.568099
Item7 0.869127 0.436173 0.802148 0.143767 0.704261
Item8 0.704581 0.218792 0.924868 0.442141 0.909316
Item9 0.059809 0.184287 0.047355 0.674881 0.594625

Below we have called pviot_table() method with columns parameter set to column 'Type'. The resulting dataframe will have unique values of column 'Type' as column names and all other columns of the dataframe will be index. The mean aggregate function will be applied to values that are from a particular column and the unique value of 'Type' column. So if we take columns 'A' from the dataframe, then the values of column 'A' that have 'Type' column value 'Class1' will be grouped and the mean function applied to it. Same logic will follow for values that has 'Class2' and 'Class3' as values of column 'Type'.

pd.pivot_table(df, columns="Type")
Type Class1 Class2 Class3
A 0.402904 0.452786 0.477887
B 0.574297 0.323342 0.474243
C 0.761371 0.651941 0.311989
D 0.320404 0.586326 0.472761
E 0.495841 0.690877 0.665588

In the below cell, we have created an example of pivot_table() where we have provided values for all three main parameters index, columns, and values. We have asked to use values of column 'Category' as an index of dataframe, values of column 'Type' as columns of resulting dataframe, and values from column 'A' will be values of the resulting dataframe after the aggregate function is applied to it.

pd.pivot_table(df,
               index="Category",
               columns="Type",
               values=["A"],
               aggfunc='mean')
A
Type Class1 Class2 Class3
Category
Category1 0.432317 0.499846 0.506137
Category2 0.285251 0.382195 0.364886

In the below cell, we have provided two columns ('Category' and 'Type') to be used as index of the resulting dataframe. The values of column 'Item' will be columns of the resulting dataframe and values of column 'A' will be values of our resulting dataframe after applying an aggregate function. The index of the resulting dataframe will be created by taking all combinations of different values from columns 'Category' and 'Type'.

pd.pivot_table(df,
               index=["Category", "Type"],
               columns="Item",
               values=["A"],
               aggfunc='mean')
A
Item Item1 Item10 Item11 Item12 Item13 Item14 Item15 Item2 Item3 Item4 Item5 Item6 Item7 Item8 Item9
Category Type
Category1 Class1 NaN NaN 0.111894 0.912123 NaN 0.053874 NaN NaN NaN NaN NaN 0.651378 NaN NaN NaN
Class2 NaN NaN NaN NaN NaN NaN NaN 0.272593 0.357817 NaN NaN NaN 0.869127 NaN NaN
Class3 0.191519 0.53331 NaN NaN NaN NaN 0.738523 NaN NaN 0.561196 NaN NaN NaN NaN NaN
Category2 Class1 NaN NaN NaN NaN 0.285251 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Class2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.704581 0.059809
Class3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.364886 NaN NaN NaN NaN

In the below cell, we have created an example where we have given two columns ('Category' and 'Type') to be used as columns of the resulting dataframe to pivot_table() method. The 'Item' column values will become an index of our dataframe and values of column 'A' after applying aggregate function will become values of the resulting dataframe. The aggregate function will be applied to values of column 'A' for each combination of values from column 'Item', 'Category', and 'Type'.

pd.pivot_table(df,
               index="Item",
               columns=["Category", "Type"],
               values=["A"],
               aggfunc='mean')
A
Category Category1 Category2
Type Class1 Class2 Class3 Class1 Class2 Class3
Item
Item1 NaN NaN 0.191519 NaN NaN NaN
Item10 NaN NaN 0.533310 NaN NaN NaN
Item11 0.111894 NaN NaN NaN NaN NaN
Item12 0.912123 NaN NaN NaN NaN NaN
Item13 NaN NaN NaN 0.285251 NaN NaN
Item14 0.053874 NaN NaN NaN NaN NaN
Item15 NaN NaN 0.738523 NaN NaN NaN
Item2 NaN 0.272593 NaN NaN NaN NaN
Item3 NaN 0.357817 NaN NaN NaN NaN
Item4 NaN NaN 0.561196 NaN NaN NaN
Item5 NaN NaN NaN NaN NaN 0.364886
Item6 0.651378 NaN NaN NaN NaN NaN
Item7 NaN 0.869127 NaN NaN NaN NaN
Item8 NaN NaN NaN NaN 0.704581 NaN
Item9 NaN NaN NaN NaN 0.059809 NaN

In the below cell, we have created an example where we have given more than one aggregate function to pivot_table(). We can notice that the resulting dataframe has an entry for each function.

pd.pivot_table(df,
               index="Category",
               columns="Type",
               values=["A"],
               aggfunc=[np.mean, np.std, np.var])
mean std var
A A A
Type Class1 Class2 Class3 Class1 Class2 Class3 Class1 Class2 Class3
Category
Category1 0.432317 0.499846 0.506137 0.417968 0.322634 0.228588 0.174697 0.104093 0.052252
Category2 0.285251 0.382195 0.364886 NaN 0.455923 NaN NaN 0.207866 NaN

In the below cell, we have created an example where we have given dictionary to aggfunc parameter of pivot_table() function. We have asked it to use 'mean' function on column 'A' values and 'std' function on column 'B' values.

pd.pivot_table(df,
               index="Category",
               columns="Type",
               values=["A","B"],
               aggfunc={"A": np.mean, "B": np.std})
A B
Type Class1 Class2 Class3 Class1 Class2 Class3
Category
Category1 0.432317 0.499846 0.506137 0.176636 0.115559 0.268445
Category2 0.285251 0.382195 0.364886 NaN 0.024399 NaN

In the below cell, we have created another example demonstrating how we can provide a dictionary of aggregate functions to aggfunc parameter of pivot_table(). This time we have provided two functions to be used on values of column 'B'.

pd.pivot_table(df, index="Category",
               columns="Type",
               values=["A","B"],
               aggfunc={"A": np.mean, "B": [np.mean, np.std]})
A B
mean mean std
Type Class1 Class2 Class3 Class1 Class2 Class3 Class1 Class2 Class3
Category
Category1 0.432317 0.499846 0.506137 0.561642 0.404544 0.438955 0.176636 0.115559 0.268445
Category2 0.285251 0.382195 0.364886 0.624917 0.201540 0.615396 NaN 0.024399 NaN

In the below cell, we have recreated an example from the previous cell and provided value for parameter fill_value so that all NaNs will be filled with this value.

pd.pivot_table(df, index="Category",
               columns="Type",
               values=["A","B"],
               aggfunc={"A": np.mean, "B": [np.mean, np.std]},
               fill_value=0.000001
              )
A B
mean mean std
Type Class1 Class2 Class3 Class1 Class2 Class3 Class1 Class2 Class3
Category
Category1 0.432317 0.499846 0.506137 0.561642 0.404544 0.438955 0.176636 0.115559 0.268445
Category2 0.285251 0.382195 0.364886 0.624917 0.201540 0.615396 0.000001 0.024399 0.000001

In the below cell, we have recreated an example from previous cells with the only difference that we have provided dropna parameter this time. We have asked the resulting dataframe to keep entries where all values are NaNs.

pd.pivot_table(df, index="Type",
               columns="Category",
               values=["A","B"],
               aggfunc={"A": np.mean, "B": [np.mean, np.std]},
               dropna=False
              )
A B
mean std mean std
Category Category1 Category2 Category1 Category2 Category1 Category2 Category1 Category2
Type
Class1 0.432317 0.285251 NaN NaN 0.561642 0.624917 0.176636 NaN
Class2 0.499846 0.382195 NaN NaN 0.404544 0.201540 0.115559 0.024399
Class3 0.506137 0.364886 NaN NaN 0.438955 0.615396 0.268445 NaN

In the below cell, we have created another example demonstrating the usage of pivot_table(). This time we have included the usage of parameters margins and margins_name.

pd.pivot_table(df,
               index="Category",
               columns="Type",
               values=["A"],
               aggfunc='mean',
               margins=True,
               margins_name="Column Types Average"
              )
A
Type Class1 Class2 Class3 Column Types Average
Category
Category1 0.432317 0.499846 0.506137 0.477578
Category2 0.285251 0.382195 0.364886 0.353632
Column Types Average 0.402904 0.452786 0.477887 0.444526
print("Category1 Avg - {}\nCategory2 Avg - {}".format(df[df.Category=="Category1"]['A'].mean(),
                                                      df[df.Category=="Category2"]['A'].mean()))

print("\nClass1 Avg - {}\nClass2 Avg - {}\nClass3 Avg - {}".format(df[df.Type=="Class1"]['A'].mean(),
                                                                   df[df.Type=="Class2"]['A'].mean(),
                                                                   df[df.Type=="Class3"]['A'].mean()))
Category1 Avg - 0.4775777411607928
Category2 Avg - 0.35363186872382657

Class1 Avg - 0.40290399848097636
Class2 Avg - 0.4527855591542317
Class3 Avg - 0.4778869678975973

In the below cell, we have created the last example demonstrating function pivot_table(). We have asked the function to keep entries unsorted by providing sort parameter as False.

pd.pivot_table(df,
               index="Category",
               columns="Type",
               values=["A"],
               aggfunc='mean',
               margins=True,
               margins_name="Column Types Average",
               sort=False
              )
A
Type Class1 Class2 Class3 Column Types Average
Category
Category1 0.432317 0.499846 0.506137 0.477578
Category2 0.285251 0.382195 0.364886 0.353632
Column Types Average 0.402904 0.452786 0.477887 0.444526

This ends our small tutorial explaining how we can use pandas functions pivot() and pivot_table() to create pivot tables from pandas dataframe. Please feel free to let us know your views in the comments section.

References

Sunny Solanki  Sunny Solanki

Share Views Stuck Somewhere? Need Help with Coding? Have Doubts About the Topic/Code?

When going through coding examples, it's quite common to have doubts and errors.

If you have doubts about some code examples or are stuck somewhere when trying our code, send us an email at coderzcolumn07@gmail.com. We'll help you or point you in the direction where you can find a solution to your problem.

You can even send us a mail if you are trying something new and need guidance regarding coding. We'll try to respond as soon as possible.

Share Views Want to Share Your Views? Have Any Suggestions?

If you want to

  • provide some suggestions on topic
  • share your views
  • include some details in tutorial
  • suggest some new topics on which we should create tutorials/blogs
Please feel free to contact us at coderzcolumn07@gmail.com. We appreciate and value your feedbacks. You can also support us with a small contribution by clicking DONATE.