Updated On : Dec-14,2021 Time Investment : ~20 mins

Guide to Create Pivot Tables from Pandas DataFrame¶

Pivot table is a table where we have grouped values of our extensive table based on values of some columns (generally categorical columns) of data. We can then perform some aggregate operations on these aggregate values as well. Pivot tables rearrange data and perform statistics on them which can help us find meaningful insights which won't be otherwise visible hence it has become an integral part of data analysis. Pandas is the most commonly used library to load and work with tabular data in Python. It also provides two methods to create pivot tables from pandas dataframe.

pivot() - This method simply rearranges the dataframe based on the given input index, columns, and values.
pivot_table() - This method rearranges dataframe and can also let us perform statistics on values.

As a part of this tutorial, we'll explain both methods with simple examples to make their usage clear.

Below, we have imported the necessary libraries that we'll use for our tutorial. We have printed the version of pandas as well that we'll be using in our tutorial.

Then, in the next cell, we have created a pandas dataframe which we'll use to explain the usage of pivot() and pivot_table() methods. Dataframe is created using random numbers from numpy. We have also kept two categorical columns in the dataframe to perform various aggregate stats on them.

import pandas as pd

print("Pandas Version : {}".format(pd.__version__))

Pandas Version : 1.3.4

import numpy as np

np.random.seed(1234)

data = np.random.rand(15,5)

df = pd.DataFrame(data=data, columns=list("ABCDE"))

df["Type"] = np.random.choice(["Class1","Class2","Class3"], size=len(df))
df["Category"] = np.random.choice(["Category1","Category2"], size=len(df))
df["Item"] = ["Item%d"%(i+1) for i in range(df.shape[0])]

df

	A	B	C	D	E	Type	Category	Item
0	0.191519	0.622109	0.437728	0.785359	0.779976	Class3	Category1	Item1
1	0.272593	0.276464	0.801872	0.958139	0.875933	Class2	Category1	Item2
2	0.357817	0.500995	0.683463	0.712702	0.370251	Class2	Category1	Item3
3	0.561196	0.503083	0.013768	0.772827	0.882641	Class3	Category1	Item4
4	0.364886	0.615396	0.075381	0.368824	0.933140	Class3	Category2	Item5
5	0.651378	0.397203	0.788730	0.316836	0.568099	Class1	Category1	Item6
6	0.869127	0.436173	0.802148	0.143767	0.704261	Class2	Category1	Item7
7	0.704581	0.218792	0.924868	0.442141	0.909316	Class2	Category2	Item8
8	0.059809	0.184287	0.047355	0.674881	0.594625	Class2	Category2	Item9
9	0.533310	0.043324	0.561433	0.329668	0.502967	Class3	Category1	Item10
10	0.111894	0.607194	0.565945	0.006764	0.617442	Class1	Category1	Item11
11	0.912123	0.790524	0.992081	0.958802	0.791964	Class1	Category1	Item12
12	0.285251	0.624917	0.478094	0.195675	0.382317	Class1	Category2	Item13
13	0.053874	0.451648	0.982005	0.123943	0.119381	Class1	Category1	Item14
14	0.738523	0.587304	0.471633	0.107127	0.229219	Class3	Category1	Item15

1. pivot()¶

In this section, we'll explain various examples explaining how to use pivot() method to pivot values of the dataframe.

pivot(data,index,columns,values) - This method takes dataframe and columns names as input to create pivot table from it. The index and values are optional arguments.
- The columns parameter accepts one or more column names as input which will be columns of our pivot table. These are generally categorical columns of data.
- The index parameter accepts one or more columns names as input that will be a new index of our pivot table. These columns can be categorical or not. If we don't provide index parameter then a simple number-based index will be included in the resulting pivot table.
- The values parameter accepts one or more column names as input whose values will be included in the pivot table. If we don't provide values parameter then all columns except provided to columns and values parameter will be included.

Below we have pivoted our dataframe by asking it to use a column named 'Category' for columns parameter of pivot() method. The 'Category' column has two unique values ('Category1' and 'Category2'). This will create column names based on combination of all remaining columns (['A','B','C','D','E','Type']) and unique values of 'Category' column. The index of the resulting dataframe will be simple integer range-based indexing. The values of the dataframe will be based on the column name and unique value of 'Category' column. If there is a value for a particular column and a unique value from 'Category' column then the resulting dataframe will have value else there will be NaN.

pd.pivot(df, columns="Category")

	A		B		C		D		E		Type		Item
Category	Category1	Category2	Category1	Category2	Category1	Category2	Category1	Category2	Category1	Category2	Category1	Category2	Category1	Category2
0	0.191519	NaN	0.622109	NaN	0.437728	NaN	0.785359	NaN	0.779976	NaN	Class3	NaN	Item1	NaN
1	0.272593	NaN	0.276464	NaN	0.801872	NaN	0.958139	NaN	0.875933	NaN	Class2	NaN	Item2	NaN
2	0.357817	NaN	0.500995	NaN	0.683463	NaN	0.712702	NaN	0.370251	NaN	Class2	NaN	Item3	NaN
3	0.561196	NaN	0.503083	NaN	0.013768	NaN	0.772827	NaN	0.882641	NaN	Class3	NaN	Item4	NaN
4	NaN	0.364886	NaN	0.615396	NaN	0.075381	NaN	0.368824	NaN	0.933140	NaN	Class3	NaN	Item5
5	0.651378	NaN	0.397203	NaN	0.788730	NaN	0.316836	NaN	0.568099	NaN	Class1	NaN	Item6	NaN
6	0.869127	NaN	0.436173	NaN	0.802148	NaN	0.143767	NaN	0.704261	NaN	Class2	NaN	Item7	NaN
7	NaN	0.704581	NaN	0.218792	NaN	0.924868	NaN	0.442141	NaN	0.909316	NaN	Class2	NaN	Item8
8	NaN	0.059809	NaN	0.184287	NaN	0.047355	NaN	0.674881	NaN	0.594625	NaN	Class2	NaN	Item9
9	0.533310	NaN	0.043324	NaN	0.561433	NaN	0.329668	NaN	0.502967	NaN	Class3	NaN	Item10	NaN
10	0.111894	NaN	0.607194	NaN	0.565945	NaN	0.006764	NaN	0.617442	NaN	Class1	NaN	Item11	NaN
11	0.912123	NaN	0.790524	NaN	0.992081	NaN	0.958802	NaN	0.791964	NaN	Class1	NaN	Item12	NaN
12	NaN	0.285251	NaN	0.624917	NaN	0.478094	NaN	0.195675	NaN	0.382317	NaN	Class1	NaN	Item13
13	0.053874	NaN	0.451648	NaN	0.982005	NaN	0.123943	NaN	0.119381	NaN	Class1	NaN	Item14	NaN
14	0.738523	NaN	0.587304	NaN	0.471633	NaN	0.107127	NaN	0.229219	NaN	Class3	NaN	Item15	NaN

Below we have created another example demonstrating usage of pivot() method where we have again used column 'Category' for columns parameter. This time we have informed the method to use column named 'Item' for index parameter. We can notice that the resulting dataframe has the same column names as the previous dataframe except that 'Item' column has moved to the index of the dataframe.

pd.pivot(df, index="Item", columns="Category")

	A		B		C		D		E		Type
Category	Category1	Category2	Category1	Category2	Category1	Category2	Category1	Category2	Category1	Category2	Category1	Category2
Item
Item1	0.191519	NaN	0.622109	NaN	0.437728	NaN	0.785359	NaN	0.779976	NaN	Class3	NaN
Item10	0.533310	NaN	0.043324	NaN	0.561433	NaN	0.329668	NaN	0.502967	NaN	Class3	NaN
Item11	0.111894	NaN	0.607194	NaN	0.565945	NaN	0.006764	NaN	0.617442	NaN	Class1	NaN
Item12	0.912123	NaN	0.790524	NaN	0.992081	NaN	0.958802	NaN	0.791964	NaN	Class1	NaN
Item13	NaN	0.285251	NaN	0.624917	NaN	0.478094	NaN	0.195675	NaN	0.382317	NaN	Class1
Item14	0.053874	NaN	0.451648	NaN	0.982005	NaN	0.123943	NaN	0.119381	NaN	Class1	NaN
Item15	0.738523	NaN	0.587304	NaN	0.471633	NaN	0.107127	NaN	0.229219	NaN	Class3	NaN
Item2	0.272593	NaN	0.276464	NaN	0.801872	NaN	0.958139	NaN	0.875933	NaN	Class2	NaN
Item3	0.357817	NaN	0.500995	NaN	0.683463	NaN	0.712702	NaN	0.370251	NaN	Class2	NaN
Item4	0.561196	NaN	0.503083	NaN	0.013768	NaN	0.772827	NaN	0.882641	NaN	Class3	NaN
Item5	NaN	0.364886	NaN	0.615396	NaN	0.075381	NaN	0.368824	NaN	0.933140	NaN	Class3
Item6	0.651378	NaN	0.397203	NaN	0.788730	NaN	0.316836	NaN	0.568099	NaN	Class1	NaN
Item7	0.869127	NaN	0.436173	NaN	0.802148	NaN	0.143767	NaN	0.704261	NaN	Class2	NaN
Item8	NaN	0.704581	NaN	0.218792	NaN	0.924868	NaN	0.442141	NaN	0.909316	NaN	Class2
Item9	NaN	0.059809	NaN	0.184287	NaN	0.047355	NaN	0.674881	NaN	0.594625	NaN	Class2

In the below example, we have provided all three parameters index, columns and values to pivot() method. The values of parameters index and columns are the same as the previous cell. The values parameter is asked to keep only two columns (['A','B']). The resulting dataframe has the same values as the previous dataframe but only columns 'A' and 'B' are kept.

pd.pivot(df, index="Item", columns="Category", values=["A", "B"])

	A		B
Category	Category1	Category2	Category1	Category2
Item
Item1	0.191519	NaN	0.622109	NaN
Item10	0.533310	NaN	0.043324	NaN
Item11	0.111894	NaN	0.607194	NaN
Item12	0.912123	NaN	0.790524	NaN
Item13	NaN	0.285251	NaN	0.624917
Item14	0.053874	NaN	0.451648	NaN
Item15	0.738523	NaN	0.587304	NaN
Item2	0.272593	NaN	0.276464	NaN
Item3	0.357817	NaN	0.500995	NaN
Item4	0.561196	NaN	0.503083	NaN
Item5	NaN	0.364886	NaN	0.615396
Item6	0.651378	NaN	0.397203	NaN
Item7	0.869127	NaN	0.436173	NaN
Item8	NaN	0.704581	NaN	0.218792
Item9	NaN	0.059809	NaN	0.184287

In the below cell, we have created pivot table by providing columns and values parameter to pivot() method. The result is exactly the same as our previous cell with the only difference that the index in this example is a range of integers.

pd.pivot(df, columns="Category", values=["A", "B"])

	A		B
Category	Category1	Category2	Category1	Category2
0	0.191519	NaN	0.622109	NaN
1	0.272593	NaN	0.276464	NaN
2	0.357817	NaN	0.500995	NaN
3	0.561196	NaN	0.503083	NaN
4	NaN	0.364886	NaN	0.615396
5	0.651378	NaN	0.397203	NaN
6	0.869127	NaN	0.436173	NaN
7	NaN	0.704581	NaN	0.218792
8	NaN	0.059809	NaN	0.184287
9	0.533310	NaN	0.043324	NaN
10	0.111894	NaN	0.607194	NaN
11	0.912123	NaN	0.790524	NaN
12	NaN	0.285251	NaN	0.624917
13	0.053874	NaN	0.451648	NaN
14	0.738523	NaN	0.587304	NaN

In the below cell, we have created an example where we have provided more than one column name to index parameter. We have asked to use columns 'Item' and 'Type' as indexes of the resulting dataframe. The resulting dataframe has a combination of values from columns 'Item' and 'Type'.

pd.pivot(df, index=["Item", "Type"], columns="Category", values=["A", "B"])

		A		B
	Category	Category1	Category2	Category1	Category2
Item	Type
Item1	Class3	0.191519	NaN	0.622109	NaN
Item10	Class3	0.533310	NaN	0.043324	NaN
Item11	Class1	0.111894	NaN	0.607194	NaN
Item12	Class1	0.912123	NaN	0.790524	NaN
Item13	Class1	NaN	0.285251	NaN	0.624917
Item14	Class1	0.053874	NaN	0.451648	NaN
Item15	Class3	0.738523	NaN	0.587304	NaN
Item2	Class2	0.272593	NaN	0.276464	NaN
Item3	Class2	0.357817	NaN	0.500995	NaN
Item4	Class3	0.561196	NaN	0.503083	NaN
Item5	Class3	NaN	0.364886	NaN	0.615396
Item6	Class1	0.651378	NaN	0.397203	NaN
Item7	Class2	0.869127	NaN	0.436173	NaN
Item8	Class2	NaN	0.704581	NaN	0.218792
Item9	Class2	NaN	0.059809	NaN	0.184287

In the below cell, we have created an example where we have provided more than one column name to columns parameter of pivot() method. The resulting dataframe will have values for columns 'A' and 'B' for each combination of unique values from columns 'Category' and 'Type'.

pvt_tbl = pd.pivot(df, index="Item", columns=["Category","Type"], values=["A", "B"])

pvt_tbl

	A						B
Category	Category1		Category2	Category1	Category2		Category1		Category2	Category1	Category2
Type	Class3	Class2	Class3	Class1	Class2	Class1	Class3	Class2	Class3	Class1	Class2	Class1
Item
Item1	0.191519	NaN	NaN	NaN	NaN	NaN	0.622109	NaN	NaN	NaN	NaN	NaN
Item10	0.533310	NaN	NaN	NaN	NaN	NaN	0.043324	NaN	NaN	NaN	NaN	NaN
Item11	NaN	NaN	NaN	0.111894	NaN	NaN	NaN	NaN	NaN	0.607194	NaN	NaN
Item12	NaN	NaN	NaN	0.912123	NaN	NaN	NaN	NaN	NaN	0.790524	NaN	NaN
Item13	NaN	NaN	NaN	NaN	NaN	0.285251	NaN	NaN	NaN	NaN	NaN	0.624917
Item14	NaN	NaN	NaN	0.053874	NaN	NaN	NaN	NaN	NaN	0.451648	NaN	NaN
Item15	0.738523	NaN	NaN	NaN	NaN	NaN	0.587304	NaN	NaN	NaN	NaN	NaN
Item2	NaN	0.272593	NaN	NaN	NaN	NaN	NaN	0.276464	NaN	NaN	NaN	NaN
Item3	NaN	0.357817	NaN	NaN	NaN	NaN	NaN	0.500995	NaN	NaN	NaN	NaN
Item4	0.561196	NaN	NaN	NaN	NaN	NaN	0.503083	NaN	NaN	NaN	NaN	NaN
Item5	NaN	NaN	0.364886	NaN	NaN	NaN	NaN	NaN	0.615396	NaN	NaN	NaN
Item6	NaN	NaN	NaN	0.651378	NaN	NaN	NaN	NaN	NaN	0.397203	NaN	NaN
Item7	NaN	0.869127	NaN	NaN	NaN	NaN	NaN	0.436173	NaN	NaN	NaN	NaN
Item8	NaN	NaN	NaN	NaN	0.704581	NaN	NaN	NaN	NaN	NaN	0.218792	NaN
Item9	NaN	NaN	NaN	NaN	0.059809	NaN	NaN	NaN	NaN	NaN	0.184287	NaN

pvt_tbl.sort_values(by=["Category","Type"], axis=1)

	A	B	A	B	A	B	A	B	A	B	A	B
Category	Category1	Category1	Category1	Category1	Category1	Category1	Category2	Category2	Category2	Category2	Category2	Category2
Type	Class1	Class1	Class2	Class2	Class3	Class3	Class1	Class1	Class2	Class2	Class3	Class3
Item
Item1	NaN	NaN	NaN	NaN	0.191519	0.622109	NaN	NaN	NaN	NaN	NaN	NaN
Item10	NaN	NaN	NaN	NaN	0.533310	0.043324	NaN	NaN	NaN	NaN	NaN	NaN
Item11	0.111894	0.607194	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Item12	0.912123	0.790524	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Item13	NaN	NaN	NaN	NaN	NaN	NaN	0.285251	0.624917	NaN	NaN	NaN	NaN
Item14	0.053874	0.451648	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Item15	NaN	NaN	NaN	NaN	0.738523	0.587304	NaN	NaN	NaN	NaN	NaN	NaN
Item2	NaN	NaN	0.272593	0.276464	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Item3	NaN	NaN	0.357817	0.500995	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Item4	NaN	NaN	NaN	NaN	0.561196	0.503083	NaN	NaN	NaN	NaN	NaN	NaN
Item5	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.364886	0.615396
Item6	0.651378	0.397203	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Item7	NaN	NaN	0.869127	0.436173	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Item8	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.704581	0.218792	NaN	NaN
Item9	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.059809	0.184287	NaN	NaN

2. pivot_table()¶

In this section, we'll explain how we can use pivot_table() method with simple examples.

pivot_table(data,values,index,columns,aggfunc='mean',fill_value=None,dropna=True,margins=False,margins_name='All',sort=True) - This method takes data frame and index column as input and returns pivoted table. It needs column name at which values will be grouped and index will be created. All other parameters are optional. By default, it applies mean aggregate function on grouped values.
- The values parameter accepts column names whose values will be grouped and kept in the pivot table after performing aggregate statistics. If we don't provide values then all columns except provided to index and columns parameter will be used.
- The index parameter accepts columns names at which values of the main dataframe will be grouped and later one some statistics will be performed on these groups of values. We generally provide categorical columns to this parameter.
- The columns dataframe includes columns whose value will become columns of our pivot table. The values provided to values parameter will be grouped according to columns provided to index and columns parameter. This is an optional parameter.
- The aggfunc accepts single callable, list of callable or dictionary as input. We can provide a numpy stat function or our designed stats function. It should work on a list of values. The default is 'mean'.
  - If we provide a single callable then it'll be applied to all selected columns of data.
  - If we provide a list of callable then it'll apply all of them one by one to selected columns and the resulting pivot table will have column names according to aggregate function and selected columns.
  - The dictionary has a key with column name and value as single or list of callable. It applies aggregate function specified in value to the column specified in the key of the dictionary
- The fill_value parameter accepts value to fill for NaN entries.
- The dropna parameter drops rows where all entries are NaN.
- The margins parameter if set to True will include the addition of all rows and columns at the end of the dataframe.
- The margins_name parameter accepts string specifying the name of totals (column name and index label) that will be created when margins is set to True.

Below we have created a pivot table using pivot_table() method asking it to use a column named 'Item' as an index of the resulting dataframe. It'll treat all other columns of the dataframe as columns of the resulting dataframe. The Pviot_table() method only keeps columns that has float/integer values in resulting dataframe.

pd.pivot_table(df, index="Item")

	A	B	C	D	E
Item
Item1	0.191519	0.622109	0.437728	0.785359	0.779976
Item10	0.533310	0.043324	0.561433	0.329668	0.502967
Item11	0.111894	0.607194	0.565945	0.006764	0.617442
Item12	0.912123	0.790524	0.992081	0.958802	0.791964
Item13	0.285251	0.624917	0.478094	0.195675	0.382317
Item14	0.053874	0.451648	0.982005	0.123943	0.119381
Item15	0.738523	0.587304	0.471633	0.107127	0.229219
Item2	0.272593	0.276464	0.801872	0.958139	0.875933
Item3	0.357817	0.500995	0.683463	0.712702	0.370251
Item4	0.561196	0.503083	0.013768	0.772827	0.882641
Item5	0.364886	0.615396	0.075381	0.368824	0.933140
Item6	0.651378	0.397203	0.788730	0.316836	0.568099
Item7	0.869127	0.436173	0.802148	0.143767	0.704261
Item8	0.704581	0.218792	0.924868	0.442141	0.909316
Item9	0.059809	0.184287	0.047355	0.674881	0.594625

Below we have called pviot_table() method with columns parameter set to column 'Type'. The resulting dataframe will have unique values of column 'Type' as column names and all other columns of the dataframe will be index. The mean aggregate function will be applied to values that are from a particular column and the unique value of 'Type' column. So if we take columns 'A' from the dataframe, then the values of column 'A' that have 'Type' column value 'Class1' will be grouped and the mean function applied to it. Same logic will follow for values that has 'Class2' and 'Class3' as values of column 'Type'.

pd.pivot_table(df, columns="Type")

Type	Class1	Class2	Class3
A	0.402904	0.452786	0.477887
B	0.574297	0.323342	0.474243
C	0.761371	0.651941	0.311989
D	0.320404	0.586326	0.472761
E	0.495841	0.690877	0.665588

In the below cell, we have created an example of pivot_table() where we have provided values for all three main parameters index, columns, and values. We have asked to use values of column 'Category' as an index of dataframe, values of column 'Type' as columns of resulting dataframe, and values from column 'A' will be values of the resulting dataframe after the aggregate function is applied to it.

pd.pivot_table(df,
               index="Category",
               columns="Type",
               values=["A"],
               aggfunc='mean')

	A
Type	Class1	Class2	Class3
Category
Category1	0.432317	0.499846	0.506137
Category2	0.285251	0.382195	0.364886

In the below cell, we have provided two columns ('Category' and 'Type') to be used as index of the resulting dataframe. The values of column 'Item' will be columns of the resulting dataframe and values of column 'A' will be values of our resulting dataframe after applying an aggregate function. The index of the resulting dataframe will be created by taking all combinations of different values from columns 'Category' and 'Type'.

pd.pivot_table(df,
               index=["Category", "Type"],
               columns="Item",
               values=["A"],
               aggfunc='mean')

		A
	Item	Item1	Item10	Item11	Item12	Item13	Item14	Item15	Item2	Item3	Item4	Item5	Item6	Item7	Item8	Item9
Category	Type
Category1	Class1	NaN	NaN	0.111894	0.912123	NaN	0.053874	NaN	NaN	NaN	NaN	NaN	0.651378	NaN	NaN	NaN
	Class2	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.272593	0.357817	NaN	NaN	NaN	0.869127	NaN	NaN
	Class3	0.191519	0.53331	NaN	NaN	NaN	NaN	0.738523	NaN	NaN	0.561196	NaN	NaN	NaN	NaN	NaN
Category2	Class1	NaN	NaN	NaN	NaN	0.285251	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
	Class2	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.704581	0.059809
	Class3	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.364886	NaN	NaN	NaN	NaN

In the below cell, we have created an example where we have given two columns ('Category' and 'Type') to be used as columns of the resulting dataframe to pivot_table() method. The 'Item' column values will become an index of our dataframe and values of column 'A' after applying aggregate function will become values of the resulting dataframe. The aggregate function will be applied to values of column 'A' for each combination of values from column 'Item', 'Category', and 'Type'.

pd.pivot_table(df,
               index="Item",
               columns=["Category", "Type"],
               values=["A"],
               aggfunc='mean')

	A
Category	Category1			Category2
Type	Class1	Class2	Class3	Class1	Class2	Class3
Item
Item1	NaN	NaN	0.191519	NaN	NaN	NaN
Item10	NaN	NaN	0.533310	NaN	NaN	NaN
Item11	0.111894	NaN	NaN	NaN	NaN	NaN
Item12	0.912123	NaN	NaN	NaN	NaN	NaN
Item13	NaN	NaN	NaN	0.285251	NaN	NaN
Item14	0.053874	NaN	NaN	NaN	NaN	NaN
Item15	NaN	NaN	0.738523	NaN	NaN	NaN
Item2	NaN	0.272593	NaN	NaN	NaN	NaN
Item3	NaN	0.357817	NaN	NaN	NaN	NaN
Item4	NaN	NaN	0.561196	NaN	NaN	NaN
Item5	NaN	NaN	NaN	NaN	NaN	0.364886
Item6	0.651378	NaN	NaN	NaN	NaN	NaN
Item7	NaN	0.869127	NaN	NaN	NaN	NaN
Item8	NaN	NaN	NaN	NaN	0.704581	NaN
Item9	NaN	NaN	NaN	NaN	0.059809	NaN

In the below cell, we have created an example where we have given more than one aggregate function to pivot_table(). We can notice that the resulting dataframe has an entry for each function.

pd.pivot_table(df,
               index="Category",
               columns="Type",
               values=["A"],
               aggfunc=[np.mean, np.std, np.var])

	mean			std			var
	A			A			A
Type	Class1	Class2	Class3	Class1	Class2	Class3	Class1	Class2	Class3
Category
Category1	0.432317	0.499846	0.506137	0.417968	0.322634	0.228588	0.174697	0.104093	0.052252
Category2	0.285251	0.382195	0.364886	NaN	0.455923	NaN	NaN	0.207866	NaN

In the below cell, we have created an example where we have given dictionary to aggfunc parameter of pivot_table() function. We have asked it to use 'mean' function on column 'A' values and 'std' function on column 'B' values.

pd.pivot_table(df,
               index="Category",
               columns="Type",
               values=["A","B"],
               aggfunc={"A": np.mean, "B": np.std})

	A			B
Type	Class1	Class2	Class3	Class1	Class2	Class3
Category
Category1	0.432317	0.499846	0.506137	0.176636	0.115559	0.268445
Category2	0.285251	0.382195	0.364886	NaN	0.024399	NaN

In the below cell, we have created another example demonstrating how we can provide a dictionary of aggregate functions to aggfunc parameter of pivot_table(). This time we have provided two functions to be used on values of column 'B'.

pd.pivot_table(df, index="Category",
               columns="Type",
               values=["A","B"],
               aggfunc={"A": np.mean, "B": [np.mean, np.std]})

	A			B
	mean			mean			std
Type	Class1	Class2	Class3	Class1	Class2	Class3	Class1	Class2	Class3
Category
Category1	0.432317	0.499846	0.506137	0.561642	0.404544	0.438955	0.176636	0.115559	0.268445
Category2	0.285251	0.382195	0.364886	0.624917	0.201540	0.615396	NaN	0.024399	NaN

In the below cell, we have recreated an example from the previous cell and provided value for parameter fill_value so that all NaNs will be filled with this value.

pd.pivot_table(df, index="Category",
               columns="Type",
               values=["A","B"],
               aggfunc={"A": np.mean, "B": [np.mean, np.std]},
               fill_value=0.000001
              )

	A			B
	mean			mean			std
Type	Class1	Class2	Class3	Class1	Class2	Class3	Class1	Class2	Class3
Category
Category1	0.432317	0.499846	0.506137	0.561642	0.404544	0.438955	0.176636	0.115559	0.268445
Category2	0.285251	0.382195	0.364886	0.624917	0.201540	0.615396	0.000001	0.024399	0.000001

In the below cell, we have recreated an example from previous cells with the only difference that we have provided dropna parameter this time. We have asked the resulting dataframe to keep entries where all values are NaNs.

pd.pivot_table(df, index="Type",
               columns="Category",
               values=["A","B"],
               aggfunc={"A": np.mean, "B": [np.mean, np.std]},
               dropna=False
              )

	A				B
	mean		std		mean		std
Category	Category1	Category2	Category1	Category2	Category1	Category2	Category1	Category2
Type
Class1	0.432317	0.285251	NaN	NaN	0.561642	0.624917	0.176636	NaN
Class2	0.499846	0.382195	NaN	NaN	0.404544	0.201540	0.115559	0.024399
Class3	0.506137	0.364886	NaN	NaN	0.438955	0.615396	0.268445	NaN

In the below cell, we have created another example demonstrating the usage of pivot_table(). This time we have included the usage of parameters margins and margins_name.

pd.pivot_table(df,
               index="Category",
               columns="Type",
               values=["A"],
               aggfunc='mean',
               margins=True,
               margins_name="Column Types Average"
              )

	A
Type	Class1	Class2	Class3	Column Types Average
Category
Category1	0.432317	0.499846	0.506137	0.477578
Category2	0.285251	0.382195	0.364886	0.353632
Column Types Average	0.402904	0.452786	0.477887	0.444526

print("Category1 Avg - {}\nCategory2 Avg - {}".format(df[df.Category=="Category1"]['A'].mean(),
                                                      df[df.Category=="Category2"]['A'].mean()))

print("\nClass1 Avg - {}\nClass2 Avg - {}\nClass3 Avg - {}".format(df[df.Type=="Class1"]['A'].mean(),
                                                                   df[df.Type=="Class2"]['A'].mean(),
                                                                   df[df.Type=="Class3"]['A'].mean()))

Category1 Avg - 0.4775777411607928
Category2 Avg - 0.35363186872382657

Class1 Avg - 0.40290399848097636
Class2 Avg - 0.4527855591542317
Class3 Avg - 0.4778869678975973

In the below cell, we have created the last example demonstrating function pivot_table(). We have asked the function to keep entries unsorted by providing sort parameter as False.

pd.pivot_table(df,
               index="Category",
               columns="Type",
               values=["A"],
               aggfunc='mean',
               margins=True,
               margins_name="Column Types Average",
               sort=False
              )

	A
Type	Class1	Class2	Class3	Column Types Average
Category
Category1	0.432317	0.499846	0.506137	0.477578
Category2	0.285251	0.382195	0.364886	0.353632
Column Types Average	0.402904	0.452786	0.477887	0.444526

This ends our small tutorial explaining how we can use pandas functions pivot() and pivot_table() to create pivot tables from pandas dataframe. Please feel free to let us know your views in the comments section.

References¶

Pandas Pivot Table Docs

Sunny Solanki

Comfortable Learning through Video Tutorials?

If you are more comfortable learning through video tutorials then we would recommend that you subscribe to our YouTube channel.

Stuck Somewhere? Need Help with Coding? Have Doubts About the Topic/Code?

When going through coding examples, it's quite common to have doubts and errors.

If you have doubts about some code examples or are stuck somewhere when trying our code, send us an email at coderzcolumn07@gmail.com. We'll help you or point you in the direction where you can find a solution to your problem.

You can even send us a mail if you are trying something new and need guidance regarding coding. We'll try to respond as soon as possible.

Want to Share Your Views? Have Any Suggestions?

If you want to

provide some suggestions on topic
share your views
include some details in tutorial
suggest some new topics on which we should create tutorials/blogs

Please feel free to contact us at coderzcolumn07@gmail.com. We appreciate and value your feedbacks. You can also support us with a small contribution by clicking DONATE.

pivot-tables, pandas-dataframe

Sunny Solanki

Software Developer | Youtuber | Bonsai Enthusiast

Subscribe to Our YouTube Channel

Tutorial Categories

Artificial Intelligence (83)
Data Science (84)
Digital Marketing (8)
Machine Learning (38)
Python (131)

Guide to Create Pivot Tables from Pandas DataFrame¶

1. pivot()¶

2. pivot_table()¶

References¶

Sunny Solanki

Comfortable Learning through Video Tutorials?

Stuck Somewhere? Need Help with Coding? Have Doubts About the Topic/Code?

Want to Share Your Views? Have Any Suggestions?

Sunny Solanki

Subscribe to Our YouTube Channel

Tutorial Categories

Newsletter Subscription