Pivot table is a table where we have grouped values of our extensive table based on values of some columns (generally categorical columns) of data. We can then perform some aggregate operations on these aggregate values as well. Pivot tables rearrange data and perform statistics on them which can help us find meaningful insights which won't be otherwise visible hence it has become an integral part of data analysis. Pandas is the most commonly used library to load and work with tabular data in Python. It also provides two methods to create pivot tables from pandas dataframe.
As a part of this tutorial, we'll explain both methods with simple examples to make their usage clear.
Below, we have imported the necessary libraries that we'll use for our tutorial. We have printed the version of pandas as well that we'll be using in our tutorial.
Then, in the next cell, we have created a pandas dataframe which we'll use to explain the usage of pivot() and pivot_table() methods. Dataframe is created using random numbers from numpy. We have also kept two categorical columns in the dataframe to perform various aggregate stats on them.
import pandas as pd
print("Pandas Version : {}".format(pd.__version__))
import numpy as np
np.random.seed(1234)
data = np.random.rand(15,5)
df = pd.DataFrame(data=data, columns=list("ABCDE"))
df["Type"] = np.random.choice(["Class1","Class2","Class3"], size=len(df))
df["Category"] = np.random.choice(["Category1","Category2"], size=len(df))
df["Item"] = ["Item%d"%(i+1) for i in range(df.shape[0])]
df
In this section, we'll explain various examples explaining how to use pivot() method to pivot values of the dataframe.
Below we have pivoted our dataframe by asking it to use a column named 'Category' for columns parameter of pivot() method. The 'Category' column has two unique values ('Category1' and 'Category2'). This will create column names based on combination of all remaining columns (['A','B','C','D','E','Type']) and unique values of 'Category' column. The index of the resulting dataframe will be simple integer range-based indexing. The values of the dataframe will be based on the column name and unique value of 'Category' column. If there is a value for a particular column and a unique value from 'Category' column then the resulting dataframe will have value else there will be NaN.
pd.pivot(df, columns="Category")
Below we have created another example demonstrating usage of pivot() method where we have again used column 'Category' for columns parameter. This time we have informed the method to use column named 'Item' for index parameter. We can notice that the resulting dataframe has the same column names as the previous dataframe except that 'Item' column has moved to the index of the dataframe.
pd.pivot(df, index="Item", columns="Category")
In the below example, we have provided all three parameters index, columns and values to pivot() method. The values of parameters index and columns are the same as the previous cell. The values parameter is asked to keep only two columns (['A','B']). The resulting dataframe has the same values as the previous dataframe but only columns 'A' and 'B' are kept.
pd.pivot(df, index="Item", columns="Category", values=["A", "B"])
In the below cell, we have created pivot table by providing columns and values parameter to pivot() method. The result is exactly the same as our previous cell with the only difference that the index in this example is a range of integers.
pd.pivot(df, columns="Category", values=["A", "B"])
In the below cell, we have created an example where we have provided more than one column name to index parameter. We have asked to use columns 'Item' and 'Type' as indexes of the resulting dataframe. The resulting dataframe has a combination of values from columns 'Item' and 'Type'.
pd.pivot(df, index=["Item", "Type"], columns="Category", values=["A", "B"])
In the below cell, we have created an example where we have provided more than one column name to columns parameter of pivot() method. The resulting dataframe will have values for columns 'A' and 'B' for each combination of unique values from columns 'Category' and 'Type'.
pvt_tbl = pd.pivot(df, index="Item", columns=["Category","Type"], values=["A", "B"])
pvt_tbl
pvt_tbl.sort_values(by=["Category","Type"], axis=1)
In this section, we'll explain how we can use pivot_table() method with simple examples.
Below we have created a pivot table using pivot_table() method asking it to use a column named 'Item' as an index of the resulting dataframe. It'll treat all other columns of the dataframe as columns of the resulting dataframe. The Pviot_table() method only keeps columns that has float/integer values in resulting dataframe.
pd.pivot_table(df, index="Item")
Below we have called pviot_table() method with columns parameter set to column 'Type'. The resulting dataframe will have unique values of column 'Type' as column names and all other columns of the dataframe will be index. The mean aggregate function will be applied to values that are from a particular column and the unique value of 'Type' column. So if we take columns 'A' from the dataframe, then the values of column 'A' that have 'Type' column value 'Class1' will be grouped and the mean function applied to it. Same logic will follow for values that has 'Class2' and 'Class3' as values of column 'Type'.
pd.pivot_table(df, columns="Type")
In the below cell, we have created an example of pivot_table() where we have provided values for all three main parameters index, columns, and values. We have asked to use values of column 'Category' as an index of dataframe, values of column 'Type' as columns of resulting dataframe, and values from column 'A' will be values of the resulting dataframe after the aggregate function is applied to it.
pd.pivot_table(df,
index="Category",
columns="Type",
values=["A"],
aggfunc='mean')
In the below cell, we have provided two columns ('Category' and 'Type') to be used as index of the resulting dataframe. The values of column 'Item' will be columns of the resulting dataframe and values of column 'A' will be values of our resulting dataframe after applying an aggregate function. The index of the resulting dataframe will be created by taking all combinations of different values from columns 'Category' and 'Type'.
pd.pivot_table(df,
index=["Category", "Type"],
columns="Item",
values=["A"],
aggfunc='mean')
In the below cell, we have created an example where we have given two columns ('Category' and 'Type') to be used as columns of the resulting dataframe to pivot_table() method. The 'Item' column values will become an index of our dataframe and values of column 'A' after applying aggregate function will become values of the resulting dataframe. The aggregate function will be applied to values of column 'A' for each combination of values from column 'Item', 'Category', and 'Type'.
pd.pivot_table(df,
index="Item",
columns=["Category", "Type"],
values=["A"],
aggfunc='mean')
In the below cell, we have created an example where we have given more than one aggregate function to pivot_table(). We can notice that the resulting dataframe has an entry for each function.
pd.pivot_table(df,
index="Category",
columns="Type",
values=["A"],
aggfunc=[np.mean, np.std, np.var])
In the below cell, we have created an example where we have given dictionary to aggfunc parameter of pivot_table() function. We have asked it to use 'mean' function on column 'A' values and 'std' function on column 'B' values.
pd.pivot_table(df,
index="Category",
columns="Type",
values=["A","B"],
aggfunc={"A": np.mean, "B": np.std})
In the below cell, we have created another example demonstrating how we can provide a dictionary of aggregate functions to aggfunc parameter of pivot_table(). This time we have provided two functions to be used on values of column 'B'.
pd.pivot_table(df, index="Category",
columns="Type",
values=["A","B"],
aggfunc={"A": np.mean, "B": [np.mean, np.std]})
In the below cell, we have recreated an example from the previous cell and provided value for parameter fill_value so that all NaNs will be filled with this value.
pd.pivot_table(df, index="Category",
columns="Type",
values=["A","B"],
aggfunc={"A": np.mean, "B": [np.mean, np.std]},
fill_value=0.000001
)
In the below cell, we have recreated an example from previous cells with the only difference that we have provided dropna parameter this time. We have asked the resulting dataframe to keep entries where all values are NaNs.
pd.pivot_table(df, index="Type",
columns="Category",
values=["A","B"],
aggfunc={"A": np.mean, "B": [np.mean, np.std]},
dropna=False
)
In the below cell, we have created another example demonstrating the usage of pivot_table(). This time we have included the usage of parameters margins and margins_name.
pd.pivot_table(df,
index="Category",
columns="Type",
values=["A"],
aggfunc='mean',
margins=True,
margins_name="Column Types Average"
)
print("Category1 Avg - {}\nCategory2 Avg - {}".format(df[df.Category=="Category1"]['A'].mean(),
df[df.Category=="Category2"]['A'].mean()))
print("\nClass1 Avg - {}\nClass2 Avg - {}\nClass3 Avg - {}".format(df[df.Type=="Class1"]['A'].mean(),
df[df.Type=="Class2"]['A'].mean(),
df[df.Type=="Class3"]['A'].mean()))
In the below cell, we have created the last example demonstrating function pivot_table(). We have asked the function to keep entries unsorted by providing sort parameter as False.
pd.pivot_table(df,
index="Category",
columns="Type",
values=["A"],
aggfunc='mean',
margins=True,
margins_name="Column Types Average",
sort=False
)
This ends our small tutorial explaining how we can use pandas functions pivot() and pivot_table() to create pivot tables from pandas dataframe. Please feel free to let us know your views in the comments section.
If you want to