Guide to Create Pivot Tables from Pandas DataFrame¶
Pivot table is a table where we have grouped values of our extensive table based on values of some columns (generally categorical columns) of data. We can then perform some aggregate operations on these aggregate values as well. Pivot tables rearrange data and perform statistics on them which can help us find meaningful insights which won't be otherwise visible hence it has become an integral part of data analysis. Pandas is the most commonly used library to load and work with tabular data in Python. It also provides two methods to create pivot tables from pandas dataframe.
- pivot() - This method simply rearranges the dataframe based on the given input index, columns, and values.
- pivot_table() - This method rearranges dataframe and can also let us perform statistics on values.
As a part of this tutorial, we'll explain both methods with simple examples to make their usage clear.
Below, we have imported the necessary libraries that we'll use for our tutorial. We have printed the version of pandas as well that we'll be using in our tutorial.
Then, in the next cell, we have created a pandas dataframe which we'll use to explain the usage of pivot() and pivot_table() methods. Dataframe is created using random numbers from numpy. We have also kept two categorical columns in the dataframe to perform various aggregate stats on them.
|
A |
B |
C |
D |
E |
Type |
Category |
Item |
0 |
0.191519 |
0.622109 |
0.437728 |
0.785359 |
0.779976 |
Class3 |
Category1 |
Item1 |
1 |
0.272593 |
0.276464 |
0.801872 |
0.958139 |
0.875933 |
Class2 |
Category1 |
Item2 |
2 |
0.357817 |
0.500995 |
0.683463 |
0.712702 |
0.370251 |
Class2 |
Category1 |
Item3 |
3 |
0.561196 |
0.503083 |
0.013768 |
0.772827 |
0.882641 |
Class3 |
Category1 |
Item4 |
4 |
0.364886 |
0.615396 |
0.075381 |
0.368824 |
0.933140 |
Class3 |
Category2 |
Item5 |
5 |
0.651378 |
0.397203 |
0.788730 |
0.316836 |
0.568099 |
Class1 |
Category1 |
Item6 |
6 |
0.869127 |
0.436173 |
0.802148 |
0.143767 |
0.704261 |
Class2 |
Category1 |
Item7 |
7 |
0.704581 |
0.218792 |
0.924868 |
0.442141 |
0.909316 |
Class2 |
Category2 |
Item8 |
8 |
0.059809 |
0.184287 |
0.047355 |
0.674881 |
0.594625 |
Class2 |
Category2 |
Item9 |
9 |
0.533310 |
0.043324 |
0.561433 |
0.329668 |
0.502967 |
Class3 |
Category1 |
Item10 |
10 |
0.111894 |
0.607194 |
0.565945 |
0.006764 |
0.617442 |
Class1 |
Category1 |
Item11 |
11 |
0.912123 |
0.790524 |
0.992081 |
0.958802 |
0.791964 |
Class1 |
Category1 |
Item12 |
12 |
0.285251 |
0.624917 |
0.478094 |
0.195675 |
0.382317 |
Class1 |
Category2 |
Item13 |
13 |
0.053874 |
0.451648 |
0.982005 |
0.123943 |
0.119381 |
Class1 |
Category1 |
Item14 |
14 |
0.738523 |
0.587304 |
0.471633 |
0.107127 |
0.229219 |
Class3 |
Category1 |
Item15 |
1. pivot()¶
In this section, we'll explain various examples explaining how to use pivot() method to pivot values of the dataframe.
- pivot(data,index,columns,values) - This method takes dataframe and columns names as input to create pivot table from it. The index and values are optional arguments.
- The columns parameter accepts one or more column names as input which will be columns of our pivot table. These are generally categorical columns of data.
- The index parameter accepts one or more columns names as input that will be a new index of our pivot table. These columns can be categorical or not. If we don't provide index parameter then a simple number-based index will be included in the resulting pivot table.
- The values parameter accepts one or more column names as input whose values will be included in the pivot table. If we don't provide values parameter then all columns except provided to columns and values parameter will be included.
Below we have pivoted our dataframe by asking it to use a column named 'Category' for columns parameter of pivot() method. The 'Category' column has two unique values ('Category1' and 'Category2'). This will create column names based on combination of all remaining columns (['A','B','C','D','E','Type']) and unique values of 'Category' column. The index of the resulting dataframe will be simple integer range-based indexing. The values of the dataframe will be based on the column name and unique value of 'Category' column. If there is a value for a particular column and a unique value from 'Category' column then the resulting dataframe will have value else there will be NaN.
|
A |
B |
C |
D |
E |
Type |
Item |
Category |
Category1 |
Category2 |
Category1 |
Category2 |
Category1 |
Category2 |
Category1 |
Category2 |
Category1 |
Category2 |
Category1 |
Category2 |
Category1 |
Category2 |
0 |
0.191519 |
NaN |
0.622109 |
NaN |
0.437728 |
NaN |
0.785359 |
NaN |
0.779976 |
NaN |
Class3 |
NaN |
Item1 |
NaN |
1 |
0.272593 |
NaN |
0.276464 |
NaN |
0.801872 |
NaN |
0.958139 |
NaN |
0.875933 |
NaN |
Class2 |
NaN |
Item2 |
NaN |
2 |
0.357817 |
NaN |
0.500995 |
NaN |
0.683463 |
NaN |
0.712702 |
NaN |
0.370251 |
NaN |
Class2 |
NaN |
Item3 |
NaN |
3 |
0.561196 |
NaN |
0.503083 |
NaN |
0.013768 |
NaN |
0.772827 |
NaN |
0.882641 |
NaN |
Class3 |
NaN |
Item4 |
NaN |
4 |
NaN |
0.364886 |
NaN |
0.615396 |
NaN |
0.075381 |
NaN |
0.368824 |
NaN |
0.933140 |
NaN |
Class3 |
NaN |
Item5 |
5 |
0.651378 |
NaN |
0.397203 |
NaN |
0.788730 |
NaN |
0.316836 |
NaN |
0.568099 |
NaN |
Class1 |
NaN |
Item6 |
NaN |
6 |
0.869127 |
NaN |
0.436173 |
NaN |
0.802148 |
NaN |
0.143767 |
NaN |
0.704261 |
NaN |
Class2 |
NaN |
Item7 |
NaN |
7 |
NaN |
0.704581 |
NaN |
0.218792 |
NaN |
0.924868 |
NaN |
0.442141 |
NaN |
0.909316 |
NaN |
Class2 |
NaN |
Item8 |
8 |
NaN |
0.059809 |
NaN |
0.184287 |
NaN |
0.047355 |
NaN |
0.674881 |
NaN |
0.594625 |
NaN |
Class2 |
NaN |
Item9 |
9 |
0.533310 |
NaN |
0.043324 |
NaN |
0.561433 |
NaN |
0.329668 |
NaN |
0.502967 |
NaN |
Class3 |
NaN |
Item10 |
NaN |
10 |
0.111894 |
NaN |
0.607194 |
NaN |
0.565945 |
NaN |
0.006764 |
NaN |
0.617442 |
NaN |
Class1 |
NaN |
Item11 |
NaN |
11 |
0.912123 |
NaN |
0.790524 |
NaN |
0.992081 |
NaN |
0.958802 |
NaN |
0.791964 |
NaN |
Class1 |
NaN |
Item12 |
NaN |
12 |
NaN |
0.285251 |
NaN |
0.624917 |
NaN |
0.478094 |
NaN |
0.195675 |
NaN |
0.382317 |
NaN |
Class1 |
NaN |
Item13 |
13 |
0.053874 |
NaN |
0.451648 |
NaN |
0.982005 |
NaN |
0.123943 |
NaN |
0.119381 |
NaN |
Class1 |
NaN |
Item14 |
NaN |
14 |
0.738523 |
NaN |
0.587304 |
NaN |
0.471633 |
NaN |
0.107127 |
NaN |
0.229219 |
NaN |
Class3 |
NaN |
Item15 |
NaN |
Below we have created another example demonstrating usage of pivot() method where we have again used column 'Category' for columns parameter. This time we have informed the method to use column named 'Item' for index parameter. We can notice that the resulting dataframe has the same column names as the previous dataframe except that 'Item' column has moved to the index of the dataframe.
|
A |
B |
C |
D |
E |
Type |
Category |
Category1 |
Category2 |
Category1 |
Category2 |
Category1 |
Category2 |
Category1 |
Category2 |
Category1 |
Category2 |
Category1 |
Category2 |
Item |
|
|
|
|
|
|
|
|
|
|
|
|
Item1 |
0.191519 |
NaN |
0.622109 |
NaN |
0.437728 |
NaN |
0.785359 |
NaN |
0.779976 |
NaN |
Class3 |
NaN |
Item10 |
0.533310 |
NaN |
0.043324 |
NaN |
0.561433 |
NaN |
0.329668 |
NaN |
0.502967 |
NaN |
Class3 |
NaN |
Item11 |
0.111894 |
NaN |
0.607194 |
NaN |
0.565945 |
NaN |
0.006764 |
NaN |
0.617442 |
NaN |
Class1 |
NaN |
Item12 |
0.912123 |
NaN |
0.790524 |
NaN |
0.992081 |
NaN |
0.958802 |
NaN |
0.791964 |
NaN |
Class1 |
NaN |
Item13 |
NaN |
0.285251 |
NaN |
0.624917 |
NaN |
0.478094 |
NaN |
0.195675 |
NaN |
0.382317 |
NaN |
Class1 |
Item14 |
0.053874 |
NaN |
0.451648 |
NaN |
0.982005 |
NaN |
0.123943 |
NaN |
0.119381 |
NaN |
Class1 |
NaN |
Item15 |
0.738523 |
NaN |
0.587304 |
NaN |
0.471633 |
NaN |
0.107127 |
NaN |
0.229219 |
NaN |
Class3 |
NaN |
Item2 |
0.272593 |
NaN |
0.276464 |
NaN |
0.801872 |
NaN |
0.958139 |
NaN |
0.875933 |
NaN |
Class2 |
NaN |
Item3 |
0.357817 |
NaN |
0.500995 |
NaN |
0.683463 |
NaN |
0.712702 |
NaN |
0.370251 |
NaN |
Class2 |
NaN |
Item4 |
0.561196 |
NaN |
0.503083 |
NaN |
0.013768 |
NaN |
0.772827 |
NaN |
0.882641 |
NaN |
Class3 |
NaN |
Item5 |
NaN |
0.364886 |
NaN |
0.615396 |
NaN |
0.075381 |
NaN |
0.368824 |
NaN |
0.933140 |
NaN |
Class3 |
Item6 |
0.651378 |
NaN |
0.397203 |
NaN |
0.788730 |
NaN |
0.316836 |
NaN |
0.568099 |
NaN |
Class1 |
NaN |
Item7 |
0.869127 |
NaN |
0.436173 |
NaN |
0.802148 |
NaN |
0.143767 |
NaN |
0.704261 |
NaN |
Class2 |
NaN |
Item8 |
NaN |
0.704581 |
NaN |
0.218792 |
NaN |
0.924868 |
NaN |
0.442141 |
NaN |
0.909316 |
NaN |
Class2 |
Item9 |
NaN |
0.059809 |
NaN |
0.184287 |
NaN |
0.047355 |
NaN |
0.674881 |
NaN |
0.594625 |
NaN |
Class2 |
In the below example, we have provided all three parameters index, columns and values to pivot() method. The values of parameters index and columns are the same as the previous cell. The values parameter is asked to keep only two columns (['A','B']). The resulting dataframe has the same values as the previous dataframe but only columns 'A' and 'B' are kept.
|
A |
B |
Category |
Category1 |
Category2 |
Category1 |
Category2 |
Item |
|
|
|
|
Item1 |
0.191519 |
NaN |
0.622109 |
NaN |
Item10 |
0.533310 |
NaN |
0.043324 |
NaN |
Item11 |
0.111894 |
NaN |
0.607194 |
NaN |
Item12 |
0.912123 |
NaN |
0.790524 |
NaN |
Item13 |
NaN |
0.285251 |
NaN |
0.624917 |
Item14 |
0.053874 |
NaN |
0.451648 |
NaN |
Item15 |
0.738523 |
NaN |
0.587304 |
NaN |
Item2 |
0.272593 |
NaN |
0.276464 |
NaN |
Item3 |
0.357817 |
NaN |
0.500995 |
NaN |
Item4 |
0.561196 |
NaN |
0.503083 |
NaN |
Item5 |
NaN |
0.364886 |
NaN |
0.615396 |
Item6 |
0.651378 |
NaN |
0.397203 |
NaN |
Item7 |
0.869127 |
NaN |
0.436173 |
NaN |
Item8 |
NaN |
0.704581 |
NaN |
0.218792 |
Item9 |
NaN |
0.059809 |
NaN |
0.184287 |
In the below cell, we have created pivot table by providing columns and values parameter to pivot() method. The result is exactly the same as our previous cell with the only difference that the index in this example is a range of integers.
|
A |
B |
Category |
Category1 |
Category2 |
Category1 |
Category2 |
0 |
0.191519 |
NaN |
0.622109 |
NaN |
1 |
0.272593 |
NaN |
0.276464 |
NaN |
2 |
0.357817 |
NaN |
0.500995 |
NaN |
3 |
0.561196 |
NaN |
0.503083 |
NaN |
4 |
NaN |
0.364886 |
NaN |
0.615396 |
5 |
0.651378 |
NaN |
0.397203 |
NaN |
6 |
0.869127 |
NaN |
0.436173 |
NaN |
7 |
NaN |
0.704581 |
NaN |
0.218792 |
8 |
NaN |
0.059809 |
NaN |
0.184287 |
9 |
0.533310 |
NaN |
0.043324 |
NaN |
10 |
0.111894 |
NaN |
0.607194 |
NaN |
11 |
0.912123 |
NaN |
0.790524 |
NaN |
12 |
NaN |
0.285251 |
NaN |
0.624917 |
13 |
0.053874 |
NaN |
0.451648 |
NaN |
14 |
0.738523 |
NaN |
0.587304 |
NaN |
In the below cell, we have created an example where we have provided more than one column name to index parameter. We have asked to use columns 'Item' and 'Type' as indexes of the resulting dataframe. The resulting dataframe has a combination of values from columns 'Item' and 'Type'.
|
|
A |
B |
|
Category |
Category1 |
Category2 |
Category1 |
Category2 |
Item |
Type |
|
|
|
|
Item1 |
Class3 |
0.191519 |
NaN |
0.622109 |
NaN |
Item10 |
Class3 |
0.533310 |
NaN |
0.043324 |
NaN |
Item11 |
Class1 |
0.111894 |
NaN |
0.607194 |
NaN |
Item12 |
Class1 |
0.912123 |
NaN |
0.790524 |
NaN |
Item13 |
Class1 |
NaN |
0.285251 |
NaN |
0.624917 |
Item14 |
Class1 |
0.053874 |
NaN |
0.451648 |
NaN |
Item15 |
Class3 |
0.738523 |
NaN |
0.587304 |
NaN |
Item2 |
Class2 |
0.272593 |
NaN |
0.276464 |
NaN |
Item3 |
Class2 |
0.357817 |
NaN |
0.500995 |
NaN |
Item4 |
Class3 |
0.561196 |
NaN |
0.503083 |
NaN |
Item5 |
Class3 |
NaN |
0.364886 |
NaN |
0.615396 |
Item6 |
Class1 |
0.651378 |
NaN |
0.397203 |
NaN |
Item7 |
Class2 |
0.869127 |
NaN |
0.436173 |
NaN |
Item8 |
Class2 |
NaN |
0.704581 |
NaN |
0.218792 |
Item9 |
Class2 |
NaN |
0.059809 |
NaN |
0.184287 |
In the below cell, we have created an example where we have provided more than one column name to columns parameter of pivot() method. The resulting dataframe will have values for columns 'A' and 'B' for each combination of unique values from columns 'Category' and 'Type'.
|
A |
B |
Category |
Category1 |
Category2 |
Category1 |
Category2 |
Category1 |
Category2 |
Category1 |
Category2 |
Type |
Class3 |
Class2 |
Class3 |
Class1 |
Class2 |
Class1 |
Class3 |
Class2 |
Class3 |
Class1 |
Class2 |
Class1 |
Item |
|
|
|
|
|
|
|
|
|
|
|
|
Item1 |
0.191519 |
NaN |
NaN |
NaN |
NaN |
NaN |
0.622109 |
NaN |
NaN |
NaN |
NaN |
NaN |
Item10 |
0.533310 |
NaN |
NaN |
NaN |
NaN |
NaN |
0.043324 |
NaN |
NaN |
NaN |
NaN |
NaN |
Item11 |
NaN |
NaN |
NaN |
0.111894 |
NaN |
NaN |
NaN |
NaN |
NaN |
0.607194 |
NaN |
NaN |
Item12 |
NaN |
NaN |
NaN |
0.912123 |
NaN |
NaN |
NaN |
NaN |
NaN |
0.790524 |
NaN |
NaN |
Item13 |
NaN |
NaN |
NaN |
NaN |
NaN |
0.285251 |
NaN |
NaN |
NaN |
NaN |
NaN |
0.624917 |
Item14 |
NaN |
NaN |
NaN |
0.053874 |
NaN |
NaN |
NaN |
NaN |
NaN |
0.451648 |
NaN |
NaN |
Item15 |
0.738523 |
NaN |
NaN |
NaN |
NaN |
NaN |
0.587304 |
NaN |
NaN |
NaN |
NaN |
NaN |
Item2 |
NaN |
0.272593 |
NaN |
NaN |
NaN |
NaN |
NaN |
0.276464 |
NaN |
NaN |
NaN |
NaN |
Item3 |
NaN |
0.357817 |
NaN |
NaN |
NaN |
NaN |
NaN |
0.500995 |
NaN |
NaN |
NaN |
NaN |
Item4 |
0.561196 |
NaN |
NaN |
NaN |
NaN |
NaN |
0.503083 |
NaN |
NaN |
NaN |
NaN |
NaN |
Item5 |
NaN |
NaN |
0.364886 |
NaN |
NaN |
NaN |
NaN |
NaN |
0.615396 |
NaN |
NaN |
NaN |
Item6 |
NaN |
NaN |
NaN |
0.651378 |
NaN |
NaN |
NaN |
NaN |
NaN |
0.397203 |
NaN |
NaN |
Item7 |
NaN |
0.869127 |
NaN |
NaN |
NaN |
NaN |
NaN |
0.436173 |
NaN |
NaN |
NaN |
NaN |
Item8 |
NaN |
NaN |
NaN |
NaN |
0.704581 |
NaN |
NaN |
NaN |
NaN |
NaN |
0.218792 |
NaN |
Item9 |
NaN |
NaN |
NaN |
NaN |
0.059809 |
NaN |
NaN |
NaN |
NaN |
NaN |
0.184287 |
NaN |
|
A |
B |
A |
B |
A |
B |
A |
B |
A |
B |
A |
B |
Category |
Category1 |
Category1 |
Category1 |
Category1 |
Category1 |
Category1 |
Category2 |
Category2 |
Category2 |
Category2 |
Category2 |
Category2 |
Type |
Class1 |
Class1 |
Class2 |
Class2 |
Class3 |
Class3 |
Class1 |
Class1 |
Class2 |
Class2 |
Class3 |
Class3 |
Item |
|
|
|
|
|
|
|
|
|
|
|
|
Item1 |
NaN |
NaN |
NaN |
NaN |
0.191519 |
0.622109 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
Item10 |
NaN |
NaN |
NaN |
NaN |
0.533310 |
0.043324 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
Item11 |
0.111894 |
0.607194 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
Item12 |
0.912123 |
0.790524 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
Item13 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
0.285251 |
0.624917 |
NaN |
NaN |
NaN |
NaN |
Item14 |
0.053874 |
0.451648 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
Item15 |
NaN |
NaN |
NaN |
NaN |
0.738523 |
0.587304 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
Item2 |
NaN |
NaN |
0.272593 |
0.276464 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
Item3 |
NaN |
NaN |
0.357817 |
0.500995 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
Item4 |
NaN |
NaN |
NaN |
NaN |
0.561196 |
0.503083 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
Item5 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
0.364886 |
0.615396 |
Item6 |
0.651378 |
0.397203 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
Item7 |
NaN |
NaN |
0.869127 |
0.436173 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
Item8 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
0.704581 |
0.218792 |
NaN |
NaN |
Item9 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
0.059809 |
0.184287 |
NaN |
NaN |
2. pivot_table()¶
In this section, we'll explain how we can use pivot_table() method with simple examples.
- pivot_table(data,values,index,columns,aggfunc='mean',fill_value=None,dropna=True,margins=False,margins_name='All',sort=True) - This method takes data frame and index column as input and returns pivoted table. It needs column name at which values will be grouped and index will be created. All other parameters are optional. By default, it applies mean aggregate function on grouped values.
- The values parameter accepts column names whose values will be grouped and kept in the pivot table after performing aggregate statistics. If we don't provide values then all columns except provided to index and columns parameter will be used.
- The index parameter accepts columns names at which values of the main dataframe will be grouped and later one some statistics will be performed on these groups of values. We generally provide categorical columns to this parameter.
- The columns dataframe includes columns whose value will become columns of our pivot table. The values provided to values parameter will be grouped according to columns provided to index and columns parameter. This is an optional parameter.
- The aggfunc accepts single callable, list of callable or dictionary as input. We can provide a numpy stat function or our designed stats function. It should work on a list of values. The default is 'mean'.
- If we provide a single callable then it'll be applied to all selected columns of data.
- If we provide a list of callable then it'll apply all of them one by one to selected columns and the resulting pivot table will have column names according to aggregate function and selected columns.
- The dictionary has a key with column name and value as single or list of callable. It applies aggregate function specified in value to the column specified in the key of the dictionary
- The fill_value parameter accepts value to fill for NaN entries.
- The dropna parameter drops rows where all entries are NaN.
- The margins parameter if set to True will include the addition of all rows and columns at the end of the dataframe.
- The margins_name parameter accepts string specifying the name of totals (column name and index label) that will be created when margins is set to True.
Below we have created a pivot table using pivot_table() method asking it to use a column named 'Item' as an index of the resulting dataframe. It'll treat all other columns of the dataframe as columns of the resulting dataframe. The Pviot_table() method only keeps columns that has float/integer values in resulting dataframe.
|
A |
B |
C |
D |
E |
Item |
|
|
|
|
|
Item1 |
0.191519 |
0.622109 |
0.437728 |
0.785359 |
0.779976 |
Item10 |
0.533310 |
0.043324 |
0.561433 |
0.329668 |
0.502967 |
Item11 |
0.111894 |
0.607194 |
0.565945 |
0.006764 |
0.617442 |
Item12 |
0.912123 |
0.790524 |
0.992081 |
0.958802 |
0.791964 |
Item13 |
0.285251 |
0.624917 |
0.478094 |
0.195675 |
0.382317 |
Item14 |
0.053874 |
0.451648 |
0.982005 |
0.123943 |
0.119381 |
Item15 |
0.738523 |
0.587304 |
0.471633 |
0.107127 |
0.229219 |
Item2 |
0.272593 |
0.276464 |
0.801872 |
0.958139 |
0.875933 |
Item3 |
0.357817 |
0.500995 |
0.683463 |
0.712702 |
0.370251 |
Item4 |
0.561196 |
0.503083 |
0.013768 |
0.772827 |
0.882641 |
Item5 |
0.364886 |
0.615396 |
0.075381 |
0.368824 |
0.933140 |
Item6 |
0.651378 |
0.397203 |
0.788730 |
0.316836 |
0.568099 |
Item7 |
0.869127 |
0.436173 |
0.802148 |
0.143767 |
0.704261 |
Item8 |
0.704581 |
0.218792 |
0.924868 |
0.442141 |
0.909316 |
Item9 |
0.059809 |
0.184287 |
0.047355 |
0.674881 |
0.594625 |
Below we have called pviot_table() method with columns parameter set to column 'Type'. The resulting dataframe will have unique values of column 'Type' as column names and all other columns of the dataframe will be index. The mean aggregate function will be applied to values that are from a particular column and the unique value of 'Type' column. So if we take columns 'A' from the dataframe, then the values of column 'A' that have 'Type' column value 'Class1' will be grouped and the mean function applied to it. Same logic will follow for values that has 'Class2' and 'Class3' as values of column 'Type'.
Type |
Class1 |
Class2 |
Class3 |
A |
0.402904 |
0.452786 |
0.477887 |
B |
0.574297 |
0.323342 |
0.474243 |
C |
0.761371 |
0.651941 |
0.311989 |
D |
0.320404 |
0.586326 |
0.472761 |
E |
0.495841 |
0.690877 |
0.665588 |
In the below cell, we have created an example of pivot_table() where we have provided values for all three main parameters index, columns, and values. We have asked to use values of column 'Category' as an index of dataframe, values of column 'Type' as columns of resulting dataframe, and values from column 'A' will be values of the resulting dataframe after the aggregate function is applied to it.
|
A |
Type |
Class1 |
Class2 |
Class3 |
Category |
|
|
|
Category1 |
0.432317 |
0.499846 |
0.506137 |
Category2 |
0.285251 |
0.382195 |
0.364886 |
In the below cell, we have provided two columns ('Category' and 'Type') to be used as index of the resulting dataframe. The values of column 'Item' will be columns of the resulting dataframe and values of column 'A' will be values of our resulting dataframe after applying an aggregate function. The index of the resulting dataframe will be created by taking all combinations of different values from columns 'Category' and 'Type'.
|
|
A |
|
Item |
Item1 |
Item10 |
Item11 |
Item12 |
Item13 |
Item14 |
Item15 |
Item2 |
Item3 |
Item4 |
Item5 |
Item6 |
Item7 |
Item8 |
Item9 |
Category |
Type |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Category1 |
Class1 |
NaN |
NaN |
0.111894 |
0.912123 |
NaN |
0.053874 |
NaN |
NaN |
NaN |
NaN |
NaN |
0.651378 |
NaN |
NaN |
NaN |
Class2 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
0.272593 |
0.357817 |
NaN |
NaN |
NaN |
0.869127 |
NaN |
NaN |
Class3 |
0.191519 |
0.53331 |
NaN |
NaN |
NaN |
NaN |
0.738523 |
NaN |
NaN |
0.561196 |
NaN |
NaN |
NaN |
NaN |
NaN |
Category2 |
Class1 |
NaN |
NaN |
NaN |
NaN |
0.285251 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
Class2 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
0.704581 |
0.059809 |
Class3 |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
NaN |
0.364886 |
NaN |
NaN |
NaN |
NaN |
In the below cell, we have created an example where we have given two columns ('Category' and 'Type') to be used as columns of the resulting dataframe to pivot_table() method. The 'Item' column values will become an index of our dataframe and values of column 'A' after applying aggregate function will become values of the resulting dataframe. The aggregate function will be applied to values of column 'A' for each combination of values from column 'Item', 'Category', and 'Type'.
|
A |
Category |
Category1 |
Category2 |
Type |
Class1 |
Class2 |
Class3 |
Class1 |
Class2 |
Class3 |
Item |
|
|
|
|
|
|
Item1 |
NaN |
NaN |
0.191519 |
NaN |
NaN |
NaN |
Item10 |
NaN |
NaN |
0.533310 |
NaN |
NaN |
NaN |
Item11 |
0.111894 |
NaN |
NaN |
NaN |
NaN |
NaN |
Item12 |
0.912123 |
NaN |
NaN |
NaN |
NaN |
NaN |
Item13 |
NaN |
NaN |
NaN |
0.285251 |
NaN |
NaN |
Item14 |
0.053874 |
NaN |
NaN |
NaN |
NaN |
NaN |
Item15 |
NaN |
NaN |
0.738523 |
NaN |
NaN |
NaN |
Item2 |
NaN |
0.272593 |
NaN |
NaN |
NaN |
NaN |
Item3 |
NaN |
0.357817 |
NaN |
NaN |
NaN |
NaN |
Item4 |
NaN |
NaN |
0.561196 |
NaN |
NaN |
NaN |
Item5 |
NaN |
NaN |
NaN |
NaN |
NaN |
0.364886 |
Item6 |
0.651378 |
NaN |
NaN |
NaN |
NaN |
NaN |
Item7 |
NaN |
0.869127 |
NaN |
NaN |
NaN |
NaN |
Item8 |
NaN |
NaN |
NaN |
NaN |
0.704581 |
NaN |
Item9 |
NaN |
NaN |
NaN |
NaN |
0.059809 |
NaN |
In the below cell, we have created an example where we have given more than one aggregate function to pivot_table(). We can notice that the resulting dataframe has an entry for each function.
|
mean |
std |
var |
|
A |
A |
A |
Type |
Class1 |
Class2 |
Class3 |
Class1 |
Class2 |
Class3 |
Class1 |
Class2 |
Class3 |
Category |
|
|
|
|
|
|
|
|
|
Category1 |
0.432317 |
0.499846 |
0.506137 |
0.417968 |
0.322634 |
0.228588 |
0.174697 |
0.104093 |
0.052252 |
Category2 |
0.285251 |
0.382195 |
0.364886 |
NaN |
0.455923 |
NaN |
NaN |
0.207866 |
NaN |
In the below cell, we have created an example where we have given dictionary to aggfunc parameter of pivot_table() function. We have asked it to use 'mean' function on column 'A' values and 'std' function on column 'B' values.
|
A |
B |
Type |
Class1 |
Class2 |
Class3 |
Class1 |
Class2 |
Class3 |
Category |
|
|
|
|
|
|
Category1 |
0.432317 |
0.499846 |
0.506137 |
0.176636 |
0.115559 |
0.268445 |
Category2 |
0.285251 |
0.382195 |
0.364886 |
NaN |
0.024399 |
NaN |
In the below cell, we have created another example demonstrating how we can provide a dictionary of aggregate functions to aggfunc parameter of pivot_table(). This time we have provided two functions to be used on values of column 'B'.
|
A |
B |
|
mean |
mean |
std |
Type |
Class1 |
Class2 |
Class3 |
Class1 |
Class2 |
Class3 |
Class1 |
Class2 |
Class3 |
Category |
|
|
|
|
|
|
|
|
|
Category1 |
0.432317 |
0.499846 |
0.506137 |
0.561642 |
0.404544 |
0.438955 |
0.176636 |
0.115559 |
0.268445 |
Category2 |
0.285251 |
0.382195 |
0.364886 |
0.624917 |
0.201540 |
0.615396 |
NaN |
0.024399 |
NaN |
In the below cell, we have recreated an example from the previous cell and provided value for parameter fill_value so that all NaNs will be filled with this value.
|
A |
B |
|
mean |
mean |
std |
Type |
Class1 |
Class2 |
Class3 |
Class1 |
Class2 |
Class3 |
Class1 |
Class2 |
Class3 |
Category |
|
|
|
|
|
|
|
|
|
Category1 |
0.432317 |
0.499846 |
0.506137 |
0.561642 |
0.404544 |
0.438955 |
0.176636 |
0.115559 |
0.268445 |
Category2 |
0.285251 |
0.382195 |
0.364886 |
0.624917 |
0.201540 |
0.615396 |
0.000001 |
0.024399 |
0.000001 |
In the below cell, we have recreated an example from previous cells with the only difference that we have provided dropna parameter this time. We have asked the resulting dataframe to keep entries where all values are NaNs.
|
A |
B |
|
mean |
std |
mean |
std |
Category |
Category1 |
Category2 |
Category1 |
Category2 |
Category1 |
Category2 |
Category1 |
Category2 |
Type |
|
|
|
|
|
|
|
|
Class1 |
0.432317 |
0.285251 |
NaN |
NaN |
0.561642 |
0.624917 |
0.176636 |
NaN |
Class2 |
0.499846 |
0.382195 |
NaN |
NaN |
0.404544 |
0.201540 |
0.115559 |
0.024399 |
Class3 |
0.506137 |
0.364886 |
NaN |
NaN |
0.438955 |
0.615396 |
0.268445 |
NaN |
In the below cell, we have created another example demonstrating the usage of pivot_table(). This time we have included the usage of parameters margins and margins_name.
|
A |
Type |
Class1 |
Class2 |
Class3 |
Column Types Average |
Category |
|
|
|
|
Category1 |
0.432317 |
0.499846 |
0.506137 |
0.477578 |
Category2 |
0.285251 |
0.382195 |
0.364886 |
0.353632 |
Column Types Average |
0.402904 |
0.452786 |
0.477887 |
0.444526 |
Category1 Avg - 0.4775777411607928
Category2 Avg - 0.35363186872382657
Class1 Avg - 0.40290399848097636
Class2 Avg - 0.4527855591542317
Class3 Avg - 0.4778869678975973
In the below cell, we have created the last example demonstrating function pivot_table(). We have asked the function to keep entries unsorted by providing sort parameter as False.
|
A |
Type |
Class1 |
Class2 |
Class3 |
Column Types Average |
Category |
|
|
|
|
Category1 |
0.432317 |
0.499846 |
0.506137 |
0.477578 |
Category2 |
0.285251 |
0.382195 |
0.364886 |
0.353632 |
Column Types Average |
0.402904 |
0.452786 |
0.477887 |
0.444526 |
This ends our small tutorial explaining how we can use pandas functions pivot() and pivot_table() to create pivot tables from pandas dataframe. Please feel free to let us know your views in the comments section.
References¶
Stuck Somewhere? Need Help with Coding? Have Doubts About the Topic/Code?
When going through coding examples, it's quite common to have doubts and errors.
If you have doubts about some code examples or are stuck somewhere when trying our code, send us an email at coderzcolumn07@gmail.com. We'll help you or point you in the direction where you can find a solution to your problem.
You can even send us a mail if you are trying something new and need guidance regarding coding. We'll try to respond as soon as possible.