Pandas is one of the most popular tools among data scientists for tabular data analysis. It provides a fast and efficient API for working with tabular data. When we display a data frame during analysis, pandas applies default values to display settings such as how many rows and columns to show, the precision of floats, and column widths. We sometimes need to tweak these defaults to suit our needs.
Pandas lets us modify these defaults through dedicated methods, as well as through the options attribute of the pandas module. In this tutorial, we'll explain with simple examples how to tweak these default behaviors.
We'll start by importing the pandas library and creating a data frame of 70 rows and 25 columns filled with random floats in the range 0-1. Pandas provides many more options than we cover here; we have included only the commonly used ones to keep the tutorial simple and easy to follow.
We'll explain each option as we use it in the examples.
Please note that all these options only modify how a data frame is presented, not its actual contents.
import pandas as pd
import numpy as np
data = np.random.random((70,25))
df = pd.DataFrame(data=data, columns=["Column%d"%(i+1) for i in range(25)])
df
In this first section, we'll explain how to retrieve information about a particular option. Pandas provides a method named describe_option() for this purpose.
Below we have used describe_option() to retrieve the descriptions of the display.max_columns and display.precision options. Notice that for display.precision we have provided only a partial name, and pandas was still able to find it. The name is treated as a pattern, so describe_option() prints every option that matches it, and the output may include other options whose names contain "precision".
display.max_columns - This option accepts an integer value that decides how many columns are displayed in a Jupyter notebook. If the data frame has more columns than this number, the middle columns are collapsed and a '...' pattern is displayed in the center of the data frame. In a notebook, the default value of this option is 20 columns. We can notice from the data frame displayed above that only columns 1-10 and 16-25 are shown. If we set this option to None, no limit is applied and all columns are displayed.
display.precision - This option accepts an integer value that decides how many digits are displayed after the decimal point. The default value of this option is 6.
We'll explain how to modify these options in the section on setting options.
pd.describe_option("display.max_columns")
pd.describe_option("precision")
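The description can also be captured as a string instead of being printed. The sketch below assumes the `_print_desc` keyword of describe_option(); the leading underscore marks it as semi-private, so it may change between pandas versions.

```python
import pandas as pd

# _print_desc=False asks pandas to return the description text instead of
# printing it. This keyword is semi-private, so treat its use as an assumption.
description = pd.describe_option("display.max_columns", _print_desc=False)

# Show only the first line of the (multi-line) description.
print(description.splitlines()[0])
```

Having the text as a string is handy when we want to search it or log it rather than read it in a notebook cell.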
In this section, we'll explain how to retrieve the current value of a particular option. There are two ways to do this: the get_option() method and the options attribute of the pandas module.
The examples below show that get_option() returns the current value whether we provide the full option name or only a partial one. A partial name must match exactly one option; otherwise pandas raises an error.
print("How many columns to display by default? : {}".format(pd.get_option("display.max_columns")))
print("How many rows to display by default? : {}".format(pd.get_option("display.max_rows")))
print()
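# A partial name must match exactly one option. Recent pandas versions also define
# styler.render.max_columns, so the "display." prefix keeps this pattern unambiguous.
print("How many columns to display by default? : {}".format(pd.get_option("display.max_colu")))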
print("How many rows to display by default? : {}".format(pd.get_option("display.max_ro")))
We can notice from the data frame displayed earlier that, because its 70 rows exceed the max_rows value of 60, the rows are truncated. Only 10 rows are displayed, which is the current value of the min_rows option.
Below we retrieve option values using the options attribute of the pandas module.
print("How many columns to display by default? : {}".format(pd.options.display.max_columns))
print("How many rows to display by default? : {}".format(pd.options.display.max_rows))
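As a quick check, the two access styles can be compared directly, and we can see what happens when a partial name is ambiguous. This is a minimal sketch; the pattern "display.max_" is chosen only because it is a prefix of several option names.

```python
import pandas as pd

# Both access styles read the same underlying setting.
via_method = pd.get_option("display.max_rows")
via_attribute = pd.options.display.max_rows
print(via_method == via_attribute)

# "display.max_" does not identify a single option, so the lookup fails.
# pandas raises OptionError, which is a subclass of KeyError.
try:
    pd.get_option("display.max_")
    lookup_failed = False
except KeyError as err:
    lookup_failed = True
    print("Lookup failed:", err)
```

Using full option names avoids this kind of failure, which matters most in scripts that need to run unchanged across pandas versions.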
In this section, we'll explain with simple examples how to set option values to modify the default behavior. There are two ways to do this: the set_option() method and the options attribute of the pandas module.
Below we have changed the maximum-columns option from its default of 20 to 25 using the set_option() method. We can notice that the data frame now displays all of its columns.
pd.set_option("display.max_columns", 25)
df
Below we have set maximum columns back to 20, this time through the options attribute of the pandas module.
pd.options.display.max_columns = 20
df
Below we have modified the maximum rows to 80 which is more than the number of rows of our dataframe. We can notice that pandas is now displaying all rows of the data frame.
pd.set_option("display.max_rows", 80)
df
Below we have modified the max_rows and min_rows options to change how many rows are displayed. We have set max_rows to 50, which is less than the 70 rows of our data frame, so the rows will be truncated. We have set min_rows to 6, which tells pandas to display 6 rows after truncation.
# Full names are used because a bare "max_rows" can match more than one option in recent pandas versions.
pd.set_option("display.max_rows", 50)
pd.set_option("display.min_rows", 6)
df
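The interplay of max_rows and min_rows can also be checked on the plain-text representation. The sketch below uses a small illustrative frame named `demo` (not part of the tutorial's data) and passes both options to set_option() in a single call, which accepts several name/value pairs.

```python
import numpy as np
import pandas as pd

# Illustrative frame: 70 rows, more than the max_rows value set below.
demo = pd.DataFrame(np.random.random((70, 2)), columns=["A", "B"])

# set_option() accepts several name/value pairs in one call.
pd.set_option("display.max_rows", 50, "display.min_rows", 6)
text = repr(demo)

# Restore the defaults so the rest of the session is unaffected.
pd.reset_option("display.max_rows")
pd.reset_option("display.min_rows")

print(text)
```

The printed text contains only a few rows from the top and bottom of the frame, followed by a footer with the full dimensions.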
Below we display the first 50 rows of the data frame. All of them are shown because 50 rows do not exceed the max_rows value of 50.
df.head(50)
Below we have modified floating-point precision to 2 digits after the decimal point. We can notice from the presentation below that now it only shows 2 digits after the decimal point.
# Full name: a bare "precision" can match more than one option in recent pandas versions.
pd.set_option("display.precision", 2)
df
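To confirm that precision only affects presentation, we can compare the text representation with the stored value. This is a minimal sketch with an illustrative one-cell frame named `demo`.

```python
import pandas as pd

demo = pd.DataFrame({"x": [3.14159265]})

pd.set_option("display.precision", 2)
rounded_text = repr(demo)
pd.reset_option("display.precision")   # back to the default of 6

print(rounded_text)          # the representation shows two decimal places
print(demo.loc[0, "x"])      # the stored value keeps its full precision
```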
In this section, we'll explain how to change the presentation of the data frame when its number of rows or columns exceeds the configured limits. We'll be using the large_repr option for it.
Below we have first set max_rows and max_columns to values that our data frame exceeds, so that the large_repr option determines the representation. Setting large_repr to info displays the summary produced by info() instead of a truncated table.
pd.set_option("display.max_rows", 50)
pd.set_option("display.max_columns", 20)
pd.set_option("display.large_repr", "info")
df
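The two large_repr modes can be compared side by side on the text representation. The sketch below uses an illustrative frame named `demo` whose row count exceeds max_rows.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(np.random.random((70, 3)), columns=["A", "B", "C"])

pd.set_option("display.max_rows", 50)

pd.set_option("display.large_repr", "info")
info_style = repr(demo)       # 70 rows exceed max_rows, so a summary is produced

pd.set_option("display.large_repr", "truncate")
table_style = repr(demo)      # the usual truncated table

pd.reset_option("display.max_rows")

print(info_style)
print(table_style)
```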
In this section, we'll explain how to limit the number of columns for which the info() method displays details.
Below we have called info() function to display information with default value of max_info_columns option.
df.info()
Below we have set max_info_columns to 20, which is less than the number of columns in the data frame, so info() prints only a summary of the columns instead of details for each individual column.
pd.set_option("display.max_info_columns", 20)
df.info()
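The effect of max_info_columns can be verified by capturing the output of info() in a string buffer. In the sketch below, `demo` and the helper `info_text` are illustrative names, not part of pandas.

```python
import io

import numpy as np
import pandas as pd

demo = pd.DataFrame(np.random.random((5, 4)), columns=["A", "B", "C", "D"])

def info_text(frame):
    """Return the output of frame.info() as a string."""
    buffer = io.StringIO()
    frame.info(buf=buffer)
    return buffer.getvalue()

pd.set_option("display.max_info_columns", 2)   # fewer than the 4 columns
summary = info_text(demo)

pd.reset_option("display.max_info_columns")
detailed = info_text(demo)

print(summary)
print(detailed)
```

The first output collapses the columns into a single summary line, while the second lists each column with its dtype.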
We'll now explain how to include or exclude null-count details in the output of info() using the max_info_rows option.
Below we have first set max_info_rows to 100, which is more than the 70 rows of our data frame, so info() includes non-null counts. We have also set max_info_columns back to 25 so that details are shown for each column.
pd.set_option("display.max_info_columns", 25)
pd.set_option("max_info_rows", 100)
df.info()
Below we have set max_info_rows to 50, which is less than the number of rows in our data frame, so the null-count information is excluded from the output of info().
pd.set_option("max_info_rows", 50)
df.info()
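The same buffer technique shows when non-null counts appear. This is a sketch under the assumption that the frame has few enough columns for per-column details to be printed; `demo` and `info_text` are illustrative names.

```python
import io

import numpy as np
import pandas as pd

demo = pd.DataFrame(np.random.random((5, 2)), columns=["A", "B"])

def info_text(frame):
    """Return the output of frame.info() as a string."""
    buffer = io.StringIO()
    frame.info(buf=buffer)
    return buffer.getvalue()

pd.set_option("display.max_info_rows", 100)   # more than the 5 rows
with_counts = info_text(demo)

pd.set_option("display.max_info_rows", 3)     # fewer than the 5 rows
without_counts = info_text(demo)

pd.reset_option("display.max_info_rows")

print(with_counts)
print(without_counts)
```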
In this section, we'll explain how to truncate cell values that exceed a certain length using the display.max_colwidth option.
Below we have first reset the large_repr option to truncate so that truncated data frames are displayed instead of the info representation. We have then set max_colwidth to 10, so cell values longer than 10 characters are truncated.
We have created a new data frame of strings for explanation purposes.
pd.set_option("large_repr", "truncate")
pd.set_option("display.precision", 6)
pd.set_option("max_colwidth", 10)
data = [["RandomValue1", "RandomValue2", "RandomValue3"],
        ["RandomValue4", "RandomValue5", "RandomValue6"],
        ["RandomValue7", "RandomValue8", "RandomValue9"],
        ["RandomValue10", "RandomValue11", "RandomValue12"]]
new_df = pd.DataFrame(data, columns=["Column%d"%(i+1) for i in range(3)])
new_df
In this section, we'll explain how to display float values below a particular threshold as zero using the chop_threshold option.
Below we have set chop_threshold to 0.5, which displays all values below 0.5 as 0 in the data frame presentation.
pd.set_option("large_repr", "truncate")
pd.set_option("chop_threshold", 0.5)
df
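A two-value frame makes the effect of chop_threshold easy to see. In this minimal sketch, `demo` is an illustrative name; one value lies below the threshold and one above it.

```python
import pandas as pd

demo = pd.DataFrame({"x": [0.25, 0.75]})

pd.set_option("display.chop_threshold", 0.5)
text = repr(demo)
pd.reset_option("display.chop_threshold")   # back to None (no chopping)

print(text)                  # the value below 0.5 is shown as zero
print(demo.loc[0, "x"])      # the stored value is still 0.25
```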
As a part of this section, we'll explain how we can include/exclude memory usage information from presentations created by info() method using memory_usage option.
Below we have first set memory_usage to False to exclude it from the presentation and then True again to include it.
pd.set_option("max_info_rows", 1690785)
pd.set_option("max_info_columns", 100)
pd.set_option("memory_usage", False)
df.info()
pd.set_option("memory_usage", True)
df.info()
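The difference between the two settings can be checked by capturing the output of info() in a string buffer. In the sketch below, `demo` and `info_text` are illustrative names, not part of pandas.

```python
import io

import numpy as np
import pandas as pd

demo = pd.DataFrame(np.random.random((5, 2)), columns=["A", "B"])

def info_text(frame):
    """Return the output of frame.info() as a string."""
    buffer = io.StringIO()
    frame.info(buf=buffer)
    return buffer.getvalue()

pd.set_option("display.memory_usage", False)
without_usage = info_text(demo)

pd.set_option("display.memory_usage", True)
with_usage = info_text(demo)

print(without_usage)
print(with_usage)
```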
In this section, we'll explain how to include or exclude the dimension details (the "rows × columns" footer) from the data frame representation using the show_dimensions option. Below we have set it to False to hide the footer.
pd.set_option("show_dimensions", False)
df
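The footer can be toggled and checked on the text representation. This sketch uses an illustrative frame named `demo`; setting the option to True shows the footer even for a frame small enough to be displayed in full.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(np.random.random((70, 2)), columns=["A", "B"])

pd.set_option("display.show_dimensions", False)
without_footer = repr(demo)              # truncated table, no dimensions footer

pd.set_option("display.show_dimensions", True)
small_with_footer = repr(demo.head(3))   # footer shown even without truncation

pd.reset_option("display.show_dimensions")

print(without_footer)
print(small_with_footer)
```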
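In this section, we'll explain how to justify column headers in the data frame using the colheader_justify option, which accepts the values left and right.
Below we have set the option to right, which is also its default value. We have first reset chop_threshold to None so that small values are displayed again.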
pd.set_option("chop_threshold", None)
pd.set_option("colheader_justify", "right")
data=np.random.random((5,5))
new_df = pd.DataFrame(data, columns=["A","B", "C", "D", "E"])
display(new_df)
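To see the difference between the two settings, we can compare the header line of the text representation. This is a minimal sketch with an illustrative frame named `demo`, whose header is narrower than its values so the justification is visible.

```python
import pandas as pd

demo = pd.DataFrame({"A": [0.123456, 0.654321]})

pd.set_option("display.colheader_justify", "left")
left_header = repr(demo).splitlines()[0]

pd.set_option("display.colheader_justify", "right")
right_header = repr(demo).splitlines()[0]

pd.reset_option("display.colheader_justify")

# repr() is used so that surrounding whitespace is visible in the output.
print(repr(left_header))
print(repr(right_header))
```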
In this section, we'll explain how to reset an option to its default value using the reset_option() method.
Below we have changed the value of the max_columns option and then reset it to its default value.
print("Display Max Column Default Value : {}".format(pd.get_option("display.max_columns")))
pd.options.display.max_columns = 25
print("Display Max Column New Value : {}".format(pd.get_option("display.max_columns")))
pd.reset_option("display.max_columns")
print("Display Max Column Reset Value : {}".format(pd.get_option("display.max_columns")))
There can be situations when we want to modify options only for a particular section of our code rather than making global changes. We can do that by using the option_context() method of pandas as a context manager (with statement). It lets us modify options for a particular section of our code and then restores the values they had before the section.
Below we have set the max_columns and max_rows options for a particular section of our code. When we display the data frame in that section, it displays all columns because we have set max columns to 25 for that section of code. Outside that section, the options keep their default values or whatever values were set before it.
print("Max Columns Outside Context : {}".format(pd.options.display.max_columns))
print("Max Rows Outside Context : {}".format(pd.options.display.max_rows))
with pd.option_context("display.max_columns", 25, "display.max_rows", 50):
    print("Max Columns Inside Context : {}".format(pd.options.display.max_columns))
    print("Max Rows Inside Context : {}".format(pd.options.display.max_rows))
    display(df)
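The sketch below checks that option_context() restores the value that was in effect before the block, even when that value is not the default. The specific numbers are chosen only for illustration.

```python
import pandas as pd

pd.set_option("display.max_rows", 80)            # a non-default global value

with pd.option_context("display.max_rows", 50):
    inside = pd.get_option("display.max_rows")   # 50 inside the block

after = pd.get_option("display.max_rows")        # restored to 80, not to the default

pd.reset_option("display.max_rows")
default = pd.get_option("display.max_rows")      # the default of 60

print(inside, after, default)
```

This makes option_context() a safe choice inside helper functions, since it leaves the caller's settings exactly as they were.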
This ends our small tutorial explaining how we can work with pandas options. Please feel free to let us know your views in the comments section.