Updated On : Nov-29,2021 Time Investment : ~30 mins

Pandas query(): Query Pandas DataFrame using Python Expressions¶

Python expressions lets us check for conditions and make decisions based on the result. It can let us check for various conditions like greater than, less than, equal to, not equal to, etc. We can check for conditions like the presence of variables in sequences or scalar values in sequences. We can even compare different variables using expressions and make decisions based on that. Python provides constructs like '>,<,>=,<=,=,!=,~, &, |,~' etc for performing various conditions.

Pandas dataframes are the most commonly used data structure to store and manipulate tabular data in Python. What if we can use python expressions to filter rows of pandas dataframe. It can make code quite easy to understand if we can filter rows of pandas dataframe by providing python expressions specifying some conditions to filter rows.

To use python expressions with pandas dataframe, it provides a method named query() which takes as input python expression and filters rows of dataframe based on the condition specified through expression. The expression can use column names of pandas dataframe inside it and compare it against scalar, local variable, etc to make decisions. We need to provide expression as a string to query() method. The query() method used eval() method behind the scene to evaluate Python expressions.

As a part of this tutorial, we'll explain how we can use Python expressions to filter rows of pandas dataframe using query() method. We have created many different examples to explain different conditions covering the majority of scenarios. For a few of our examples, we have also explained the second way of filtering rows without using query() method for comparison purposes.

Below we have imported pandas and numpy libraries that we'll use in our tutorial.

import pandas as pd

print("Pandas Version : {}",format(pd.__version__))

Pandas Version : {} 1.3.4

import numpy as np

In the below cell, we have created a sample dataframe that we'll use to explain query() method. The index of the dataframe is the datetime index, five columns are random floats and one last column has categorical data.

np.random.seed(123)

data = np.random.rand(10,5)

df1 = pd.DataFrame(data=data, index=pd.date_range(start="2021-1-1", periods=10), columns=list("ABCDE"))

df1["Type"] = np.random.choice(["Class1", "Class2", "Class3"], size=10)

df1

	A	B	C	D	E	Type
2021-01-01	0.696469	0.286139	0.226851	0.551315	0.719469	Class2
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-03	0.343178	0.729050	0.438572	0.059678	0.398044	Class3
2021-01-04	0.737995	0.182492	0.175452	0.531551	0.531828	Class2
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-07	0.092105	0.433701	0.430863	0.493685	0.425830	Class2
2021-01-08	0.312261	0.426351	0.893389	0.944160	0.501837	Class2
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

Example 1¶

Our first example explains how we can include the simple expression to compare the value of a particular column against a scalar value and filter rows based on results. We have called query() method with expression 'A > 0.5' which will check each value of column 'A' and keep only entries which are greater than 5.

In the next cell after the below cell, we have explained how we can perform the same operation without using query() method.

df1.query("A > 0.5")

	A	B	C	D	E	Type
2021-01-01	0.696469	0.286139	0.226851	0.551315	0.719469	Class2
2021-01-04	0.737995	0.182492	0.175452	0.531551	0.531828	Class2
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1

df1[df1["A"] > 0.5]

	A	B	C	D	E	Type
2021-01-01	0.696469	0.286139	0.226851	0.551315	0.719469	Class2
2021-01-04	0.737995	0.182492	0.175452	0.531551	0.531828	Class2
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1

Example 2¶

Below we have created another example explaining the usage of query() method. This time we have checked for condition 'E * 10 > 5' which will first multiply values of column 'E' by scalar 10 and then check which entries are greater than 5.

Then in the next cell after the below cell, we have explained how we can perform the same thing without using query() function.

df1.query("E * 10 > 5")

	A	B	C	D	E	Type
2021-01-01	0.696469	0.286139	0.226851	0.551315	0.719469	Class2
2021-01-04	0.737995	0.182492	0.175452	0.531551	0.531828	Class2
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-08	0.312261	0.426351	0.893389	0.944160	0.501837	Class2
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

df1[(df1["E"] * 10) > 5]

	A	B	C	D	E	Type
2021-01-01	0.696469	0.286139	0.226851	0.551315	0.719469	Class2
2021-01-04	0.737995	0.182492	0.175452	0.531551	0.531828	Class2
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-08	0.312261	0.426351	0.893389	0.944160	0.501837	Class2
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

Example 3¶

In this example, we are checking for condition 'C - 0.5 > 0.4' using query() function. We are subtracting scalar value 0.5 from each value of column 'C' and only keeping entries where values after subtraction are greater than 0.4.

Then in the next cell after the below cell, we have included code to do the same operation without using query() method.

df1.query("C-0.5 > 0.4")

	A	B	C	D	E	Type
2021-01-10	0.250455	0.483034	0.98556	0.519485	0.612895	Class1

df1[(df1["C"] - 0.5) > 0.4]

	A	B	C	D	E	Type
2021-01-10	0.250455	0.483034	0.98556	0.519485	0.612895	Class1

Example 4¶

In this example, we are checking for condition 'C + 0.5 > 0.1' using query() function. It'll add scalar value 0.5 to all values of column 'C' and then will keep only values which are greater than or equal to 1.0.

Then in the next cell, we have explained how the same can be done without using query() method.

df1.query("C+0.5 >= 1.0")

	A	B	C	D	E	Type
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-08	0.312261	0.426351	0.893389	0.944160	0.501837	Class2
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

df1[(df1["C"] + 0.5) >= 1.0]

	A	B	C	D	E	Type
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-08	0.312261	0.426351	0.893389	0.944160	0.501837	Class2
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

Example 5¶

In our fifth example, we are checking for condition 'D / 10 > 0.09' using query() function. It'll first divide all values of column 'D' by scalar value 10 and then keep entries which are greater than 0.09.

df1.query("D/10 > 0.09")

	A	B	C	D	E	Type
2021-01-08	0.312261	0.426351	0.893389	0.94416	0.501837	Class2

df1[(df1["D"] / 10) > 0.09]

	A	B	C	D	E	Type
2021-01-08	0.312261	0.426351	0.893389	0.94416	0.501837	Class2

Example 6¶

In this example, we are checking for condition 'E**2 > 0.5' using query(). This will first raise values of column 'E' by power of two and then keep entries which are greater than 0.5.

df1.query("E**2 > 0.5")

	A	B	C	D	E	Type
2021-01-01	0.696469	0.286139	0.226851	0.551315	0.719469	Class2
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1

df1[(df1["E"]**2) > 0.5]

	A	B	C	D	E	Type
2021-01-01	0.696469	0.286139	0.226851	0.551315	0.719469	Class2
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1

Example 7¶

In this example, we have included little complicated expression '(E*10)%10 > 7' inside call to query() method. This expression will first multiply values of column 'E' by 10 and then take modulo by 10. The resulting values will be compared against scalar 7 and only values greater than 7 will be kept.

df1.query("(E*10)%10 > 7")

	A	B	C	D	E	Type
2021-01-01	0.696469	0.286139	0.226851	0.551315	0.719469	Class2
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1

df1[(df1["E"]*10)%10 > 7]

	A	B	C	D	E	Type
2021-01-01	0.696469	0.286139	0.226851	0.551315	0.719469	Class2
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1

Example 8¶

Till now, we were comparing values of columns against scalar values but we can compare columns as well using query*() function.

In this example, we are using the expression 'A < B' which will only take entries where values of column 'A' are less than or equal to values of column 'B'.

df1.query("A <= B")

	A	B	C	D	E	Type
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-03	0.343178	0.729050	0.438572	0.059678	0.398044	Class3
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-07	0.092105	0.433701	0.430863	0.493685	0.425830	Class2
2021-01-08	0.312261	0.426351	0.893389	0.944160	0.501837	Class2
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

df1[df1["A"] <= df1["B"]]

	A	B	C	D	E	Type
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-03	0.343178	0.729050	0.438572	0.059678	0.398044	Class3
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-07	0.092105	0.433701	0.430863	0.493685	0.425830	Class2
2021-01-08	0.312261	0.426351	0.893389	0.944160	0.501837	Class2
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

Example 9¶

In this example, we are checking for condition Type == 'Class1' condition. This condition will check for string value 'Class1' in column 'Type' and keep only entries where condition is satisfied.

Please take a look at how we have provided string value.

df1.query("Type == 'Class1'")

	A	B	C	D	E	Type
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

df1[df1["Type"] == "Class1"]

	A	B	C	D	E	Type
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

Example 10¶

In this example, we are checking for index of the dataframe. We have provided condition index in ['2021-1-1', '2021-1-2']. This will check for entries that have the date in one of these two dates specified as a list and will keep only entries that satisfy this condition.

df1.query("index in ['2021-1-1', '2021-1-2']")

	A	B	C	D	E	Type
2021-01-01	0.696469	0.286139	0.226851	0.551315	0.719469	Class2
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1

df1[df1.index.isin(["2021-1-1", "2021-1-2"])]

	A	B	C	D	E	Type
2021-01-01	0.696469	0.286139	0.226851	0.551315	0.719469	Class2
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1

Example 11¶

In this example, we have explained how we can use outside python variable in python expression provided to query() method. We can refer to local variables by adding string <strong>'@'</strong> in front of them.

We have first created a list with 3 dates using pandas.date_range() function. We are then checking which index entries of the dataframe have one of these three dates. We'll keep all entries that satisfy this condition.

rng = pd.date_range(start="2021-1-1", periods=3)

df1.query("index in @rng")

	A	B	C	D	E	Type
2021-01-01	0.696469	0.286139	0.226851	0.551315	0.719469	Class2
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-03	0.343178	0.729050	0.438572	0.059678	0.398044	Class3

rng = pd.date_range(start="2021-1-1", periods=3)

df1[df1.index.isin(rng)]

	A	B	C	D	E	Type
2021-01-01	0.696469	0.286139	0.226851	0.551315	0.719469	Class2
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-03	0.343178	0.729050	0.438572	0.059678	0.398044	Class3

Example 12¶

Our data frame’s index is of datatype hence we can perform operations that we generally perform with dates on it.

In this example, we are checking for condition index > '2021-1-5'. This will keep entries that have a date ahead of '2021-1-5' in an index of our dataframe.

df1.query("index > '2021-1-5'")

	A	B	C	D	E	Type
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-07	0.092105	0.433701	0.430863	0.493685	0.425830	Class2
2021-01-08	0.312261	0.426351	0.893389	0.944160	0.501837	Class2
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

df1[df1.index > "2021-1-5"]

	A	B	C	D	E	Type
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-07	0.092105	0.433701	0.430863	0.493685	0.425830	Class2
2021-01-08	0.312261	0.426351	0.893389	0.944160	0.501837	Class2
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

Example 13¶

Here, we have created another example where we are using a local variable in query() method expression. We have created a local variable that has a list of two strings. We then check for the condition which checks which entries of 'Type' column are present in that list of strings and keep only those entries.

sel_classes = ["Class1", "Class3"]

df1.query("Type in @sel_classes")

	A	B	C	D	E	Type
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-03	0.343178	0.729050	0.438572	0.059678	0.398044	Class3
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

sel_classes = ["Class1", "Class3"]

df1[df1["Type"].isin(sel_classes)]

	A	B	C	D	E	Type
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-03	0.343178	0.729050	0.438572	0.059678	0.398044	Class3
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

Example 14¶

In this example, we are checking for conditions by performing arithmetic operations between columns of the dataframe. We are checking for condition 'A - B < C' using query() method. This will subtract values of column 'B' from values of column 'A' and then keep only entries that are less than values of column 'C'.

df1.query("A-B < C")

	A	B	C	D	E	Type
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-03	0.343178	0.729050	0.438572	0.059678	0.398044	Class3
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-07	0.092105	0.433701	0.430863	0.493685	0.425830	Class2
2021-01-08	0.312261	0.426351	0.893389	0.944160	0.501837	Class2
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

df1[(df1["A"] - df1["B"]) < df1["C"]]

	A	B	C	D	E	Type
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-03	0.343178	0.729050	0.438572	0.059678	0.398044	Class3
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-07	0.092105	0.433701	0.430863	0.493685	0.425830	Class2
2021-01-08	0.312261	0.426351	0.893389	0.944160	0.501837	Class2
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

Example 15¶

In this example, we have again explained how to use a local variable in expression. We have created a variable that has a scalar value of 0.8. We are then checking for a condition that we'll only keep entries where values of column 'E' are greater than or equal to the value of that scalar variable.

threshold = 0.8

df1.query("E >= @threshold")

	A	B	C	D	E	Type
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1

df1[df1["E"] >= threshold]

	A	B	C	D	E	Type
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1

Example 16¶

In this example, we have performed operations between three columns of the dataframe to filter rows. We are checking for condition 'A / B < C'. We are dividing values of column 'A' by values of column 'B' and checking which values are less than values of column 'C' after division. We'll only keep entries that satisfy this condition.

df1.query("A / B < C")

	A	B	C	D	E	Type
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-07	0.092105	0.433701	0.430863	0.493685	0.425830	Class2
2021-01-08	0.312261	0.426351	0.893389	0.944160	0.501837	Class2
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

df1[(df1["A"] / df1["B"]) < df1["C"]]

	A	B	C	D	E	Type
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-07	0.092105	0.433701	0.430863	0.493685	0.425830	Class2
2021-01-08	0.312261	0.426351	0.893389	0.944160	0.501837	Class2
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

Example 17¶

In this example, we are explaining how to use column name when column name has spaces in it.

Below we have first created a copy of our dataframe with the name of 'Type' column replaced with 'Type Of Class' which has spaces in it.

In order to use a column name that has spaces in it with query() method, we need to keep the column name in backtick (`). We are checking for entries where 'Type Of Class' column has a value 'Class3'.

df2 = df1.rename(columns={"Type": "Type Of Class"})

df2

	A	B	C	D	E	Type Of Class
2021-01-01	0.696469	0.286139	0.226851	0.551315	0.719469	Class2
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-03	0.343178	0.729050	0.438572	0.059678	0.398044	Class3
2021-01-04	0.737995	0.182492	0.175452	0.531551	0.531828	Class2
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-07	0.092105	0.433701	0.430863	0.493685	0.425830	Class2
2021-01-08	0.312261	0.426351	0.893389	0.944160	0.501837	Class2
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

df2.query("`Type Of Class` == 'Class3'")

	A	B	C	D	E	Type Of Class
2021-01-03	0.343178	0.72905	0.438572	0.059678	0.398044	Class3

Example 18¶

In this example, we are explaining how we can keep entries based on '=' operator condition. We have first modified a few entries of column 'C and D' of our second dataframe which we had created in the previous cell.

We have then provided condition 'C == D' in query() method to keep only entries where values of these columns are the same.

df2.loc["2021-1-3", "C"] = 5.0
df2.loc["2021-1-3", "D"] = 5.0
df2.loc["2021-1-8", "C"] = 15.0
df2.loc["2021-1-8", "D"] = 15.0

df2

	A	B	C	D	E	Type Of Class
2021-01-01	0.696469	0.286139	0.226851	0.551315	0.719469	Class2
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-03	0.343178	0.729050	5.000000	5.000000	0.398044	Class3
2021-01-04	0.737995	0.182492	0.175452	0.531551	0.531828	Class2
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-07	0.092105	0.433701	0.430863	0.493685	0.425830	Class2
2021-01-08	0.312261	0.426351	15.000000	15.000000	0.501837	Class2
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

df2.query("C == D")

	A	B	C	D	E	Type Of Class
2021-01-03	0.343178	0.729050	5.0	5.0	0.398044	Class3
2021-01-08	0.312261	0.426351	15.0	15.0	0.501837	Class2

df2[df2["C"] == df2["D"]]

	A	B	C	D	E	Type Of Class
2021-01-03	0.343178	0.729050	5.0	5.0	0.398044	Class3
2021-01-08	0.312261	0.426351	15.0	15.0	0.501837	Class2

Example 19¶

In this example, we are checking for 'not equal to (!=)' condition using query() function. We are keeping only entries where values of column 'C' are not the same as values of column 'D'.

df2.query("C != D")

	A	B	C	D	E	Type Of Class
2021-01-01	0.696469	0.286139	0.226851	0.551315	0.719469	Class2
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-04	0.737995	0.182492	0.175452	0.531551	0.531828	Class2
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-07	0.092105	0.433701	0.430863	0.493685	0.425830	Class2
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

df2[df2["C"] != df2["D"]]

	A	B	C	D	E	Type Of Class
2021-01-01	0.696469	0.286139	0.226851	0.551315	0.719469	Class2
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-04	0.737995	0.182492	0.175452	0.531551	0.531828	Class2
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-07	0.092105	0.433701	0.430863	0.493685	0.425830	Class2
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

Example 20¶

In this example, we have explained how we can combine more than one conditions using python 'and' construct. We are checking for condition "A > B and Type == 'Class1'" through query() function. This condition will keep entries where values of column 'A' are greater than 'B' and 'Type' column has value 'Class1'.

df1.query("A > B and Type == 'Class1'")

	A	B	C	D	E	Type
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1

df1[(df1["A"] > df1["B"]) & (df1["Type"] == "Class1")]

	A	B	C	D	E	Type
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1

Example 21¶

In this example, we have again combined more than one condition but this time using '&' operator. We have created the same condition as the previous example by replacing 'and' with '&'.

df1.query("A > B & Type == 'Class1'")

	A	B	C	D	E	Type
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1

Example 22¶

In this example, we have combined 3 different conditions using '&' operator. The query() method will only keep entries that satisfy all three conditions together.

df1.query("A > B & Type == 'Class1' & E >= 0.85")

	A	B	C	D	E	Type
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1

df1[(df1["A"] > df1["B"]) & (df1["Type"] == "Class1") & (df1["E"] >= 0.85)]

	A	B	C	D	E	Type
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1

Example 23¶

In this example, we have again explained how we can combine more than one condition using '&' operator. We are checking for a condition on index of the dataframe. We'll keep entries where index is greater than '2021-1-3' and less than '2021-1-7'.

df1.query("(index > '2021-1-3') & (index < '2021-1-7')")

	A	B	C	D	E	Type
2021-01-04	0.737995	0.182492	0.175452	0.531551	0.531828	Class2
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1

Example 24¶

In this example, we have recreated the previous example by replacing '&' operator with 'and' construct.

df1.query("(index > '2021-1-3') and (index < '2021-1-7')")

	A	B	C	D	E	Type
2021-01-04	0.737995	0.182492	0.175452	0.531551	0.531828	Class2
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1

df1[(df1.index > "2021-1-3") & (df1.index < "2021-1-7")]

	A	B	C	D	E	Type
2021-01-04	0.737995	0.182492	0.175452	0.531551	0.531828	Class2
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1

Example 25¶

In this example, we have explained how we can combine more than one condition using 'or' python construct. We are checking for condition "C > D or D > E". This will keep entries where either value of column 'C' is greater than column 'D' or values of column 'D' is greater than column 'E'.

df1.query("C > D or D > E")

	A	B	C	D	E	Type
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-03	0.343178	0.729050	0.438572	0.059678	0.398044	Class3
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-07	0.092105	0.433701	0.430863	0.493685	0.425830	Class2
2021-01-08	0.312261	0.426351	0.893389	0.944160	0.501837	Class2
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

df1[(df1["C"] > df1["D"]) | (df1["D"] > df1["E"])]

	A	B	C	D	E	Type
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-03	0.343178	0.729050	0.438572	0.059678	0.398044	Class3
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-07	0.092105	0.433701	0.430863	0.493685	0.425830	Class2
2021-01-08	0.312261	0.426351	0.893389	0.944160	0.501837	Class2
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

Example 26¶

We can also use '|' symbol for checking or condition. In this example, we have recreated the previous example by replacing 'or' with '|'.

df1.query("C > D | D > E")

	A	B	C	D	E	Type
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-03	0.343178	0.729050	0.438572	0.059678	0.398044	Class3
2021-01-05	0.634401	0.849432	0.724455	0.611024	0.722443	Class2
2021-01-07	0.092105	0.433701	0.430863	0.493685	0.425830	Class2
2021-01-08	0.312261	0.426351	0.893389	0.944160	0.501837	Class2
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

Example 27¶

We can use tilde ('~') operator to check for the negative conditions.

In this example, we are checking for condition "~(A > 0.5)". This condition will check for entries where values of column 'A' are Not greater than 0.5.

df1.query("~(A > 0.5)")

	A	B	C	D	E	Type
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-03	0.343178	0.729050	0.438572	0.059678	0.398044	Class3
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-07	0.092105	0.433701	0.430863	0.493685	0.425830	Class2
2021-01-08	0.312261	0.426351	0.893389	0.944160	0.501837	Class2
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

df1[~(df1["A"]> 0.5)]

	A	B	C	D	E	Type
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-03	0.343178	0.729050	0.438572	0.059678	0.398044	Class3
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-07	0.092105	0.433701	0.430863	0.493685	0.425830	Class2
2021-01-08	0.312261	0.426351	0.893389	0.944160	0.501837	Class2
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

Example 28¶

In this example, we have combined tilde (~) operator with '|' operator to check for condition '~(A > 0.5 | B > 0.5)'. This condition will keep entries where values of column 'A' and 'B' are less than or equal to 0.5.

df1.query("~(A > 0.5 | B > 0.5)")

	A	B	C	D	E	Type
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-07	0.092105	0.433701	0.430863	0.493685	0.425830	Class2
2021-01-08	0.312261	0.426351	0.893389	0.944160	0.501837	Class2
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

Example 29¶

In this example, we are checking for condition 'A <= 0.5 & B <= 0.5' which will keep entries where values of columns 'A' and 'B' are less than or equal to 0.5.

df1.query("A <= 0.5 & B <= 0.5")

	A	B	C	D	E	Type
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-07	0.092105	0.433701	0.430863	0.493685	0.425830	Class2
2021-01-08	0.312261	0.426351	0.893389	0.944160	0.501837	Class2
2021-01-10	0.250455	0.483034	0.985560	0.519485	0.612895	Class1

Example 30¶

In this example, we are checking for condition '~(C > 0.5 & D > 0.5)' using query() method. This will keep only entries where values of column 'C' are less than/equal to 0.5 or values of column 'D' are less than/equal to 0.5.

df1.query("~(C > 0.5 & D > 0.5)")

	A	B	C	D	E	Type
2021-01-01	0.696469	0.286139	0.226851	0.551315	0.719469	Class2
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-03	0.343178	0.729050	0.438572	0.059678	0.398044	Class3
2021-01-04	0.737995	0.182492	0.175452	0.531551	0.531828	Class2
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-07	0.092105	0.433701	0.430863	0.493685	0.425830	Class2
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1

Example 31¶

In this example, we are checking for condition 'C <= 0.5 | D <= 0.5' using query() method. This condition will have exactly same effect as our condition from previous example.

df1.query("C <= 0.5 | D <= 0.5")

	A	B	C	D	E	Type
2021-01-01	0.696469	0.286139	0.226851	0.551315	0.719469	Class2
2021-01-02	0.423106	0.980764	0.684830	0.480932	0.392118	Class1
2021-01-03	0.343178	0.729050	0.438572	0.059678	0.398044	Class3
2021-01-04	0.737995	0.182492	0.175452	0.531551	0.531828	Class2
2021-01-06	0.322959	0.361789	0.228263	0.293714	0.630976	Class1
2021-01-07	0.092105	0.433701	0.430863	0.493685	0.425830	Class2
2021-01-09	0.623953	0.115618	0.317285	0.414826	0.866309	Class1

This ends our small tutorial explaining how we can use query() method to filter rows of pandas dataframe by providing python expression specifying conditions for filtering rows. Please feel free to let us know your views in the comments section.

References¶

Sunny Solanki

Comfortable Learning through Video Tutorials?

If you are more comfortable learning through video tutorials then we would recommend that you subscribe to our YouTube channel.

Stuck Somewhere? Need Help with Coding? Have Doubts About the Topic/Code?

When going through coding examples, it's quite common to have doubts and errors.

If you have doubts about some code examples or are stuck somewhere when trying our code, send us an email at coderzcolumn07@gmail.com. We'll help you or point you in the direction where you can find a solution to your problem.

You can even send us a mail if you are trying something new and need guidance regarding coding. We'll try to respond as soon as possible.

Want to Share Your Views? Have Any Suggestions?

If you want to

provide some suggestions on topic
share your views
include some details in tutorial
suggest some new topics on which we should create tutorials/blogs

Please feel free to contact us at coderzcolumn07@gmail.com. We appreciate and value your feedbacks. You can also support us with a small contribution by clicking DONATE.

pandas-dataframe, python-expressions, query-function

Sunny Solanki

Software Developer | Youtuber | Bonsai Enthusiast

Subscribe to Our YouTube Channel

Tutorial Categories

Artificial Intelligence (83)
Data Science (84)
Digital Marketing (8)
Machine Learning (38)
Python (131)

Pandas query(): Query Pandas DataFrame using Python Expressions¶

Example 1¶

Example 2¶

Example 3¶

Example 4¶

Example 5¶

Example 6¶

Example 7¶

Example 8¶

Example 9¶

Example 10¶

Example 11¶

Example 12¶

Example 13¶

Example 14¶

Example 15¶

Example 16¶

Example 17¶

Example 18¶

Example 19¶

Example 20¶

Example 21¶

Example 22¶

Example 23¶

Example 24¶

Example 25¶

Example 26¶

Example 27¶

Example 28¶

Example 29¶

Example 30¶

Example 31¶

References¶

Sunny Solanki

Comfortable Learning through Video Tutorials?

Stuck Somewhere? Need Help with Coding? Have Doubts About the Topic/Code?

Want to Share Your Views? Have Any Suggestions?

Sunny Solanki

Subscribe to Our YouTube Channel

Tutorial Categories

Newsletter Subscription