Updated On : Nov-29,2021 Tags pandas-dataframe, python-expressions, query-function
Pandas query(): Query Pandas DataFrame using Python Expressions

Pandas query(): Query Pandas DataFrame using Python Expressions

Python expressions lets us check for conditions and make decisions based on the result. It can let us check for various conditions like greater than, less than, equal to, not equal to, etc. We can check for conditions like the presence of variables in sequences or scalar values in sequences. We can even compare different variables using expressions and make decisions based on that. Python provides constructs like '>,<,>=,<=,=,!=,~, &, |,~' etc for performing various conditions.

Pandas dataframes are the most commonly used data structure to store and manipulate tabular data in Python. What if we can use python expressions to filter rows of pandas dataframe. It can make code quite easy to understand if we can filter rows of pandas dataframe by providing python expressions specifying some conditions to filter rows.

To use python expressions with pandas dataframe, it provides a method named query() which takes as input python expression and filters rows of dataframe based on the condition specified through expression. The expression can use column names of pandas dataframe inside it and compare it against scalar, local variable, etc to make decisions. We need to provide expression as a string to query() method. The query() method used eval() method behind the scene to evaluate Python expressions.

As a part of this tutorial, we'll explain how we can use Python expressions to filter rows of pandas dataframe using query() method. We have created many different examples to explain different conditions covering the majority of scenarios. For a few of our examples, we have also explained the second way of filtering rows without using query() method for comparison purposes.

Below we have imported pandas and numpy libraries that we'll use in our tutorial.

In [1]:
import pandas as pd

print("Pandas Version : {}",format(pd.__version__))
Pandas Version : {} 1.3.4
In [2]:
import numpy as np

In the below cell, we have created a sample dataframe that we'll use to explain query() method. The index of the dataframe is the datetime index, five columns are random floats and one last column has categorical data.

In [3]:
np.random.seed(123)

data = np.random.rand(10,5)

df1 = pd.DataFrame(data=data, index=pd.date_range(start="2021-1-1", periods=10), columns=list("ABCDE"))

df1["Type"] = np.random.choice(["Class1", "Class2", "Class3"], size=10)

df1
Out[3]:
A B C D E Type
2021-01-01 0.696469 0.286139 0.226851 0.551315 0.719469 Class2
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-03 0.343178 0.729050 0.438572 0.059678 0.398044 Class3
2021-01-04 0.737995 0.182492 0.175452 0.531551 0.531828 Class2
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-07 0.092105 0.433701 0.430863 0.493685 0.425830 Class2
2021-01-08 0.312261 0.426351 0.893389 0.944160 0.501837 Class2
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1

Example 1

Our first example explains how we can include the simple expression to compare the value of a particular column against a scalar value and filter rows based on results. We have called query() method with expression 'A > 0.5' which will check each value of column 'A' and keep only entries which are greater than 5.

In the next cell after the below cell, we have explained how we can perform the same operation without using query() method.

In [4]:
df1.query("A > 0.5")
Out[4]:
A B C D E Type
2021-01-01 0.696469 0.286139 0.226851 0.551315 0.719469 Class2
2021-01-04 0.737995 0.182492 0.175452 0.531551 0.531828 Class2
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1
In [5]:
df1[df1["A"] > 0.5]
Out[5]:
A B C D E Type
2021-01-01 0.696469 0.286139 0.226851 0.551315 0.719469 Class2
2021-01-04 0.737995 0.182492 0.175452 0.531551 0.531828 Class2
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1

Example 2

Below we have created another example explaining the usage of query() method. This time we have checked for condition 'E * 10 > 5' which will first multiply values of column 'E' by scalar 10 and then check which entries are greater than 5.

Then in the next cell after the below cell, we have explained how we can perform the same thing without using query() function.

In [6]:
df1.query("E * 10 > 5")
Out[6]:
A B C D E Type
2021-01-01 0.696469 0.286139 0.226851 0.551315 0.719469 Class2
2021-01-04 0.737995 0.182492 0.175452 0.531551 0.531828 Class2
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-08 0.312261 0.426351 0.893389 0.944160 0.501837 Class2
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1
In [7]:
df1[(df1["E"] * 10) > 5]
Out[7]:
A B C D E Type
2021-01-01 0.696469 0.286139 0.226851 0.551315 0.719469 Class2
2021-01-04 0.737995 0.182492 0.175452 0.531551 0.531828 Class2
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-08 0.312261 0.426351 0.893389 0.944160 0.501837 Class2
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1

Example 3

In this example, we are checking for condition 'C - 0.5 > 0.4' using query() function. We are subtracting scalar value 0.5 from each value of column 'C' and only keeping entries where values after subtraction are greater than 0.4.

Then in the next cell after the below cell, we have included code to do the same operation without using query() method.

In [8]:
df1.query("C-0.5 > 0.4")
Out[8]:
A B C D E Type
2021-01-10 0.250455 0.483034 0.98556 0.519485 0.612895 Class1
In [9]:
df1[(df1["C"] - 0.5) > 0.4]
Out[9]:
A B C D E Type
2021-01-10 0.250455 0.483034 0.98556 0.519485 0.612895 Class1

Example 4

In this example, we are checking for condition 'C + 0.5 > 0.1' using query() function. It'll add scalar value 0.5 to all values of column 'C' and then will keep only values which are greater than or equal to 1.0.

Then in the next cell, we have explained how the same can be done without using query() method.

In [12]:
df1.query("C+0.5 >= 1.0")
Out[12]:
A B C D E Type
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-08 0.312261 0.426351 0.893389 0.944160 0.501837 Class2
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1
In [13]:
df1[(df1["C"] + 0.5) >= 1.0]
Out[13]:
A B C D E Type
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-08 0.312261 0.426351 0.893389 0.944160 0.501837 Class2
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1

Example 5

In our fifth example, we are checking for condition 'D / 10 > 0.09' using query() function. It'll first divide all values of column 'D' by scalar value 10 and then keep entries which are greater than 0.09.

In [153]:
df1.query("D/10 > 0.09")
Out[153]:
A B C D E Type
2021-01-08 0.312261 0.426351 0.893389 0.94416 0.501837 Class2
In [154]:
df1[(df1["D"] / 10) > 0.09]
Out[154]:
A B C D E Type
2021-01-08 0.312261 0.426351 0.893389 0.94416 0.501837 Class2

Example 6

In this example, we are checking for condition 'E**2 > 0.5' using query(). This will first raise values of column 'E' by power of two and then keep entries which are greater than 0.5.

In [155]:
df1.query("E**2 > 0.5")
Out[155]:
A B C D E Type
2021-01-01 0.696469 0.286139 0.226851 0.551315 0.719469 Class2
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1
In [156]:
df1[(df1["E"]**2) > 0.5]
Out[156]:
A B C D E Type
2021-01-01 0.696469 0.286139 0.226851 0.551315 0.719469 Class2
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1

Example 7

In this example, we have included little complicated expression '(E*10)%10 > 7' inside call to query() method. This expression will first multiply values of column 'E' by 10 and then take modulo by 10. The resulting values will be compared against scalar 7 and only values greater than 7 will be kept.

In [157]:
df1.query("(E*10)%10 > 7")
Out[157]:
A B C D E Type
2021-01-01 0.696469 0.286139 0.226851 0.551315 0.719469 Class2
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1
In [158]:
df1[(df1["E"]*10)%10 > 7]
Out[158]:
A B C D E Type
2021-01-01 0.696469 0.286139 0.226851 0.551315 0.719469 Class2
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1

Example 8

Till now, we were comparing values of columns against scalar values but we can compare columns as well using query*() function.

In this example, we are using the expression 'A < B' which will only take entries where values of column 'A' are less than or equal to values of column 'B'.

In [15]:
df1.query("A <= B")
Out[15]:
A B C D E Type
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-03 0.343178 0.729050 0.438572 0.059678 0.398044 Class3
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-07 0.092105 0.433701 0.430863 0.493685 0.425830 Class2
2021-01-08 0.312261 0.426351 0.893389 0.944160 0.501837 Class2
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1
In [17]:
df1[df1["A"] <= df1["B"]]
Out[17]:
A B C D E Type
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-03 0.343178 0.729050 0.438572 0.059678 0.398044 Class3
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-07 0.092105 0.433701 0.430863 0.493685 0.425830 Class2
2021-01-08 0.312261 0.426351 0.893389 0.944160 0.501837 Class2
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1

Example 9

In this example, we are checking for condition Type == 'Class1' condition. This condition will check for string value 'Class1' in column 'Type' and keep only entries where condition is satisfied.

Please take a look at how we have provided string value.

In [161]:
df1.query("Type == 'Class1'")
Out[161]:
A B C D E Type
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1
In [162]:
df1[df1["Type"] == "Class1"]
Out[162]:
A B C D E Type
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1

Example 10

In this example, we are checking for index of the dataframe. We have provided condition index in ['2021-1-1', '2021-1-2']. This will check for entries that have the date in one of these two dates specified as a list and will keep only entries that satisfy this condition.

In [163]:
df1.query("index in ['2021-1-1', '2021-1-2']")
Out[163]:
A B C D E Type
2021-01-01 0.696469 0.286139 0.226851 0.551315 0.719469 Class2
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
In [164]:
df1[df1.index.isin(["2021-1-1", "2021-1-2"])]
Out[164]:
A B C D E Type
2021-01-01 0.696469 0.286139 0.226851 0.551315 0.719469 Class2
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1

Example 11

In this example, we have explained how we can use outside python variable in python expression provided to query() method. We can refer to local variables by adding string <strong>'@'</strong> in front of them.

We have first created a list with 3 dates using pandas.date_range() function. We are then checking which index entries of the dataframe have one of these three dates. We'll keep all entries that satisfy this condition.

In [208]:
rng = pd.date_range(start="2021-1-1", periods=3)

df1.query("index in @rng")
Out[208]:
A B C D E Type
2021-01-01 0.696469 0.286139 0.226851 0.551315 0.719469 Class2
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-03 0.343178 0.729050 0.438572 0.059678 0.398044 Class3
In [209]:
rng = pd.date_range(start="2021-1-1", periods=3)

df1[df1.index.isin(rng)]
Out[209]:
A B C D E Type
2021-01-01 0.696469 0.286139 0.226851 0.551315 0.719469 Class2
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-03 0.343178 0.729050 0.438572 0.059678 0.398044 Class3

Example 12

Our data frame’s index is of datatype hence we can perform operations that we generally perform with dates on it.

In this example, we are checking for condition index > '2021-1-5'. This will keep entries that have a date ahead of '2021-1-5' in an index of our dataframe.

In [192]:
df1.query("index > '2021-1-5'")
Out[192]:
A B C D E Type
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-07 0.092105 0.433701 0.430863 0.493685 0.425830 Class2
2021-01-08 0.312261 0.426351 0.893389 0.944160 0.501837 Class2
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1
In [194]:
df1[df1.index > "2021-1-5"]
Out[194]:
A B C D E Type
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-07 0.092105 0.433701 0.430863 0.493685 0.425830 Class2
2021-01-08 0.312261 0.426351 0.893389 0.944160 0.501837 Class2
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1

Example 13

Here, we have created another example where we are using a local variable in query() method expression. We have created a local variable that has a list of two strings. We then check for the condition which checks which entries of 'Type' column are present in that list of strings and keep only those entries.

In [210]:
sel_classes = ["Class1", "Class3"]

df1.query("Type in @sel_classes")
Out[210]:
A B C D E Type
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-03 0.343178 0.729050 0.438572 0.059678 0.398044 Class3
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1
In [211]:
sel_classes = ["Class1", "Class3"]

df1[df1["Type"].isin(sel_classes)]
Out[211]:
A B C D E Type
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-03 0.343178 0.729050 0.438572 0.059678 0.398044 Class3
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1

Example 14

In this example, we are checking for conditions by performing arithmetic operations between columns of the dataframe. We are checking for condition 'A - B < C' using query() method. This will subtract values of column 'B' from values of column 'A' and then keep only entries that are less than values of column 'C'.

In [165]:
df1.query("A-B < C")
Out[165]:
A B C D E Type
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-03 0.343178 0.729050 0.438572 0.059678 0.398044 Class3
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-07 0.092105 0.433701 0.430863 0.493685 0.425830 Class2
2021-01-08 0.312261 0.426351 0.893389 0.944160 0.501837 Class2
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1
In [166]:
df1[(df1["A"] - df1["B"]) < df1["C"]]
Out[166]:
A B C D E Type
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-03 0.343178 0.729050 0.438572 0.059678 0.398044 Class3
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-07 0.092105 0.433701 0.430863 0.493685 0.425830 Class2
2021-01-08 0.312261 0.426351 0.893389 0.944160 0.501837 Class2
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1

Example 15

In this example, we have again explained how to use a local variable in expression. We have created a variable that has a scalar value of 0.8. We are then checking for a condition that we'll only keep entries where values of column 'E' are greater than or equal to the value of that scalar variable.

In [167]:
threshold = 0.8

df1.query("E >= @threshold")
Out[167]:
A B C D E Type
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1
In [168]:
df1[df1["E"] >= threshold]
Out[168]:
A B C D E Type
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1

Example 16

In this example, we have performed operations between three columns of the dataframe to filter rows. We are checking for condition 'A / B < C'. We are dividing values of column 'A' by values of column 'B' and checking which values are less than values of column 'C' after division. We'll only keep entries that satisfy this condition.

In [169]:
df1.query("A / B < C")
Out[169]:
A B C D E Type
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-07 0.092105 0.433701 0.430863 0.493685 0.425830 Class2
2021-01-08 0.312261 0.426351 0.893389 0.944160 0.501837 Class2
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1
In [170]:
df1[(df1["A"] / df1["B"]) < df1["C"]]
Out[170]:
A B C D E Type
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-07 0.092105 0.433701 0.430863 0.493685 0.425830 Class2
2021-01-08 0.312261 0.426351 0.893389 0.944160 0.501837 Class2
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1

Example 17

In this example, we are explaining how to use column name when column name has spaces in it.

Below we have first created a copy of our dataframe with the name of 'Type' column replaced with 'Type Of Class' which has spaces in it.

In order to use a column name that has spaces in it with query() method, we need to keep the column name in backtick (`). We are checking for entries where 'Type Of Class' column has a value 'Class3'.

In [171]:
df2 = df1.rename(columns={"Type": "Type Of Class"})

df2
Out[171]:
A B C D E Type Of Class
2021-01-01 0.696469 0.286139 0.226851 0.551315 0.719469 Class2
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-03 0.343178 0.729050 0.438572 0.059678 0.398044 Class3
2021-01-04 0.737995 0.182492 0.175452 0.531551 0.531828 Class2
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-07 0.092105 0.433701 0.430863 0.493685 0.425830 Class2
2021-01-08 0.312261 0.426351 0.893389 0.944160 0.501837 Class2
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1
In [172]:
df2.query("`Type Of Class` == 'Class3'")
Out[172]:
A B C D E Type Of Class
2021-01-03 0.343178 0.72905 0.438572 0.059678 0.398044 Class3

Example 18

In this example, we are explaining how we can keep entries based on '=' operator condition. We have first modified a few entries of column 'C and D' of our second dataframe which we had created in the previous cell.

We have then provided condition 'C == D' in query() method to keep only entries where values of these columns are the same.

In [173]:
df2.loc["2021-1-3", "C"] = 5.0
df2.loc["2021-1-3", "D"] = 5.0
df2.loc["2021-1-8", "C"] = 15.0
df2.loc["2021-1-8", "D"] = 15.0

df2
Out[173]:
A B C D E Type Of Class
2021-01-01 0.696469 0.286139 0.226851 0.551315 0.719469 Class2
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-03 0.343178 0.729050 5.000000 5.000000 0.398044 Class3
2021-01-04 0.737995 0.182492 0.175452 0.531551 0.531828 Class2
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-07 0.092105 0.433701 0.430863 0.493685 0.425830 Class2
2021-01-08 0.312261 0.426351 15.000000 15.000000 0.501837 Class2
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1
In [174]:
df2.query("C == D")
Out[174]:
A B C D E Type Of Class
2021-01-03 0.343178 0.729050 5.0 5.0 0.398044 Class3
2021-01-08 0.312261 0.426351 15.0 15.0 0.501837 Class2
In [175]:
df2[df2["C"] == df2["D"]]
Out[175]:
A B C D E Type Of Class
2021-01-03 0.343178 0.729050 5.0 5.0 0.398044 Class3
2021-01-08 0.312261 0.426351 15.0 15.0 0.501837 Class2

Example 19

In this example, we are checking for 'not equal to (!=)' condition using query() function. We are keeping only entries where values of column 'C' are not the same as values of column 'D'.

In [176]:
df2.query("C != D")
Out[176]:
A B C D E Type Of Class
2021-01-01 0.696469 0.286139 0.226851 0.551315 0.719469 Class2
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-04 0.737995 0.182492 0.175452 0.531551 0.531828 Class2
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-07 0.092105 0.433701 0.430863 0.493685 0.425830 Class2
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1
In [177]:
df2[df2["C"] != df2["D"]]
Out[177]:
A B C D E Type Of Class
2021-01-01 0.696469 0.286139 0.226851 0.551315 0.719469 Class2
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-04 0.737995 0.182492 0.175452 0.531551 0.531828 Class2
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-07 0.092105 0.433701 0.430863 0.493685 0.425830 Class2
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1

Example 20

In this example, we have explained how we can combine more than one conditions using python 'and' construct. We are checking for condition "A > B and Type == 'Class1'" through query() function. This condition will keep entries where values of column 'A' are greater than 'B' and 'Type' column has value 'Class1'.

In [178]:
df1.query("A > B and Type == 'Class1'")
Out[178]:
A B C D E Type
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1
In [179]:
df1[(df1["A"] > df1["B"]) & (df1["Type"] == "Class1")]
Out[179]:
A B C D E Type
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1

Example 21

In this example, we have again combined more than one condition but this time using '&' operator. We have created the same condition as the previous example by replacing 'and' with '&'.

In [180]:
df1.query("A > B & Type == 'Class1'")
Out[180]:
A B C D E Type
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1

Example 22

In this example, we have combined 3 different conditions using '&' operator. The query() method will only keep entries that satisfy all three conditions together.

In [181]:
df1.query("A > B & Type == 'Class1' & E >= 0.85")
Out[181]:
A B C D E Type
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1
In [182]:
df1[(df1["A"] > df1["B"]) & (df1["Type"] == "Class1") & (df1["E"] >= 0.85)]
Out[182]:
A B C D E Type
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1

Example 23

In this example, we have again explained how we can combine more than one condition using '&' operator. We are checking for a condition on index of the dataframe. We'll keep entries where index is greater than '2021-1-3' and less than '2021-1-7'.

In [196]:
df1.query("(index > '2021-1-3') & (index < '2021-1-7')")
Out[196]:
A B C D E Type
2021-01-04 0.737995 0.182492 0.175452 0.531551 0.531828 Class2
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1

Example 24

In this example, we have recreated the previous example by replacing '&' operator with 'and' construct.

In [198]:
df1.query("(index > '2021-1-3') and (index < '2021-1-7')")
Out[198]:
A B C D E Type
2021-01-04 0.737995 0.182492 0.175452 0.531551 0.531828 Class2
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
In [197]:
df1[(df1.index > "2021-1-3") & (df1.index < "2021-1-7")]
Out[197]:
A B C D E Type
2021-01-04 0.737995 0.182492 0.175452 0.531551 0.531828 Class2
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1

Example 25

In this example, we have explained how we can combine more than one condition using 'or' python construct. We are checking for condition "C > D or D > E". This will keep entries where either value of column 'C' is greater than column 'D' or values of column 'D' is greater than column 'E'.

In [183]:
df1.query("C > D or D > E")
Out[183]:
A B C D E Type
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-03 0.343178 0.729050 0.438572 0.059678 0.398044 Class3
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-07 0.092105 0.433701 0.430863 0.493685 0.425830 Class2
2021-01-08 0.312261 0.426351 0.893389 0.944160 0.501837 Class2
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1
In [184]:
df1[(df1["C"] > df1["D"]) | (df1["D"] > df1["E"])]
Out[184]:
A B C D E Type
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-03 0.343178 0.729050 0.438572 0.059678 0.398044 Class3
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-07 0.092105 0.433701 0.430863 0.493685 0.425830 Class2
2021-01-08 0.312261 0.426351 0.893389 0.944160 0.501837 Class2
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1

Example 26

We can also use '|' symbol for checking or condition. In this example, we have recreated the previous example by replacing 'or' with '|'.

In [185]:
df1.query("C > D | D > E")
Out[185]:
A B C D E Type
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-03 0.343178 0.729050 0.438572 0.059678 0.398044 Class3
2021-01-05 0.634401 0.849432 0.724455 0.611024 0.722443 Class2
2021-01-07 0.092105 0.433701 0.430863 0.493685 0.425830 Class2
2021-01-08 0.312261 0.426351 0.893389 0.944160 0.501837 Class2
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1

Example 27

We can use tilde ('~') operator to check for the negative conditions.

In this example, we are checking for condition "~(A > 0.5)". This condition will check for entries where values of column 'A' are Not greater than 0.5.

In [186]:
df1.query("~(A > 0.5)")
Out[186]:
A B C D E Type
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-03 0.343178 0.729050 0.438572 0.059678 0.398044 Class3
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-07 0.092105 0.433701 0.430863 0.493685 0.425830 Class2
2021-01-08 0.312261 0.426351 0.893389 0.944160 0.501837 Class2
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1
In [187]:
df1[~(df1["A"]> 0.5)]
Out[187]:
A B C D E Type
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-03 0.343178 0.729050 0.438572 0.059678 0.398044 Class3
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-07 0.092105 0.433701 0.430863 0.493685 0.425830 Class2
2021-01-08 0.312261 0.426351 0.893389 0.944160 0.501837 Class2
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1

Example 28

In this example, we have combined tilde (~) operator with '|' operator to check for condition '~(A > 0.5 | B > 0.5)'. This condition will keep entries where values of column 'A' and 'B' are less than or equal to 0.5.

In [188]:
df1.query("~(A > 0.5 | B > 0.5)")
Out[188]:
A B C D E Type
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-07 0.092105 0.433701 0.430863 0.493685 0.425830 Class2
2021-01-08 0.312261 0.426351 0.893389 0.944160 0.501837 Class2
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1

Example 29

In this example, we are checking for condition 'A <= 0.5 & B <= 0.5' which will keep entries where values of columns 'A' and 'B' are less than or equal to 0.5.

In [189]:
df1.query("A <= 0.5 & B <= 0.5")
Out[189]:
A B C D E Type
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-07 0.092105 0.433701 0.430863 0.493685 0.425830 Class2
2021-01-08 0.312261 0.426351 0.893389 0.944160 0.501837 Class2
2021-01-10 0.250455 0.483034 0.985560 0.519485 0.612895 Class1

Example 30

In this example, we are checking for condition '~(C > 0.5 & D > 0.5)' using query() method. This will keep only entries where values of column 'C' are less than/equal to 0.5 or values of column 'D' are less than/equal to 0.5.

In [190]:
df1.query("~(C > 0.5 & D > 0.5)")
Out[190]:
A B C D E Type
2021-01-01 0.696469 0.286139 0.226851 0.551315 0.719469 Class2
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-03 0.343178 0.729050 0.438572 0.059678 0.398044 Class3
2021-01-04 0.737995 0.182492 0.175452 0.531551 0.531828 Class2
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-07 0.092105 0.433701 0.430863 0.493685 0.425830 Class2
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1

Example 31

In this example, we are checking for condition 'C <= 0.5 | D <= 0.5' using query() method. This condition will have exactly same effect as our condition from previous example.

In [191]:
df1.query("C <= 0.5 | D <= 0.5")
Out[191]:
A B C D E Type
2021-01-01 0.696469 0.286139 0.226851 0.551315 0.719469 Class2
2021-01-02 0.423106 0.980764 0.684830 0.480932 0.392118 Class1
2021-01-03 0.343178 0.729050 0.438572 0.059678 0.398044 Class3
2021-01-04 0.737995 0.182492 0.175452 0.531551 0.531828 Class2
2021-01-06 0.322959 0.361789 0.228263 0.293714 0.630976 Class1
2021-01-07 0.092105 0.433701 0.430863 0.493685 0.425830 Class2
2021-01-09 0.623953 0.115618 0.317285 0.414826 0.866309 Class1

This ends our small tutorial explaining how we can use query() method to filter rows of pandas dataframe by providing python expression specifying conditions for filtering rows. Please feel free to let us know your views in the comments section.

References


  Support Us to Make a Difference

Thank You for visiting our website. If you like our work, please support us so that we can keep on creating new tutorials/blogs on interesting topics (like AI, ML, Data Science, Python, Digital Marketing, SEO, etc.) that can help people learn new things faster. You can support us by clicking on the Coffee button at the bottom right corner. We would appreciate even if you can give a thumbs-up to our article in the comments section below.

 Want to Share Your Views? Have Any Suggestions?

If you want to

  • provide some suggestions on topic
  • share your views
  • include some details in tutorial
  • suggest some new topics on which we should create tutorials/blogs
Please feel free to let us know in the comments section below (Guest Comments are allowed). We appreciate and value your feedbacks.



Sunny Solanki  Sunny Solanki