Updated On : Nov-22,2021 Tags numba, pandas-dataframe
How to Speed up Code involving Pandas DataFrame using Numba?

Numba is a widely used library for speeding up computations in Python. It lets us accelerate code simply by decorating functions with one of the decorators it provides; the speed-up is then handled by numba without any extra work from the developer. Numba-decorated functions usually run considerably faster than plain Python functions, and numba is designed to accelerate numpy code as well.

Although numba can speed up numpy code, it cannot directly speed up code involving pandas, the most commonly used data manipulation library, which is built on top of numpy. We have already created a tutorial introducing numba's @jit decorator, where we discussed that numba cannot speed up operations performed directly on a pandas dataframe.

If you want to check our tutorial on the numba @jit decorator, please feel free to check it from the link below.

We have created this tutorial to guide developers on how to use numba to speed up code involving pandas dataframes. As part of this tutorial, we'll explain, with examples, various ways to speed up pandas operations. There are basically two approaches, which we have listed below.

How to Speed up Operations on Pandas DataFrame using Numba?

  1. Using 'numba' Engine Available for Selected Pandas Methods - Selected pandas methods (rolling(), groupby(), etc.) work on a batch of values at a time. These methods accept an argument named engine which, when set to 'numba', speeds up the operation using numba behind the scenes.
  2. Create Custom Numba Functions to Work with Pandas DataFrame
    • We can jit-decorate functions that work with a pandas dataframe. Such functions should be designed to operate on numpy arrays or Python lists using loops, because that is what numba optimizes best. We then retrieve the underlying numpy arrays from our pandas dataframes and feed them to the jit-decorated functions.
    • We can also create custom numba functions to replace commonly used pandas operations (like mean(), std(), etc.). We can use the various decorators available in numba to speed up these custom functions. These approaches show performance improvements when data is large (generally > 1M entries).

Below we have highlighted important sections of our tutorial to give an overview of the material covered in this tutorial.

Important Sections of Tutorial

  1. Using 'numba' Engine Available for Selected Pandas Methods
    • Example 1: Trying Various Engines with Pandas Series
    • Example 2: Trying Various Engines with Numpy Arrays
    • Example 3: Giving Arguments for Numba Engine
    • Example 4: Trying Custom Functions
  2. Create Custom Numba Functions to Work with Pandas DataFrame
    • Example 1: Decorate Functions with a Simple @jit Decorator
    • Example 2: Strict nopython Mode (@jit(nopython=True) | @njit)
    • Example 3: Provide DataType for Further Speed Up
    • Example 4: Introduce Python Loops
    • Example 5: Try to Replace Existing Pandas DataFrame Functions with Custom Jit-Decorated Functions
    • Example 6: Try to Vectorize Functions using @vectorize Decorator for Further Speed Up

We'll now explain the two ways of speeding up pandas code described above with simple examples. We have imported the necessary libraries below to get started.

In [1]:
import pandas as pd

print("Pandas Version : {}".format(pd.__version__))
Pandas Version : 1.3.4
In [2]:
import numpy as np

Below we have created a dataframe with random data that we'll be using in our examples. The dataframe has five columns of random floats and one column of categorical values.

In [3]:
np.random.seed(123)

data = np.random.rand(int(1e5),5)

df = pd.DataFrame(data=data, columns=list("ABCDE"))
df["Type"] = np.random.choice(["Class1","Class2","Class3","Class4","Class5"], size=(len(df)))

df
Out[3]:
A B C D E Type
0 0.696469 0.286139 0.226851 0.551315 0.719469 Class4
1 0.423106 0.980764 0.684830 0.480932 0.392118 Class4
2 0.343178 0.729050 0.438572 0.059678 0.398044 Class4
3 0.737995 0.182492 0.175452 0.531551 0.531828 Class5
4 0.634401 0.849432 0.724455 0.611024 0.722443 Class5
... ... ... ... ... ... ...
99995 0.051910 0.826501 0.897356 0.836202 0.527388 Class4
99996 0.690603 0.520536 0.807082 0.581018 0.398960 Class1
99997 0.296378 0.995768 0.431761 0.877340 0.751442 Class1
99998 0.478550 0.197500 0.079578 0.480642 0.538960 Class2
99999 0.811158 0.100972 0.604496 0.962787 0.588754 Class5

100000 rows × 6 columns

1. Using 'numba' Engine Available for Selected Pandas Methods

In this section, we'll explain the pandas dataframe methods that let us use numba for some operations. Pandas generally lets us use numba with methods that work on a batch of values at a time, like groupby(), rolling(), etc. These methods group entries of the main dataframe and then apply various aggregate functions to the grouped entries. We can instruct them to use numba for the aggregate operations by setting the engine argument to 'numba'.

Below we have first created a rolling dataframe with a window size of 1000. We can call various aggregate functions on this rolling dataframe to compute rolling statistics. The majority of functions we can call on the rolling dataframe accept an engine argument which can be set to 'numba'.

In the next cell, we have grouped entries of the dataframe based on the Type column. Two methods that work on grouped dataframes, transform() and agg()/aggregate(), accept the engine argument.

We'll be using these rolled and grouped dataframes in our examples.

In [6]:
rolling_df = df.rolling(1000)

rolling_df
Out[6]:
Rolling [window=1000,center=False,axis=0,method=single]
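
To make the window semantics concrete, here is a small pandas-only sketch (the frame and window size of 3 are illustrative): the first window-1 results are NaN because a full window is not yet available, and each later value aggregates the preceding rows.

```python
import numpy as np
import pandas as pd

small = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0]})
out = small.rolling(3).mean()

# The first two entries are NaN; each later entry is the mean of a 3-row window.
print(out["A"].tolist())  # [nan, nan, 2.0, 3.0, 4.0]
```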
In [7]:
grouped_by_types = df.groupby("Type")

grouped_by_types
Out[7]:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7fb2e1693ac8>

Example 1: Trying Various Engines with Pandas Series

In our first example, we simply call the mean() function on the rolling dataframe to calculate a rolling average. We have called mean() with various arguments: without any argument, with engine set to 'cython', and with engine set to 'numba'.

Cython is a superset of Python that compiles to C and is generally faster than the standard Python implementation.

When we provide engine='numba', the function uses numba behind the scenes to speed up the operation. It is not guaranteed that engine='numba' will always improve performance, so we need to test it first.

We are using the jupyter notebook magic command %time to measure the time taken by a particular statement. We'll be using it in all our examples. If you are interested in learning about the various magic commands available in jupyter notebook, please feel free to check our tutorial covering the majority of them.
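
The %time magic only works inside IPython/Jupyter; in a plain script, the standard-library timeit module gives a rough equivalent (the statement timed here is illustrative):

```python
import timeit

import numpy as np

arr = np.random.rand(100_000)

# Time 10 runs of a numpy mean and report the total elapsed time in seconds.
elapsed = timeit.timeit(lambda: arr.mean(), number=10)
print(f"10 runs took {elapsed:.6f} s")
```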

In [11]:
%time out = rolling_df.mean()

%time out = rolling_df.mean(engine='cython')

%time out = rolling_df.mean(engine='numba')

%time out = rolling_df.mean(engine='numba')
CPU times: user 16.8 ms, sys: 0 ns, total: 16.8 ms
Wall time: 17.7 ms
CPU times: user 31.9 ms, sys: 0 ns, total: 31.9 ms
Wall time: 31.3 ms
CPU times: user 1.07 s, sys: 0 ns, total: 1.07 s
Wall time: 1.07 s
CPU times: user 1.06 s, sys: 10.7 ms, total: 1.07 s
Wall time: 1.07 s

Example 2: Trying Various Engines with Numpy Arrays

In this example, we have again called the mean() function on our rolling dataframe just like the previous example, with one difference: we have provided raw=True, so the aggregation function receives numpy arrays instead of pandas series. If we don't provide raw=True, the column values are given as pandas series. We set raw=True because numba performs best with functions that operate on numpy arrays.
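
We can verify what the aggregation function actually receives in each case with a small illustrative frame: with raw=True each window arrives as a numpy array, and without it as a pandas Series.

```python
import numpy as np
import pandas as pd

small = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0]})
received = []

def record_type(window):
    # Record the type of the object pandas hands to the aggregation function.
    received.append(type(window))
    return window.mean()

small.rolling(2).apply(record_type, raw=False)  # windows arrive as Series
small.rolling(2).apply(record_type, raw=True)   # windows arrive as ndarray

print(received[0], received[-1])
```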

In [7]:
%time out = rolling_df.mean(raw=True)

%time out = rolling_df.mean(engine='cython', raw=True)

%time out = rolling_df.mean(engine='numba', raw=True)

%time out = rolling_df.mean(engine='numba', raw=True)
CPU times: user 42.9 ms, sys: 3.91 ms, total: 46.8 ms
Wall time: 45.2 ms
CPU times: user 21.3 ms, sys: 112 µs, total: 21.4 ms
Wall time: 20.4 ms
CPU times: user 1.27 s, sys: 173 µs, total: 1.27 s
Wall time: 1.27 s
CPU times: user 1.39 s, sys: 54.8 ms, total: 1.44 s
Wall time: 1.54 s

In the cell below, we have called the std() function on our rolling dataframe to calculate the rolling standard deviation, measuring the time taken by each call. We can notice that numba does better this time, taking less than half the time of the other engines.

In [9]:
%time out = rolling_df.std(raw=True)

%time out = rolling_df.std(engine='cython', raw=True)

%time out = rolling_df.std(engine='numba', raw=True)

%time out = rolling_df.std(engine='numba', raw=True)
CPU times: user 30.8 ms, sys: 0 ns, total: 30.8 ms
Wall time: 35.6 ms
CPU times: user 29.9 ms, sys: 0 ns, total: 29.9 ms
Wall time: 30 ms
CPU times: user 13.6 ms, sys: 0 ns, total: 13.6 ms
Wall time: 13.3 ms
CPU times: user 13.1 ms, sys: 0 ns, total: 13.1 ms
Wall time: 12.7 ms

Example 3: Giving Arguments for Numba Engine

The methods which accept engine='numba' also let us specify various arguments that we generally provide to the numba @jit decorator. The common arguments of the @jit decorator are nopython, nogil, cache, and parallel.

In the cell below, we calculate the standard deviation on our rolling dataframe, providing different arguments to the numba engine. We can notice that the numba engine again does better than the normal call.

If you want to know more about these numba @jit decorator arguments, please feel free to check our tutorial, which covers all of them in detail with examples.

In [8]:
%time out = rolling_df.std(raw=True)

%time out = rolling_df.std(engine='cython', raw=True)

%time out = rolling_df.std(engine='numba', nopython=True, raw=True)

%time out = rolling_df.std(engine='numba', nopython=True, cache=True, raw=True)

%time out = rolling_df.std(engine='numba', nopython=True, cache=True, parallel=True, raw=True)
CPU times: user 35.7 ms, sys: 8.97 ms, total: 44.7 ms
Wall time: 132 ms
CPU times: user 20.4 ms, sys: 0 ns, total: 20.4 ms
Wall time: 19.9 ms
CPU times: user 10.9 ms, sys: 1.87 ms, total: 12.8 ms
Wall time: 12.4 ms
CPU times: user 12.5 ms, sys: 0 ns, total: 12.5 ms
Wall time: 12.2 ms
CPU times: user 12.2 ms, sys: 0 ns, total: 12.2 ms
Wall time: 12 ms

Example 4: Trying Custom Functions

We can also provide custom user-defined functions to perform aggregate operations that are not available through pandas.

Below we have created a simple custom function that takes an input array, squares its values, and then takes the mean of the squared values. We'll use this function as an aggregate function on the rolling dataframe.

In [20]:
def custom_mean(x):
    return (x * x).mean()

In the cell below, we have called the apply() function on our rolling dataframe, asking it to execute the custom mean function designed in the previous cell. As in our previous examples, we have tried the function with no backend engine, the cython engine, and the numba engine.

We can notice from the results that numba does noticeably better than the other backend engines.

In [25]:
%time out = rolling_df.apply(custom_mean, raw=True)

%time out = rolling_df.apply(custom_mean, engine='cython', raw=True)

%time out = rolling_df.apply(custom_mean, engine='numba', raw=True)

%time out = rolling_df.apply(custom_mean, engine='numba', raw=True)
CPU times: user 4.61 s, sys: 37 µs, total: 4.61 s
Wall time: 4.61 s
CPU times: user 4.87 s, sys: 20 ms, total: 4.89 s
Wall time: 4.9 s
CPU times: user 1.29 s, sys: 4 ms, total: 1.29 s
Wall time: 1.29 s
CPU times: user 1.27 s, sys: 4.02 ms, total: 1.27 s
Wall time: 1.27 s

In the cell below, we have created a custom standard deviation function that squares the input array and then calculates the standard deviation of the squared values.

We have then tried this function on our rolling dataframe using the apply() function, recording the time taken by each call for comparison.

In [26]:
def custom_std(x):
    return (x * x).std()
In [27]:
%time out = rolling_df.apply(custom_std, raw=True)

%time out = rolling_df.apply(custom_std, engine='cython', raw=True)

%time out = rolling_df.apply(custom_std, engine='numba', raw=True)

%time out = rolling_df.apply(custom_std, engine='numba', raw=True)
CPU times: user 13.2 s, sys: 63.6 ms, total: 13.3 s
Wall time: 13.4 s
CPU times: user 13.1 s, sys: 11.9 ms, total: 13.1 s
Wall time: 13.1 s
CPU times: user 2.52 s, sys: 7.97 ms, total: 2.53 s
Wall time: 2.54 s
CPU times: user 1.8 s, sys: 8.03 ms, total: 1.81 s
Wall time: 1.81 s

In the cell below, we have created a function that takes values and an index as input and calculates the mean of the values.

We'll use this function on our grouped dataframe to calculate the mean of the grouped entries, comparing the time taken by different engines as usual.

Please make a NOTE that currently only the transform(), agg() and aggregate() functions support the engine argument that can be set to 'numba'. The agg() and aggregate() methods perform the same function.

In [13]:
from numba import jit

def func(values, index):
    return values.mean()

%time out = grouped_by_types.agg('mean')

%time out = grouped_by_types.agg('mean', engine='cython')

%time out = grouped_by_types.agg(func, engine='numba')

%time out = grouped_by_types.agg(func, engine='numba')
CPU times: user 5.21 ms, sys: 200 µs, total: 5.41 ms
Wall time: 26.6 ms
CPU times: user 3.68 ms, sys: 0 ns, total: 3.68 ms
Wall time: 3.79 ms
CPU times: user 312 ms, sys: 3.74 ms, total: 316 ms
Wall time: 315 ms
CPU times: user 6.4 ms, sys: 0 ns, total: 6.4 ms
Wall time: 6.03 ms
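
As a reminder of how agg() and transform() differ (shown here with the default engine, so it runs without numba, on a small illustrative frame): agg() reduces to one row per group, while transform() broadcasts each group's result back to the original shape.

```python
import pandas as pd

small = pd.DataFrame({"Type": ["a", "a", "b"], "A": [1.0, 3.0, 5.0]})
g = small.groupby("Type")

reduced = g.agg("mean")          # one row per group
broadcast = g.transform("mean")  # same length as the original frame

print(reduced["A"].tolist())    # [2.0, 5.0]
print(broadcast["A"].tolist())  # [2.0, 2.0, 5.0]
```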

2. Create Custom Numba Functions to Work with Pandas DataFrame

In this section, we'll create @jit decorated functions to work with our pandas dataframe and compare their performance with non-decorated functions. We'll also try to create functions to replace aggregate functions already provided by the pandas dataframe. Apart from @jit, we'll also use the @vectorize decorator to speed up operations.

As we said earlier, we'll retrieve numpy arrays from our pandas dataframe before giving them to numba functions, because numba works well with numpy arrays and Python loops.

Please make a NOTE that the difference in performance of numba functions might not be visible with small arrays; it becomes visible as array size increases. A numba function is also compiled the first time it runs, so the first execution can take longer, but all subsequent executions are much faster because the compiled version is reused.

Below we have recreated the rolling and grouped dataframes from the previous section. We'll try various numba functions on them this time.

In [6]:
rolling_df = df.rolling(1000)
In [7]:
grouped_by_types = df.groupby("Type")

Example 1: Decorate Functions with a Simple @jit Decorator

As part of our first example, we have created two functions that perform the same operation on the input array, one of them decorated with numba's @jit decorator. The functions take an input array, square its values, and then calculate the mean of the squared values.

In the next cell, we have tried these functions on our rolling dataframe using the apply() function, calling it with different backend engines (none, cython and numba) as in our previous examples, and recording the time taken by each execution.

We can notice from the results that the @jit decorated function takes less time than the normal, non-decorated function.

In [16]:
from numba import jit, njit, vectorize, float64

def custom_mean(x):
    return (x * x).mean()

@jit(cache=True)
def custom_mean_jitted(x):
    return (x * x).mean()
In [17]:
%time out = rolling_df.apply(custom_mean, raw=True)

%time out = rolling_df.apply(custom_mean_jitted, raw=True)

%time out = rolling_df.apply(custom_mean, engine='cython', raw=True)

%time out = rolling_df.apply(custom_mean_jitted, engine='cython', raw=True)

%time out = rolling_df.apply(custom_mean, engine='numba', raw=True)

%time out = rolling_df.apply(custom_mean_jitted, engine='numba', raw=True)
CPU times: user 4.33 s, sys: 57.4 ms, total: 4.39 s
Wall time: 4.32 s
CPU times: user 898 ms, sys: 0 ns, total: 898 ms
Wall time: 906 ms
CPU times: user 3.53 s, sys: 1.54 ms, total: 3.53 s
Wall time: 3.53 s
CPU times: user 814 ms, sys: 0 ns, total: 814 ms
Wall time: 813 ms
CPU times: user 1.49 s, sys: 7.15 ms, total: 1.5 s
Wall time: 1.5 s
CPU times: user 1.4 s, sys: 0 ns, total: 1.4 s
Wall time: 1.41 s

Example 2: Strict nopython Mode (@jit(nopython=True) | @njit)

Our code for this example is almost the same as the previous example, with one minor change: we are using the @njit decorator instead of @jit. The @njit decorator compiles the function in numba's strict nopython mode, which is generally faster. We can also force nopython mode with the @jit decorator by passing the nopython=True argument.

If you want to know about numba nopython mode then please feel free to check our tutorial that covers it.

We have then executed these functions on our rolled dataframe using apply() function with different backends for comparison purposes.

In [18]:
from numba import jit, njit

def custom_mean(x):
    return (x * x).mean()

@njit(cache=True)
def custom_mean_jitted(x):
    return (x * x).mean()
In [19]:
%time out = rolling_df.apply(custom_mean, raw=True)

%time out = rolling_df.apply(custom_mean_jitted, raw=True)

%time out = rolling_df.apply(custom_mean, engine='cython', raw=True)

%time out = rolling_df.apply(custom_mean_jitted, engine='cython', raw=True)

%time out = rolling_df.apply(custom_mean, engine='numba', raw=True)

%time out = rolling_df.apply(custom_mean_jitted, engine='numba', raw=True)
CPU times: user 3.53 s, sys: 22.6 ms, total: 3.56 s
Wall time: 3.54 s
CPU times: user 888 ms, sys: 7.82 ms, total: 896 ms
Wall time: 896 ms
CPU times: user 3.55 s, sys: 28.3 ms, total: 3.58 s
Wall time: 3.53 s
CPU times: user 820 ms, sys: 7.7 ms, total: 827 ms
Wall time: 827 ms
CPU times: user 1.57 s, sys: 15.6 ms, total: 1.58 s
Wall time: 1.58 s
CPU times: user 1.43 s, sys: 0 ns, total: 1.43 s
Wall time: 1.43 s

Example 3: Provide DataType for Further Speed Up

We can further speed up @jit decorated functions by providing input and output data types. Numba then creates a compiled version based on those datatypes, which can improve performance. Below we have specified float64 as the input and output data type of our function.

We have then called these functions on our rolled dataframe using apply() method with different backend engines for comparing performance.

In [20]:
from numba import jit, njit, float64

def custom_mean(x):
    return (x * x).mean()

@jit(float64(float64[:]), nopython=True, cache=True)
def custom_mean_jitted(x):
    return (x * x).mean()
In [21]:
%time out = rolling_df.apply(custom_mean, raw=True)

%time out = rolling_df.apply(custom_mean_jitted, raw=True)

%time out = rolling_df.apply(custom_mean, engine='cython', raw=True)

%time out = rolling_df.apply(custom_mean_jitted, engine='cython', raw=True)

%time out = rolling_df.apply(custom_mean, engine='numba', raw=True)

%time out = rolling_df.apply(custom_mean_jitted, engine='numba', raw=True)
CPU times: user 3.57 s, sys: 43.8 ms, total: 3.61 s
Wall time: 3.57 s
CPU times: user 891 ms, sys: 0 ns, total: 891 ms
Wall time: 890 ms
CPU times: user 3.54 s, sys: 7.48 ms, total: 3.54 s
Wall time: 3.54 s
CPU times: user 879 ms, sys: 3.76 ms, total: 883 ms
Wall time: 882 ms
CPU times: user 1.49 s, sys: 3.71 ms, total: 1.5 s
Wall time: 1.49 s
CPU times: user 1.31 s, sys: 0 ns, total: 1.31 s
Wall time: 1.31 s

Example 4: Introduce Python Loops

Since numba works really well with Python loops, we can also rewrite our function using a loop. In this example, we have modified our @jit decorated function to calculate the mean of squared values inside a loop.

We have then executed these functions on our rolling dataframe with different backend engines to compare performance. We can notice that the loop version does a little better than our previous examples.

In [22]:
from numba import jit, njit, vectorize, float64

def custom_mean(x):
    return (x * x).mean()

@jit(float64(float64[:]), nopython=True, cache=True)
def custom_mean_loops_jitted(x):
    out = 0.0
    for i in x:
        out += (i*i)
    return out / len(x)
In [23]:
%time out = rolling_df.apply(custom_mean, raw=True)

%time out = rolling_df.apply(custom_mean_loops_jitted, raw=True)

%time out = rolling_df.apply(custom_mean, engine='cython', raw=True)

%time out = rolling_df.apply(custom_mean_loops_jitted, engine='cython', raw=True)

%time out = rolling_df.apply(custom_mean, engine='numba', raw=True)

%time out = rolling_df.apply(custom_mean_loops_jitted, engine='numba', raw=True)
CPU times: user 3.61 s, sys: 11.8 ms, total: 3.62 s
Wall time: 3.6 s
CPU times: user 700 ms, sys: 0 ns, total: 700 ms
Wall time: 699 ms
CPU times: user 3.51 s, sys: 3.73 ms, total: 3.52 s
Wall time: 3.51 s
CPU times: user 689 ms, sys: 0 ns, total: 689 ms
Wall time: 688 ms
CPU times: user 1.57 s, sys: 3.76 ms, total: 1.57 s
Wall time: 1.57 s
CPU times: user 1.01 s, sys: 0 ns, total: 1.01 s
Wall time: 1.01 s
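
Independent of numba, we can confirm that the loop formulation computes the same quantity as the vectorized expression (a pure-Python version of the loop is shown here purely for checking):

```python
import numpy as np

def mean_of_squares_loop(x):
    # Same logic as custom_mean_loops_jitted, without the @jit decorator.
    total = 0.0
    for v in x:
        total += v * v
    return total / len(x)

x = np.random.rand(1000)
loop_result = mean_of_squares_loop(x)
vectorized_result = (x * x).mean()

print(np.isclose(loop_result, vectorized_result))  # True
```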

Example 5: Try to Replace Existing Pandas DataFrame Functions with Custom Jit-Decorated Functions

In this example, we'll create a custom @jit decorated function to replace an existing mean() function available from the pandas dataframe.

Below we have first calculated the mean of 5 columns of the dataframe using the in-built mean() function and recorded the time taken for the operation.

In [27]:
%time out = df[list("ABCDE")].mean()
CPU times: user 11 ms, sys: 19 µs, total: 11 ms
Wall time: 9.58 ms

In the cell below, we have designed a function that takes a numpy array as input and calculates its mean. We have @jit decorated the function, specified input/output data types, and provided the nopython=True argument to run numba in strict nopython mode.

In [28]:
from numba import jit, njit, vectorize, float64, float32

@jit([float32(float32[:]), float64(float64[:])], nopython=True, cache=True)
def custom_mean(x):
    return x.mean()

In the cell below, we have looped through the column names of the pandas dataframe and calculated the mean of each using our custom mean function, recording the total time taken. We can notice that it takes a little less time than pandas' in-built function. We expect this difference to grow with the size of the array and the number of columns.

Please make a NOTE that the difference in performance will be more visible as the array size increases beyond 1M values.

In [29]:
%%time

avg_cols = {}
for col in list("ABCDE"):
    avg_cols[col] = custom_mean(df[col].values)
CPU times: user 2.9 ms, sys: 0 ns, total: 2.9 ms
Wall time: 2.94 ms
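
A pandas-only sanity check (with an illustrative frame) that averaging the underlying numpy array of each column produces the same values as DataFrame.mean():

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.random.rand(1000, 5), columns=list("ABCDE"))

# Column-by-column means computed from the raw numpy arrays.
avg_cols = {col: frame[col].values.mean() for col in frame.columns}
builtin = frame.mean()

print(all(np.isclose(avg_cols[c], builtin[c]) for c in frame.columns))  # True
```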

Example 6: Try to Vectorize Functions using @vectorize Decorator for Further Speed Up

In this section, we'll explain another example where we use the @vectorize decorator to replace an existing pandas operation.

Below we have taken a column of our pandas dataframe, squared its values, and then added the scalar value 2 to it. We performed this operation by providing a simple function to the apply() method and recorded the time taken.

In the next cell, we have computed the same result by multiplying the column's underlying numpy array with itself and adding 2, recording the time taken by this approach as well.

In [30]:
%time out = df.A.apply(lambda x : x**2 + 2)
CPU times: user 46.9 ms, sys: 3.99 ms, total: 50.9 ms
Wall time: 50.8 ms
In [17]:
%time out = (df.A.values * df.A.values) + 2
CPU times: user 1.09 ms, sys: 3.68 ms, total: 4.77 ms
Wall time: 50.9 ms

In the cell below, we have created a simple function that takes a single value as input, squares it, and adds the scalar value 2. We have then vectorized this function using the numba @vectorize decorator. We'll use it to perform the same operation we performed with the pandas in-built method in the previous cells.

If you want to know about how numba @vectorize decorator works then please feel free to check our tutorial on it from the below link.

In [18]:
from numba import vectorize, float32, float64

@vectorize([float32(float32), float64(float64)])
def square(x):
    return x**2 + 2

In the cell below, we have called our vectorized function 3 times on the values of a dataframe column, recording the time taken each time. We can notice that the vectorized function takes considerably less time than pandas' in-built functionality.

Please make a NOTE that difference in performance will be more visible as array size increases and goes beyond 1M values.

In [19]:
%time out = square(df["A"].values)

%time out = square(df["A"].values)

%time out = square(df["A"].values)
CPU times: user 1.69 ms, sys: 39 µs, total: 1.73 ms
Wall time: 846 µs
CPU times: user 619 µs, sys: 0 ns, total: 619 µs
Wall time: 626 µs
CPU times: user 1.38 ms, sys: 0 ns, total: 1.38 ms
Wall time: 696 µs
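
As a sanity check, the element-wise computation that square() performs can be verified against plain numpy broadcasting (numpy-only, so it runs without numba):

```python
import numpy as np

x = np.random.rand(1000)

elementwise = x ** 2 + 2                     # numpy broadcasting over the array
looped = np.array([v ** 2 + 2 for v in x])   # the scalar function applied per element

print(np.allclose(elementwise, looped))  # True
```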

This concludes our small tutorial explaining how to use numba to speed up code involving pandas dataframes. Please feel free to let us know your views in the comments section.
