Python is an interpreted language, so it is slower than compiled languages like C/C++. Because of this, Python has traditionally been avoided for performance-intensive applications. To address this problem, a Python library named **Numba** was developed. **Numba** is a **Just-In-Time (JIT)** compiler for Python code which can speed up some parts of, or all of, a Python function by converting it to low-level machine instructions. It uses the **LLVM library** to convert Python code to machine instructions. Code compiled with **Numba** can run at speeds approaching those of C/C++ code.

The process of using **Numba** to speed up code is quite simple. **Numba** provides a set of decorators that we can use to decorate our functions. A decorated function is compiled the first time it is called, so that first call takes a little longer. Each subsequent call reuses the compiled version and is therefore much faster.

**Numba** reads the Python bytecode of the decorated function, maps its input arguments and other data used inside the function to **Numba** data types, optimizes various parts, and converts it to machine code using the **LLVM library**. If a function is designed to work with various data types (a generic function), then **Numba** compiles it again each time it is called with a data type it hasn't seen before, because it creates a separate compiled version of the same function for each combination of argument data types.

Please make a **NOTE** that **Numba** can only translate a certain subset of Python code, mainly **loops** and **code involving numpy**, to faster machine code. Not everything will run faster using **Numba**. One needs a basic understanding of what can be compiled and parallelized, and what cannot, to make efficient use of **Numba**. We'll help you understand how to use **Numba** better in various situations in this tutorial.

As a part of this tutorial, we'll be covering how we can speed up our Python functions using **Numba**. We'll be explaining the **@jit** and **@njit** decorators available from **Numba**. Below we have listed the important sections of the tutorial to give an overview of the material that we'll be covering.

- Example 1: Introducing @jit (Object Mode)
- Example 2: @jit & @njit (Strict nopython Mode)
- Example 3: Specify DataType in Signatures
- Example 4: Cache Compiled Code to Speed Up Frequent Runs
- Example 5: Parallelize Code for Multi-Core CPU (Uses Multi-Threading to Parallelize)
- Example 6: "fastmath" for Faster Mathematical Operations
- Example 7: Release Python GIL during Multi-Threading on Multi-Core CPU
- Example 8: Numba does not Improve Pandas Code

We can easily install **Numba** using pip or conda.

- **pip install numba**
- **conda install numba**

Below we have imported **Numba** and printed the version of it that we have used in this tutorial.

In [1]:

```
import numba
print("Numba Version : {}".format(numba.__version__))
```

In [3]:

```
import numpy as np
import pandas as pd
```

In this example, we'll be introducing the first decorator available from **Numba**, named **@jit**, to speed up our Python functions. We can decorate a Python function with the **@jit** decorator and, if the function uses constructs that **Numba** supports, it should run faster.

The **@jit** decorator compiles the Python code of the function decorated with it. It works in one of the two modes mentioned below.

- **object mode** - The **@jit** decorator first tries to convert the whole Python function to low-level machine code. If it fails to convert the whole code, it falls back to object mode, where it still compiles the parts of the function (such as loops) that it can convert and runs the rest through the Python interpreter. This fallback was the default behaviour of a bare **@jit** in older **Numba** releases; from version 0.59 onwards, **@jit** defaults to **nopython** mode and object mode has to be requested explicitly with **forceobj=True**.
- **nopython mode** - In this mode, the **@jit** decorator strictly tries to convert the whole Python function to low-level machine code. We can set the **@jit** decorator to work in **nopython** mode by setting its **nopython** argument to **True**. If the whole function can not be compiled to low-level code then compilation will fail with an error. This mode is generally preferred over **object mode** and is much faster when it can be used.

Users can first test whether their function runs in **nopython mode** and, if it does, use that mode; otherwise they can fall back to **object mode**. If you know that your whole function can be converted using **Numba** then it's preferred to use **nopython mode**. If your function is designed in a way that only some parts of it can be compiled by **Numba** and the rest must run in pure Python, then it's preferred to run it in **object mode**.

When testing the **Numba @jit** decorator, if it does not seem to improve performance then it's better to remove the decorator, fall back to pure Python, and look for other ways to improve performance. Using **@jit** with functions that **Numba** can't convert might even worsen performance: compiling the function the first time takes time, and that overhead is added without any speed-up in return.

Please make a **NOTE** that **Numba** generally does not speed up code involving list comprehensions. It is suggested to rewrite functions that use comprehensions as explicit loops for faster performance.

In this section, we have created two small examples to explain the usage of **@jit** decorator in object mode.

In our first example, we have created a simple function that takes an array of arbitrary size and applies a cubic formula to each element of the array. The **perform_operation()** function takes an array as input and executes the **cube_formula()** function on each element, recording the results.

In [172]:

```
def cube_formula(x):
    return x**3 + 3*x**2 + 3

def perform_operation(x):
    out = np.empty_like(x)
    for i, elem in enumerate(x):
        res = cube_formula(elem)
        out[i] = res
    return out
```

After defining the functions, we have executed our main function with two different arrays of numbers where the first one consists of **1M** numbers and the second one consists of **10M** numbers. We have also recorded the time taken by functions as we'll be comparing it against **@jit** decorated functions. We have used the jupyter magic command **%time** to measure the execution time of a particular statement.

Please make a **NOTE** that the speed-up provided by **Numba @jit** decorated functions will differ from computer to computer, as it depends on the low-level machine instructions available to the **LLVM compiler** on the particular computer.

If you are interested in learning about the magic commands (like **%time** which we have used in this tutorial) available in Jupyter notebook then please feel free to check our tutorial on the same. It covers the majority of Jupyter notebook magic commands.
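The **%time** magic is only available inside Jupyter/IPython. When running the examples as a plain Python script, a small helper along the following lines can stand in for it. The helper name `time_call` and its `repeats` parameter are assumptions made for this sketch.

```python
import time

import numpy as np


def time_call(func, *args, repeats=3):
    """Call func(*args) `repeats` times; return the last result and all durations."""
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        durations.append(time.perf_counter() - start)
    return result, durations


result, durations = time_call(np.sum, np.arange(1_000_000, dtype=np.int64))
for run, seconds in enumerate(durations, start=1):
    print(f"run {run}: {seconds * 1e3:.3f} ms")
```

Timing several runs is useful with **Numba** because the first run of a decorated function includes compilation.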

In [178]:

```
%time out = perform_operation(np.arange(1e6))
```

In [179]:

```
%time out = perform_operation(np.arange(1e7))
```

Below we have re-defined both of our functions, but this time decorated them with the **@jit** decorator.

In [180]:

```
from numba import jit

@jit
def cube_formula(x):
    return x**3 + 3*x**2 + 3

@jit
def perform_operation_jitted(x):
    out = np.empty_like(x)
    for i, elem in enumerate(x):
        res = cube_formula(elem)
        out[i] = res
    return out
```

Below we have executed the jit-decorated function with two arrays of different sizes. We have used the same arrays that we used when testing the undecorated function.

We can notice from the recorded times that both calls take far less time than they did without **@jit**. The **@jit** decorator has improved the performance by quite a big margin.

In [181]:

```
%time out = perform_operation_jitted(np.arange(1e6))
```

In [182]:

```
%time out = perform_operation_jitted(np.arange(1e7))
```

In this section, we have defined one more function to explain the usage of the **@jit** decorator. The function executes one loop inside another and records the index pair of every combination. The outer loop executes **10000** times and the inner loop executes **1000** times.

After defining the function, we have executed it 3 times and recorded the time taken by it each time for comparison purposes later.

In [305]:

```
def calculate_all_permutations():
    perms = []
    for i in range(int(1e4)):
        for j in range(int(1e3)):
            perms.append((i,j))
    return perms
```

In [306]:

```
%time perms = calculate_all_permutations()
%time perms = calculate_all_permutations()
%time perms = calculate_all_permutations()
```

Now, we have defined our function again, but this time decorated it with the **@jit** decorator. We have then run this **@jit** decorated function 3 times to record the time taken by it. We can notice from the results that it takes much less time than the normal function. Also, subsequent calls to the **@jit** decorated function take less time than the first because they use the already compiled version.

In [307]:

```
@jit
def calculate_all_permutations():
    perms = []
    for i in range(int(1e4)):
        for j in range(int(1e3)):
            perms.append((i,j))
    return perms
```

In [308]:

```
%time perms = calculate_all_permutations()
%time perms = calculate_all_permutations()
%time perms = calculate_all_permutations()
```

In this section, we have run our examples in the **nopython** mode of the **Numba @jit** decorator. There are two ways in which we can force **nopython** mode:

- **@jit(nopython=True)**
- **@njit**

We'll be using both in our examples.

In this section, we have redefined our functions and decorated them with the **@jit** decorator, this time setting its **nopython** argument to **True** (in **Numba** releases before 0.59 this argument was **False** by default). This forces **Numba** to run in strict **nopython** mode and convert all the code of the function to low-level machine code. This mode is generally preferred as it is faster than **object mode**.

Our current functions are designed in a way that they can be totally converted to low-level machine code using **Numba**.

If you use the **@jit** decorator in **nopython** mode then **Numba** will try to compile the whole function when it is first called, and if it can not convert some part of it then it'll fail with an error. If your function fails to compile in **nopython** mode then it’s advisable to either use **object mode** or split the function into smaller functions and use **nopython** mode on the sub-parts wherever possible.

In [183]:

```
from numba import jit

@jit(nopython=True)
def cube_formula(x):
    return x**3 + 3*x**2 + 3

@jit(nopython=True)
def perform_operation_jitted(x):
    out = np.empty_like(x)
    for i, elem in enumerate(x):
        res = cube_formula(elem)
        out[i] = res
    return out
```

Below we have executed our function two times, once with an array of size **1M** and once with an array of size **10M**. We have also recorded the time taken by both. We can notice from the results that it takes almost the same time as our previous **object** mode runs. Though it does not seem to improve the performance of these functions much further, it’s generally the preferred mode to use whenever possible as it can speed up code more.

In [184]:

```
%time out = perform_operation_jitted(np.arange(1e6))
```

In [185]:

```
%time out = perform_operation_jitted(np.arange(1e7))
```

Below we have introduced another way of using **nopython** mode by decorating our functions with the **@njit** decorator. We have then run our **@njit** decorated function two times, once with an array of size **1M** and once with an array of size **10M**. We can notice from the results that the time taken is almost the same as with **nopython=True** inside the **@jit** decorator.

In [186]:

```
from numba import njit

@njit
def cube_formula(x):
    return x**3 + 3*x**2 + 3

@njit
def perform_operation_jitted(x):
    out = np.empty_like(x)
    for i, elem in enumerate(x):
        res = cube_formula(elem)
        out[i] = res
    return out
```

In [187]:

```
%time out = perform_operation_jitted(np.arange(1e6))
```

In [188]:

```
%time out = perform_operation_jitted(np.arange(1e7))
```

In this section, we have decorated with **@njit** the second example that we ran earlier in the **object mode** section. We have then executed the function three times to check performance. We can notice from the results that the time taken is almost the same or a little better compared to the **object mode** runs.

In [313]:

```
@njit
def calculate_all_permutations():
    perms = []
    for i in range(int(1e4)):
        for j in range(int(1e3)):
            perms.append((i,j))
    return perms
```

In [314]:

```
%time perms = calculate_all_permutations()
%time perms = calculate_all_permutations()
%time perms = calculate_all_permutations()
```

When **Numba** compiles the code, it internally creates a version for each different data type with which a function is run. Each time a **@jit** decorated function is run with a new data type, **Numba** needs to compile the function first with this new data type and create a new data type version for future use. All subsequent calls for this recorded data type will be faster.

We can also separately specify input and output data types of our **@jit** decorated function. This will create a compiled version for the specified data type when the function is defined and not when the function is first called with that data type.

We can provide data types as the first argument of the decorator. We specify the input and output data types of the function using the **ret_type(param1_type, param2_type, ...)** format: the data types of the input parameters go inside the parentheses and the return type goes in front of them. The data types that we use in the **@jit** decorator need to be imported from **Numba**. If an input or output is an array, we represent it by appending **'[:]'** to the element data type, for example **int64[:]** for a one-dimensional array of 64-bit integers.

Please make a **NOTE** that when a function is declared with data types, **Numba** will only allow us to call it with arguments that match one of the declared signatures. A call whose argument types match none of them, for example a float array passed to a function declared for **int64[:]**, will fail with an error.

Below we have redefined the functions that we have been using for the last few examples, but this time we have provided input/output data types as well. We have declared the **int64** data type for both input and output. This creates a compiled version for this data type as soon as we execute the cell below, so when we call these functions with **int64** data, no further compilation is needed and they run immediately.

As we have declared our functions with integer input/output data types, calling the functions below with float arrays will fail.

In [189]:

```
from numba import jit, int64, float32, float64

@jit(int64(int64), nopython=True)
def cube_formula(x):
    return x**3 + 3*x**2 + 3

@jit(int64[:](int64[:]), nopython=True)
def perform_operation_jitted(x):
    out = np.empty_like(x)
    for i, elem in enumerate(x):
        res = cube_formula(elem)
        out[i] = res
    return out
```

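Below we have run our jit-decorated function with declared data types, first with an array of **1M** **int64** numbers and then with an array of **10M** **int64** numbers. We have also recorded the time taken by both. We can notice from the recorded times that they have improved further compared to all our previous versions, most likely because the function was already compiled when it was defined, so the first call no longer includes compilation time.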

In [190]:

```
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.int64))
```

In [191]:

```
%time out = perform_operation_jitted(np.arange(1e7, dtype=np.int64))
```

If our function works with different data types then we can specify more than one signature inside the **@jit** decorator. The data type signatures are specified as a list.

Below we have specified two different data type signatures for our functions. **Numba** will internally create compiled versions for both data types. Our functions can now run with these two data types; a call with an array of any other data type will fail.

In [214]:

```
from numba import jit, int64, float32, float64

@jit([int64(int64), float64(float64)], nopython=True)
def cube_formula(x):
    return x**3 + 3*x**2 + 3

@jit([int64[:](int64[:]), float64[:](float64[:])], nopython=True)
def perform_operation_jitted(x):
    out = np.empty_like(x)
    for i, elem in enumerate(x):
        res = cube_formula(elem)
        out[i] = res
    return out
```

Below we have executed our functions with arrays of sizes **1M** and **10M** respectively. We have first executed them with **int64** data type and then with **float64**. We have also recorded the time taken by both. We can notice from the time taken that it has improved quite a lot compared to our examples where we had not declared data types.

In [215]:

```
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.int64))
```

In [216]:

```
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.float64))
```

In [217]:

```
%time out = perform_operation_jitted(np.arange(1e7, dtype=np.int64))
```

In [218]:

```
%time out = perform_operation_jitted(np.arange(1e7, dtype=np.float64))
```

When we call a **@jit** decorated function with a particular data type, **Numba** generates machine code for it. This compilation can take time. We can avoid repeating it across runs of our program by setting the **cache** argument of the **@jit** decorator to **True**, which saves the compiled code to disk so that later runs can load it instead of compiling again.

**Numba** internally uses a file-based cache to maintain the compiled versions of functions.

Below we have re-defined our functions with **cache** argument set to **True**.

In [321]:

```
from numba import jit, int32, int64, float32, float64

@jit([int32(int32), int64(int64), float64(float64)], nopython=True, cache=True)
def cube_formula(x):
    return x**3 + 3*x**2 + 3

@jit([int64[:](int64[:]), float64[:](float64[:])], nopython=True, cache=True)
def perform_operation_jitted(x):
    out = np.empty_like(x)
    for i, elem in enumerate(x):
        res = cube_formula(elem)
        out[i] = res
    return out
```

Below we have executed our functions three times using the same array of **1M** integer numbers. We have also recorded the time taken for executions. We can notice from the time taken by executions that they are the lowest of all our tries till now.

In [322]:

```
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.int64))
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.int64))
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.int64))
```

Below we have executed our functions three times using the same array of **1M** float numbers. We have also recorded the time taken for executions. The time taken for executions is the least of all our tries till now.

In [323]:

```
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.float64))
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.float64))
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.float64))
```

Below we have executed our functions three times using the same array of **10M** float numbers. We have also recorded the time taken for executions. The time taken is the least of all our tries of the same functions till now.

In [325]:

```
%time out = perform_operation_jitted(np.arange(1e7, dtype=np.float64))
%time out = perform_operation_jitted(np.arange(1e7, dtype=np.float64))
%time out = perform_operation_jitted(np.arange(1e7, dtype=np.float64))
```

**Numba** can also parallelize our code on multi-core CPUs. It uses multi-threading to speed up code by running threads on different cores of the computer in parallel. In order to parallelize code, we need to set the **parallel** parameter of the **@jit** decorator to **True**. There are two types of parallelization available in **Numba**:

- **Automatic Parallelization** - When we decorate our function with **@jit(parallel=True)**, **Numba** will try to run parts of the function in parallel where possible; otherwise it runs them normally.
- **Explicit Parallel Loops** - We can explicitly ask **Numba** to run a loop in parallel by using the **prange()** function available from **Numba** in place of **range()**. This tells **Numba** to distribute the iterations of the loop across threads.

In our example, we'll use explicit parallelization by using **prange()** function.

Please make a **NOTE** that Python **Global Interpreter Lock (GIL)** can prevent the speed up of multi-threading. We'll explain in our upcoming examples how we can release **GIL** and get around this problem.

Below we have re-defined our functions and set **parallel** parameter to **True** inside of **@jit** decorator. We have also modified the logic of our **perform_operation_jitted()** function to use **prange()** function. We are using index retrieved from **prange()** function to index array and retrieve individual element.

In [259]:

```
from numba import jit, int64, float32, float64, prange

@jit([int64(int64), float64(float64)], nopython=True, cache=True)
def cube_formula(x):
    return x**3 + 3*x**2 + 3

@jit([int64[:](int64[:]), float64[:](float64[:])], nopython=True, cache=True, parallel=True)
def perform_operation_jitted(x):
    out = np.empty_like(x)
    for i in prange(len(x)):
        res = cube_formula(x[i])
        out[i] = res
    return out
```

Now, we have run our parallelized function with arrays of size **1M** and **10M** to test their performance. We have also recorded the time taken by them. We have first used an array of **1M** integers, then an array of **1M** floats, and at last, an array of **10M** floats.

We can notice from the time taken by the executions that performance has improved compared to the non-parallelized versions.

In [262]:

```
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.int64))
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.int64))
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.int64))
```

In [263]:

```
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.float64))
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.float64))
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.float64))
```

In [264]:

```
%time out = perform_operation_jitted(np.arange(1e7, dtype=np.float64))
%time out = perform_operation_jitted(np.arange(1e7, dtype=np.float64))
%time out = perform_operation_jitted(np.arange(1e7, dtype=np.float64))
```

**Numba** provides some additional performance in some situations when we set the **fastmath** parameter to **True** inside the **@jit** decorator. The **fastmath** option, when set to **True**, relaxes some strict numerical (IEEE 754) rules and allows approximate arithmetic and mathematical functions. If Intel's **short vector math library (SVML)** is installed on the system, then **Numba** can utilize it to improve performance when **fastmath** is set to **True**.

We can install Intel's SVML library using the below conda command. Please see this link for more details on **SVML**.

**conda install -c numba icc_rt**

In this section, we have first used **fastmath** on its own and then along with the **parallel** argument of the **@jit** decorator.

In this section, we have first re-defined our functions and decorated them with **@jit** decorator. We have set **fastmath** parameter to **True** along with **nopython** and **cache** parameters. We have also provided data types for inputs/outputs of functions.

In [290]:

```
from numba import jit, int64, float32, float64, prange

@jit([int64(int64), float64(float64)], nopython=True, cache=True, fastmath=True)
def cube_formula(x):
    return x**3 + 3*x**2 + 3

@jit([int64[:](int64[:]), float64[:](float64[:])], nopython=True, cache=True, fastmath=True)
def perform_operation_jitted(x):
    out = np.empty_like(x)
    for i, elem in enumerate(x):
        res = cube_formula(elem)
        out[i] = res
    return out
```

Below we have tested our **@jit** decorated and **fastmath** set functions three times using different inputs.

First, we have executed it with **1M** integers three times, followed by **1M** floats three times and at last, **10M** floats three times. We can notice from the time recorded for executions that it seems to have improved performance a little bit.

In [291]:

```
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.int64))
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.int64))
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.int64))
```

In [292]:

```
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.float64))
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.float64))
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.float64))
```

In [293]:

```
%time out = perform_operation_jitted(np.arange(1e7, dtype=np.float64))
%time out = perform_operation_jitted(np.arange(1e7, dtype=np.float64))
%time out = perform_operation_jitted(np.arange(1e7, dtype=np.float64))
```

In this section, we have re-defined the functions that we have been using for the last few examples. We have **@jit** decorated it along with options **nopython, cache, fastmath, and parallel** set to **True**.

In [294]:

```
from numba import jit, int64, float32, float64, prange

@jit([int64(int64), float64(float64)], nopython=True, cache=True, fastmath=True)
def cube_formula(x):
    return x**3 + 3*x**2 + 3

@jit([int64[:](int64[:]), float64[:](float64[:])], nopython=True, cache=True, fastmath=True, parallel=True)
def perform_operation_jitted(x):
    out = np.empty_like(x)
    for i in prange(len(x)):
        res = cube_formula(x[i])
        out[i] = res
    return out
```

Below we have tested our **fastmath** optimized and parallelized functions by executing them with different arrays three times. We have also recorded the time taken by each run for comparison. We can notice from the results that the times are almost the same as those of the parallel section above.

In [295]:

```
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.int64))
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.int64))
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.int64))
```

In [296]:

```
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.float64))
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.float64))
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.float64))
```

In [297]:

```
%time out = perform_operation_jitted(np.arange(1e7, dtype=np.float64))
%time out = perform_operation_jitted(np.arange(1e7, dtype=np.float64))
%time out = perform_operation_jitted(np.arange(1e7, dtype=np.float64))
```

One of Python's drawbacks when using multi-threading is the **GIL**, which does not let Python execute bytecode in more than one thread at a time. To overcome this drawback, **Numba** lets us release the **GIL** by setting the **nogil** parameter to **True** inside the **@jit** decorator. When **Numba** can compile the function to low-level machine code that does not touch Python objects (**nopython** mode), it's not necessary to hold Python's **GIL** while that code runs.

Our functions for this example are exact copies of the functions we had defined in Example 5 (with one minor change), where we explained how to use multi-threading with the **Numba @jit** decorator by setting **parallel=True**. This time we have also set the **nogil** parameter to **True** to let **Numba** release the **GIL**.

In [298]:

```
from numba import jit, int64, float32, float64, prange

@jit([int64(int64), float64(float64)], nopython=True, nogil=True)
def cube_formula(x):
    return x**3 + 3*x**2 + 3

@jit([int64[:](int64[:]), float64[:](float64[:])], nopython=True, nogil=True, parallel=True)
def perform_operation_jitted(x):
    out = np.empty_like(x)
    for i in prange(len(x)):
        res = cube_formula(x[i])
        out[i] = res
    return out
```

Below we have executed our jit-decorated function three times using different inputs. First we have executed function with an array of **1M** integers three times, then with an array of **1M** floats three times and at last with an array of **10M** floats three times. We have recorded the time taken by function each time. We can notice from the time recorded that the function seems to be doing better compared to the majority of our previous trials.

In [301]:

```
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.int64))
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.int64))
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.int64))
```

In [302]:

```
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.float64))
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.float64))
%time out = perform_operation_jitted(np.arange(1e6, dtype=np.float64))
```

In [304]:

```
%time out = perform_operation_jitted(np.arange(1e7, dtype=np.float64))
%time out = perform_operation_jitted(np.arange(1e7, dtype=np.float64))
%time out = perform_operation_jitted(np.arange(1e7, dtype=np.float64))
```

As we have highlighted many times, **Numba** works well with Python loops and **numpy**. Although **Pandas** is built on top of **numpy**, **Numba** can not improve code that works on pandas data structures using pandas operations. The likely reason is that **Numba** does not understand pandas objects and has no access to the lower-level code behind the pandas API which it could optimize.

Below we have created a simple function that takes as input pandas dataframe and performs some operations on columns of pandas dataframe. It then returns a modified data frame. We have first run the function normally 3 times and recorded the time of each run.

We have then **@jit** decorated the same function and run it again three times. We have recorded the time taken by this jit-decorated function as well. We can clearly see from the results that **@jit** decorator does not seem to improve results. It even increases the time taken by the function.

The examples below show that using **Numba** on code involving only **pandas** will not improve performance. It can even backfire and take longer, especially on the first run, as seen below, because **Numba** tries to compile the code, fails, and finally falls back to running it through the Python interpreter. Please note that this fallback only happens with a bare **@jit** in **Numba** releases before 0.59; newer releases raise a compilation error instead unless **forceobj=True** is set.

Although decorating functions that work on a pandas dataframe with the **@jit** decorator does not seem to improve results, there are ways to speed up such functions. We have discussed how we can improve code involving pandas dataframes using **Numba** and its decorators in a separate tutorial. Please feel free to check it.

In [14]:

```
def work_on_dataframe(df):
    df['Col1'] = (df.Col1 * 100)
    df['Col2'] = (df.Col1 * df.Col3)
    df = df.where((df > 100) & (df < 10000))
    df = df.dropna(how='any')
    return df

data = {'Col1': range(10000), 'Col2': range(10000), 'Col3': range(10000)}
df = pd.DataFrame(data=data)

%time df = work_on_dataframe(df)
%time df = work_on_dataframe(df)
%time df = work_on_dataframe(df)
```

In [17]:

```
from numba import jit

@jit
def work_on_dataframe(df):
    df['Col1'] = (df.Col1 * 100)
    df['Col2'] = (df.Col1 * df.Col3)
    df = df.where((df > 100) & (df < 10000))
    df = df.dropna(how='any')
    return df

data = {'Col1': range(1000), 'Col2': range(1000), 'Col3': range(1000)}
df = pd.DataFrame(data=data)

%time df = work_on_dataframe(df)
%time df = work_on_dataframe(df)
%time df = work_on_dataframe(df)
```

This ends our small tutorial explaining the **Numba @jit** decorator to speed up Python code. Please feel free to let us know your views in the comments section.

- How to Speed up Code involving Pandas DataFrame using Numba?
- Numba @vectorize Decorator: Convert Scaler Function to Universal Function (ufunc)
- Numba @stencil Decorator: Guide to Improve Performance of Code involving Stencil Kernels
- Numba @guvectorize Decorator: Generalized Universal Functions
- Numba Performance Tips


Sunny Solanki
