**Numba** is an open-source Python library that provides just-in-time compilation to speed up Python code. It can accelerate our existing Python functions when we decorate them with the various decorators it provides. The most commonly used one is the **@jit** decorator, which works with the majority of Python functions. Please feel free to check our tutorial on the **@jit** decorator if you want to learn about it.

Apart from **@jit**, Numba provides two other decorators for designing NumPy-like universal functions.

- **@vectorize** - This decorator turns a function that works on scalars into a function that can take an array of scalars and apply the operation to each of them. It also speeds up such functions by a large amount. With this decorator, the input and output arrays have the same shape. Please feel free to check our tutorial on it, where we have covered the decorator in detail with examples.
- **@guvectorize** - This decorator lets us create a function that takes input arrays and produces an output array whose shape can differ from the input shapes. It can speed up arbitrary operations on input arrays. Functions created with this decorator do not return the output array directly; instead, the output array is declared as an extra argument of the function, and the function fills it with results.

As a part of this tutorial, we'll explain with examples how we can use the **@guvectorize** decorator available from **Numba** to speed up our functions. We'll explain the various arguments of the decorator as well. We'll create **@guvectorize**-decorated Numba functions to perform some operations and compare them with undecorated functions to measure the performance improvement due to the decorator, using execution time as our metric. Below we have listed the important sections of our tutorial.

- Example 1: Simple Cube Formula with Constant
- Example 2: Simple Cube Formula with Array
- Example 3: Dot Product
- Example 4: Convolution

Below we have imported **Numba** and printed the version of it that we'll be using in our tutorial.

In [1]:

```
import numba
print("Numba Version : {}".format(numba.__version__))
```

In [2]:

```
import numpy as np
```

In our first example, we'll explain how we can use **@guvectorize** for functions involving operations on numpy arrays.

In order to use the **@guvectorize** decorator, we need to follow some conventions, which are listed below.

- We need to provide the symbolic dimensions of input and output arrays, specified as a string.
    - E.g., **'(n),(n)->(n)'** - This string represents that the function has two input arrays of shape **'n'** and one output array of shape **'n'**.
    - E.g., **'(m),(n)->(m,n)'** - This string represents that the function has two input arrays, one of shape **'m'** and one of shape **'n'**, and an output array of shape **'(m,n)'**.
    - The output array shape can be constructed only from input array shapes. If the input arrays use the shape symbols **'m'** and **'n'**, then the output array shape must be built from **'m'** and **'n'** only. We cannot introduce a new shape symbol, let's say **'k'**, in the output array shape.
- We can provide the data types of input and output arrays using **Numba** data types. This convention is **optional**. If we provide data types in the decorator (eager mode), then **Numba** will create compiled versions for those data types, which can speed up operations. If we don't provide data types (lazy mode), then **Numba** will create compiled versions as data types are detected when the function is called.
- We need to add one extra argument to the function, which will be treated as the output array. **Numba** will create that array and fill it with values as per the logic inside the function.
- We can provide the other speed-up arguments that we also provide to the **Numba @jit** decorator. Please check our tutorial on the **@jit** decorator, which covers these arguments with examples in detail.
    - **nopython** - It accepts boolean values. If set to **True**, it forces strict no-python mode, hence the entire code of the function will be converted to low-level machine code. If **Numba** can't convert the code to low-level code, then compilation will fail. The default is **False**.
    - **cache** - It accepts boolean values. If set to **True**, it'll cache compiled code for faster subsequent executions. The default is **False**.
    - **fastmath** - It accepts boolean values. If set to **True**, it'll relax strict IEEE 754 compliance (and use Intel's SVML library when available) to perform mathematical operations faster. The default is **False**.
    - **forceobj** - It accepts boolean values. If set to **True**, it'll force object mode, which is the opposite of no-python mode.

Below, we have created the first example demonstrating the usage of the **@guvectorize** decorator. We have created a simple function that takes three arguments. The first argument is an array, the second argument is a scalar and the third argument is an array. The third argument is the output array, which will be returned by the function when we call it with the input array and scalar. The function simply loops through the values of the first array and calculates the cube formula (**x^3 + 3x^2 + y**) for each element using the scalar and the individual values. It stores the results of the calculations in the output array provided as the third argument to the function.

We have decorated the function with **@guvectorize** decorator.

The first argument to the decorator is a list of tuples. Each tuple has three values specifying the data types of the input arguments and the output value. The first entry **int64[:]** represents a **1D** array of integers, the second entry **int64** represents an integer scalar and the third entry **int64[:]** represents a **1D** array of integers. We can provide more than one signature by specifying more than one tuple, where the length of each tuple will be 3. For example, we can provide a signature like **[(int64[:], int64, int64[:]), (float64[:], float64, float64[:])]**, which will handle both integer and float data types. We can also mix data types, where the input data type and the output data type can be different, e.g. **[(int64[:], int64, float64[:])]**.

Please make a **NOTE** that we need to import these data types from **Numba**.

The second argument to the **@guvectorize** decorator is the symbolic form representing the signature of the function in terms of shapes. The string **'(n),()->(n)'** represents that our function will take a one-dimensional array of shape **'n'** and a scalar value (**'()'**) as input, and it'll return an array of shape **'n'**. The data types provided as the first argument to the **@guvectorize** decorator match this symbolic signature in the number of arguments.

In [20]:

```
from numba import guvectorize, int64

@guvectorize([(int64[:], int64, int64[:])], '(n),()->(n)')
def calculate_cube_formula(x, y, res):
    for i in range(x.shape[0]):
        res[i] = x[i]**3 + 3*x[i]**2 + y
```
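Before benchmarking, it's worth confirming that the loop body above is equivalent to the vectorized NumPy expression used as the baseline below; here is a small NumPy-only cross-check of ours (no Numba required):

```python
import numpy as np

# The loop form of the cube formula and the broadcast form give the same result.
x = np.arange(10)
y = 10
vectorized = x**3 + 3*x**2 + y       # broadcast form used as the timing baseline
loop = np.empty_like(x)
for i in range(x.shape[0]):
    loop[i] = x[i]**3 + 3*x[i]**2 + y

assert np.array_equal(loop, vectorized)
```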

In the below cell, we are testing the performance of our function. We have first created an array of **10_000** integers. We have then calculated the cube formula using plain NumPy operators and recorded the time taken by it. We have used the value of **y** as 10 in our calculation.

Then, we have executed our **@guvectorize** decorated function 3 times with input array and scalar value 10. We have recorded the time taken by each call as well. We have stored the output in a separate variable. Then, in the next cell, we have printed the output of all calls to verify that the results are the same.

Please make a **NOTE** from the call to the cube formula function that we have called the function with only 2 arguments, whereas we had declared the function with three arguments. As we said earlier, the last argument is the output array that is created by **Numba** and returned after the function call completes.

We have recorded the time taken by various function calls using **%time** jupyter notebook magic command. It records the time of a single python statement that is executed after it. We can notice from the results that **@guvectorize** decorated function performs better than normal python operator-based execution.

If you are interested in learning about various magic commands available from the jupyter notebook then please feel free to check our tutorial on the same which covers the majority of commands with simple examples.

In [64]:

```
arr = np.arange(10_000)
%time out1 = arr**3 + 3*arr**2 + 10
%time out2 = calculate_cube_formula(arr,10)
%time out3 = calculate_cube_formula(arr,10)
%time out4 = calculate_cube_formula(arr,10)
```

In [65]:

```
out1[:5], out2[:5], out3[:5], out4[:5]
```

Out[65]:

Please make a **NOTE** that the performance improvement given by various **Numba**-decorated functions might not be visible with small arrays, but the difference becomes visible as we start working with large arrays.

In this example, we have recreated our cube formula function from the previous example with a minor change. We have modified the function code so that the second argument is also a numpy array of the same shape as the first array. We are using individual values of the second array along with individual values of the first array to calculate the cube formula.

We have also modified our data type signature. We have informed **@guvectorize** decorator that the function should work for integers as well as floats. **Numba** will create two compiled versions of the function based on two data type signatures.

We have also modified the symbolic signature of the function, represented as a string (**'(n),(n)->(n)'**). This symbolic signature represents that our function takes two input arrays of shape **'n'** and returns an output array of shape **'n'**.

This time, we have also set argument **nopython** to **True** to force **Numba** to compile the total code of the function to low-level machine code.

In [34]:

```
from numba import guvectorize, int64, float64

@guvectorize([(int64[:], int64[:], int64[:]), (float64[:], float64[:], float64[:])], '(n),(n)->(n)', nopython=True)
def calculate_cube_formula(x, y, res):
    for i in range(x.shape[0]):
        res[i] = x[i]**3 + 3*x[i]**2 + y[i]
```

Below, we have first created two arrays of integers with **10_000** elements each.

We have then calculated our cube formula by using python operators and recorded the time taken by it.

Then, we have executed our function three times with two input arrays and recorded the time taken by each call to calculate the cube formula.

We can notice from the results that our **@guvectorize**-decorated function does a better job compared to normal Python operator-based execution when calculating the cube formula.

In [38]:

```
arr = np.arange(10000)
y = np.arange(10000)
%time out1 = arr**3 + 3*arr**2 + y
%time out2 = calculate_cube_formula(arr,y)
%time out3 = calculate_cube_formula(arr,y)
%time out4 = calculate_cube_formula(arr,y)
```

In [68]:

```
out1[:5], out2[:5], out3[:5]
```

Out[68]:

In this example, we have created a function that takes as input two one-dimensional arrays and computes their product table (the outer product, i.e., the matrix product of a column vector and a row vector). We calculate it by looping through the elements of one array inside a loop over the elements of the other array.

This time there is a change in both the data type signature and the symbolic shape-based signature. The data type signature **(int64[:], int64[:], int64[:,:])** indicates that our function takes as input two **1D** arrays of integers and returns one **2D** array of integers. Please take a look at how we have declared the **2D** array (**int64[:,:]**).

The symbolic signature **'(n),(m)->(n,m)'** indicates that our method takes two one-dimensional arrays of shape **'n'** and **'m'** as input and returns a two-dimensional array of shape **'(n,m)'**.

We have also set **nopython** argument to **True** to force strict no python mode for compilation.

In [29]:

```
from numba import guvectorize, int64

@guvectorize([(int64[:], int64[:], int64[:,:])], '(n),(m)->(n,m)', nopython=True)
def dot_product(x, y, res):
    for i in range(x.shape[0]):
        for j in range(y.shape[0]):
            res[i,j] = x[i] * y[j]
```
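The loop above fills `res[i, j] = x[i] * y[j]`, which is the outer product of the two vectors. A plain-NumPy version of the same logic (no Numba involved; a cross-check of ours, not part of the tutorial's benchmark) confirms the result matches `np.outer`:

```python
import numpy as np

def outer_loop(x, y):
    # Same logic as dot_product above, written without Numba.
    res = np.empty((x.shape[0], y.shape[0]), dtype=x.dtype)
    for i in range(x.shape[0]):
        for j in range(y.shape[0]):
            res[i, j] = x[i] * y[j]
    return res

x = np.arange(4)
y = np.arange(3)
assert np.array_equal(outer_loop(x, y), np.outer(x, y))
```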

Below we have first created two arrays that will be used to calculate the product. Then, we have calculated it using the NumPy **dot()** function and recorded the time taken by it. We have then called our function three times and recorded the time taken by each call as well. We can notice from the results that our function seems to do better than even NumPy's optimized function. This performance difference can increase further with big arrays.

In [33]:

```
arr = np.arange(1000).reshape(-1,1)
y = np.arange(500).reshape(1,-1)
arr1 = np.arange(1000)
y1 = np.arange(500)
%time out1 = np.dot(arr, y)
%time out2 = dot_product(arr1, y1)
%time out3 = dot_product(arr1, y1)
%time out4 = dot_product(arr1, y1)
```

In [25]:

```
out1[:5,:5]
```

Out[25]:

In [26]:

```
out2[:5,:5]
```

Out[26]:

In this example, we have created a Python function that performs a convolution operation on an input array using a given input kernel. We have then decorated that function with **@guvectorize** to create a **Numba** version of it. Then, we have called the normal Python function and the **Numba**-decorated function to perform the convolution operation on input arrays. We have recorded the time taken by both. We have also compared the performance of the **Numba**-decorated function with the **convolve()** function available from **scipy**.

Below we have first created an array of shape **(10,10)** and convolved a kernel of shape **(3,3)** over it using the **convolve()** function available from **scipy**. We have also recorded the execution time of the function for future reference.

In [41]:

```
from scipy import ndimage

arr = np.arange(100).reshape(10,10)
kernel = np.array([[1,1,1],
                   [1,0,1],
                   [1,1,1]])
%time out = ndimage.convolve(arr, kernel)
out
```

Out[41]:

In the below cell, we have declared a python function that executes the input kernel on a given **2D** input array. Then, in the next cell, we have executed our previously declared kernel on the previously created **2D** array. We have also recorded the time taken by this execution.

In [47]:

```
def sum_all_neighbors(arr, kernel, out):
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            # Border cells, where the 3x3 kernel does not fit fully, are set to 0.
            elem1 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i-1,j-1] * kernel[0,0]
            elem2 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i-1,j] * kernel[0,1]
            elem3 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i-1,j+1] * kernel[0,2]
            elem4 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i,j-1] * kernel[1,0]
            elem5 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i,j] * kernel[1,1]
            elem6 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i,j+1] * kernel[1,2]
            elem7 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i+1,j-1] * kernel[2,0]
            elem8 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i+1,j] * kernel[2,1]
            elem9 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i+1,j+1] * kernel[2,2]
            out[i,j] = elem1+elem2+elem3+elem4+elem5+elem6+elem7+elem8+elem9
```
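The combined boundary check above sets every border cell to 0 and computes a weighted neighbor sum for interior cells. The same result can be reproduced with NumPy slicing, which makes a handy correctness cross-check (the `neighbor_sum_numpy` helper below is a hypothetical sketch of ours, not part of the tutorial):

```python
import numpy as np

def neighbor_sum_numpy(arr, kernel):
    # Interior cells: weighted sum over the 3x3 neighborhood.
    # Border cells: left at 0, matching the loop version's boundary check.
    out = np.zeros_like(arr)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            out[1:-1, 1:-1] += kernel[di + 1, dj + 1] * \
                arr[1 + di:arr.shape[0] - 1 + di, 1 + dj:arr.shape[1] - 1 + dj]
    return out

arr = np.arange(25).reshape(5, 5)
kernel = np.array([[1, 1, 1],
                   [1, 0, 1],
                   [1, 1, 1]])
ref = neighbor_sum_numpy(arr, kernel)
```

Note that this boundary handling differs from `scipy.ndimage.convolve`, whose default mode reflects the array at the edges, so the two outputs agree only on interior cells.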

In [48]:

```
out = np.empty_like(arr)
%time sum_all_neighbors(arr, kernel, out)
out
```

Out[48]:

In the below cell, we have redefined our Python function to perform the convolution operation, but this time we have decorated it with the **@guvectorize** decorator. We have also provided the data type signature and the symbolic shape signature to the decorator. We have given a data type signature to work on integers. The symbolic signature **'(m,m),(n,n)->(m,m)'** suggests that the function takes arrays of shape **(m,m)** and **(n,n)** as input and returns an array of shape **(m,m)**.

Then in the next cell, we have executed our **@guvectorize** decorated function three times and recorded the time of each execution.

We can notice from the results that our **@guvectorize** decorated function runs faster compared to **scipy.ndimage.convolve()** function and python function.

In [49]:

```
from numba import guvectorize, int64

@guvectorize([(int64[:,:], int64[:,:], int64[:,:])], '(m,m),(n,n)->(m,m)', nopython=True)
def sum_all_neighbors_numba(arr, kernel, out):
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            # Border cells, where the 3x3 kernel does not fit fully, are set to 0.
            elem1 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i-1,j-1] * kernel[0,0]
            elem2 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i-1,j] * kernel[0,1]
            elem3 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i-1,j+1] * kernel[0,2]
            elem4 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i,j-1] * kernel[1,0]
            elem5 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i,j] * kernel[1,1]
            elem6 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i,j+1] * kernel[1,2]
            elem7 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i+1,j-1] * kernel[2,0]
            elem8 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i+1,j] * kernel[2,1]
            elem9 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i+1,j+1] * kernel[2,2]
            out[i,j] = elem1+elem2+elem3+elem4+elem5+elem6+elem7+elem8+elem9
```

In [51]:

```
out = np.empty_like(arr)
%time sum_all_neighbors_numba(arr, kernel)
%time sum_all_neighbors_numba(arr, kernel)
%time sum_all_neighbors_numba(arr, kernel)
%time sum_all_neighbors_numba(arr, kernel)
```

Out[51]:

In the below cell, we have created a new array of shape **(100,100)**. We have then executed our previously declared kernel on this array using **scipy.ndimage.convolve()**, the plain Python function and the **@guvectorize**-decorated function. We have recorded the time taken by each execution. We can notice that our **@guvectorize**-decorated function takes much less time compared to the other options.

In [54]:

```
arr = np.arange(10_000).reshape(100,100)
out = np.empty_like(arr)
%time out = ndimage.convolve(arr, kernel)
%time sum_all_neighbors(arr, kernel, out)
%time t = sum_all_neighbors_numba(arr,kernel)
%time t = sum_all_neighbors_numba(arr,kernel)
%time t = sum_all_neighbors_numba(arr,kernel)
```

In [56]:

```
out
```

Out[56]:

In [55]:

```
t
```

Out[55]:

This ends our small tutorial explaining how we can use **@guvectorize** decorator from **Numba**. Please feel free to let us know your views in the comments section.

**Thank You** for visiting our website. If you like our work, please support us so that we can keep on creating new tutorials/blogs on interesting topics (like AI, ML, Data Science, Python, Digital Marketing, SEO, etc.) that can help people learn new things faster. You can support us by clicking on the **Coffee** button at the bottom right corner. We would appreciate it even if you just give a thumbs-up to our article in the comments section below.

If you want to

- provide some suggestions on a topic
- share your views
- include some details in a tutorial
- suggest some new topics on which we should create tutorials/blogs

then please feel free to let us know in the comments section.

Sunny Solanki
