Updated On : Dec-18,2021 Tags numba, guvectorize-decor…

Numba @guvectorize Decorator: Generalized Universal Functions

Numba is an open-source python library that provides just-in-time compilation for python code to speed it up. It can speed up our existing python functions by decorating our functions with various decorators provided by it. The most commonly used decorator is @jit decorator which works with the majority of python functions. Please feel free to check our tutorial on @jit decorator if you want to learn about it.

Apart from @jit, numba provides two other decorators for designing numpy-like universal functions.

  1. @vectorize - This decorator can turn a function that works on scalars into a function that can take an array of scalars and apply a function on them. It also speeds up the function which works on an array of scalars by a big amount. The input and output array shapes are the same with this decorator. Please feel free to check our tutorial on it where we have covered the decorator in detail with examples.
  2. @guvectorize - This decorator lets us create a function that works on whole input arrays and produces an output array that can have a different shape than the input arrays. It can speed up arbitrary operations on input arrays. The Python function that we decorate does not return anything. The output array is declared as the last argument of the function, and the function fills it in with results.

As a part of this tutorial, we'll explain with examples how we can use the @guvectorize decorator available from Numba to speed up our functions. We'll explain various arguments of the decorator as well. We'll create @guvectorize decorated functions to perform some operations and compare their execution time with versions that do not use the decorator to measure the performance improvement.

Below we have imported Numba and printed the version of it that we'll be using in our tutorial.

In [1]:
import numba

print("Numba Version :  {}".format(numba.__version__))
Numba Version :  0.54.1
In [2]:
import numpy as np

Example 1: Simple Cube Formula with Constant

In our first example, we'll explain how we can use @guvectorize for functions involving operations on numpy arrays.

In order to use @guvectorize decorator, we need to follow some conventions that are listed below.

  • We need to provide the symbolic dimensions of input and output arrays specified as strings.
    • E.g - '(n),(n)->(n)' - This string represents that the function has two input arrays of shape 'n' and one output array of shape 'n'.
    • E.g - '(m),(n)->(m,n)' - This string represents that the function has two input arrays, one of shape 'm' and one of shape 'n'. It has an output array of shape '(m,n)'.
    • The output array shape can be constructed only from input array shapes. If the input array has symbols 'm' and 'n' used in shape then the output array should be of shape constructed from symbols 'm' and 'n' only. You can not introduce a new shape symbol let’s say 'k' in the output array shape.
  • We can provide data type of input and output arrays using Numba data types. This convention is optional. If we provide data types in decorator (eager mode) then numba will create compiled version for those data types which can speed up operations. If we don't provide data types (lazy mode) then numba will create compiled versions as data types are detected when a function is called.
  • We need to add one extra argument at the end of the function's argument list, which is the output array. Numba creates that array when we call the function, and the logic inside of the function fills it with values.
  • We can provide other arguments for speed up which we provide to Numba @jit decorators. Please check our tutorial on Numba @jit decorator which covers these arguments with examples in detail.
    • nopython - It accepts boolean values. If set to True, it forces strict no python mode, hence the whole code of the function will be converted to low-level machine code. If Numba can't convert the code to low-level machine code then compilation will fail. The default is False.
    • cache - It accepts boolean values. If set to True, it'll cache compiled codes for faster executions. The default is False.
    • fastmath - It accepts boolean values. If set to True, it relaxes strict IEEE 754 floating-point rules so that the compiler can apply faster but less exact mathematical optimizations. The default is False.
    • forceobj - It accepts boolean values. If set to True, it'll force object mode which is the opposite of no python mode.
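The rule that output shapes can only use symbols from input shapes can be sketched with a small plain-Python helper. Please make a NOTE that check_layout is a hypothetical helper written only for this illustration. It is not part of Numba, and it only inspects the layout string without compiling anything.

```python
import re

def check_layout(layout):
    """Return output symbols that do not appear in any input shape.

    Hypothetical helper for illustration only; not a Numba API.
    """
    inputs, outputs = layout.split('->')

    def symbols(part):
        # Collect dimension names such as 'n' or 'm' from '(m),(n)'.
        return set(re.findall(r'[A-Za-z_]\w*', part))

    return symbols(outputs) - symbols(inputs)

print(check_layout('(n),(n)->(n)'))    # set(), all output symbols are known
print(check_layout('(m),(n)->(m,n)'))  # set(), all output symbols are known
print(check_layout('(m),(n)->(m,k)'))  # {'k'}, 'k' is not defined by any input
```

An empty result means that the layout follows the convention. A non-empty result lists the symbols that would have to be removed from the output shape.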

Below, we have created the first example demonstrating the usage of @guvectorize decorator. We have created a simple function that takes as input three arguments. The first argument is an array, the second argument is a scalar and the third argument is an array. The third argument is the output array, which is returned when we call the function with an input array and a scalar. The function loops through values of the first array and calculates the cube formula (x^3 + 3x^2 + y) for each element x of the array and the scalar y. It stores the results of calculations in the output array provided as the third argument to the function.

We have decorated the function with @guvectorize decorator.

The first argument to the decorator is a list with tuple. The tuple has three values specifying the data type of input arguments and output value. The first argument int64[:] represents 1D array of integers, second argument int64 represents integer scalar and third argument int64[:] represents 1D array of integers. We can provide more than one signature by specifying more than one tuple where the length of each tuple will be 3. Let’s say we can provide a signature like this [(int64[:], int64, int64[:]), (float64[:], float64, float64[:])] which will handle integers and float data types. We can also mix data types where input data type and output data type can be different. E.g - [(int64[:], int64, float64[:])]

Please make a NOTE that we need to import these data types from Numba types.

The second argument to @guvectorize decorator is the symbolic form representing the signature of function in the form of shapes. The string '(n),()->(n)' represents that our function will take one dimensional array of shape 'n' and scalar value ('()') as input and it'll return an array of shape 'n'. The data types provided as the first argument to @guvectorize decorator match in number with this symbolic signature.

In [20]:
from numba import guvectorize, int64

@guvectorize([(int64[:], int64, int64[:])], '(n),()->(n)')
def calculate_cube_formula(x, y, res):
    for i in range(x.shape[0]):
        res[i] = x[i]**3 + 3*x[i]**2 + y

In the below cell, we are testing the performance of our function. We have first created an array of 10_000 integers. We have then calculated the cube formula using NumPy array operators (without Numba) and recorded the time taken by it. We have used the value of y as 10 in our calculation.

Then, we have executed our @guvectorize decorated function 3 times with input array and scalar value 10. We have recorded the time taken by each call as well. We have stored the output in a separate variable. Then, in the next cell, we have printed the output of all calls to verify that the results are the same.

Please make a NOTE from the call to cube formula function that we have called function with only 2 arguments whereas we had declared a function with three arguments. As we had said earlier, the last argument is an output array that is created by Numba and returned after the function call completes.

We have recorded the time taken by various function calls using %time jupyter notebook magic command. It records the time of a single python statement that is executed after it. We can notice from the results that @guvectorize decorated function performs better than normal python operator-based execution.

If you are interested in learning about various magic commands available from the jupyter notebook then please feel free to check our tutorial on the same which covers the majority of commands with simple examples.
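Since %time measures only a single run, the numbers it reports can vary noticeably between calls. As a hedged alternative for use outside of a notebook, the sketch below uses the standard library's timeit module to repeat a call many times and keep the best result. The helper name cube_formula_numpy and the repeat counts are our own assumptions.

```python
import timeit
import numpy as np

arr = np.arange(10_000)

def cube_formula_numpy(x, y):
    # Same cube formula as in the tutorial, written with NumPy operators.
    return x**3 + 3 * x**2 + y

# Warm-up call so that one-time setup costs are not part of the timing.
cube_formula_numpy(arr, 10)

calls_per_run = 100
timings = timeit.repeat(lambda: cube_formula_numpy(arr, 10),
                        number=calls_per_run, repeat=5)

# The minimum over the repeats is the least noisy estimate.
best_per_call = min(timings) / calls_per_run
print("best time per call: {:.1f} µs".format(best_per_call * 1e6))
```

The same pattern can be applied to a @guvectorize decorated function by replacing the call inside the lambda.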

In [64]:
arr = np.arange(10_000)

%time out1 = arr**3 + 3*arr**2 + 10

%time out2 = calculate_cube_formula(arr,10)

%time out3 = calculate_cube_formula(arr,10)

%time out4 = calculate_cube_formula(arr,10)
CPU times: user 1.67 ms, sys: 424 µs, total: 2.09 ms
Wall time: 1.32 ms
CPU times: user 79 µs, sys: 20 µs, total: 99 µs
Wall time: 108 µs
CPU times: user 67 µs, sys: 17 µs, total: 84 µs
Wall time: 92.3 µs
CPU times: user 616 µs, sys: 157 µs, total: 773 µs
Wall time: 609 µs
In [65]:
out1[:5], out2[:5], out3[:5], out4[:5]
Out[65]:
(array([ 10,  14,  30,  64, 122]),
 array([ 10,  14,  30,  64, 122]),
 array([ 10,  14,  30,  64, 122]),
 array([ 10,  14,  30,  64, 122]))

Please make a NOTE that the performance improvement given by Numba decorated functions might not be visible with small arrays, but the difference becomes visible as we start working with large arrays.

Example 2: Simple Cube Formula with Array

In this example, we have recreated our cube formula function from the previous example with a minor change. We have modified the function code so that the second argument is also a numpy array of the same shape as the first array. We are using individual values of the second array along with individual values of the first array to calculate the cube formula.

We have also modified our data type signature. We have informed @guvectorize decorator that the function should work for integers as well as floats. Numba will create two compiled versions of the function based on two data type signatures.

We have also modified the symbolic signature of the function, which is given as a string ('(n),(n)->(n)'). This symbolic signature represents that our function takes two input arrays of shape 'n' and returns an output array of shape 'n'.

This time, we have also set argument nopython to True to force Numba to compile the total code of the function to low-level machine code.

In [34]:
from numba import guvectorize, int64, float64

@guvectorize([(int64[:], int64[:], int64[:]), (float64[:], float64[:], float64[:])], '(n),(n)->(n)', nopython=True)
def calculate_cube_formula(x, y, res):
    for i in range(x.shape[0]):
        res[i] = x[i]**3 + 3*x[i]**2 + y[i]

Below, we have first created two arrays of integers with 10_000 elements each.

We have then calculated our cube formula by using python operators and recorded the time taken by it.

Then, we have executed our function three times with two input arrays and recorded the time taken by each call to calculate the cube formula.

We can notice from the results that our @guvectorize decorated function does a better job compared to NumPy operator-based execution to calculate the cube formula.

In [38]:
arr = np.arange(10000)
y = np.arange(10000)

%time out1 = arr**3 + 3*arr**2 + y

%time out2 = calculate_cube_formula(arr,y)

%time out3 = calculate_cube_formula(arr,y)

%time out3 = calculate_cube_formula(arr,y)
CPU times: user 94 µs, sys: 14 µs, total: 108 µs
Wall time: 112 µs
CPU times: user 43 µs, sys: 6 µs, total: 49 µs
Wall time: 51 µs
CPU times: user 21 µs, sys: 3 µs, total: 24 µs
Wall time: 26.7 µs
CPU times: user 18 µs, sys: 2 µs, total: 20 µs
Wall time: 21.7 µs
In [68]:
out1[:5], out2[:5], out3[:5]
Out[68]:
(array([  0,   5,  22,  57, 116]),
 array([  0,   5,  22,  57, 116]),
 array([  0,   5,  22,  57, 116]))

Example 3: Dot Product

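In this example, we have created a function that takes as input two one-dimensional arrays and multiplies every element of the first array with every element of the second array. This is the same as the dot product of a column vector of shape (n,1) with a row vector of shape (1,m), which is also known as the outer product. We are calculating it by looping through elements of one array inside of a loop over elements of the other array.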

This time there is a change in both data type signature and symbolic shape-based signature. The data type signature (int64[:], int64[:], int64[:,:]) points that our function takes as input two 1D arrays of integers and returns one 2D array of integers. Please take a look at how we have declared 2D array (int64[:,:]).

The symbolic signature '(n),(m)->(n,m)' indicates that our method takes two one dimensional arrays of shape 'n' and 'm' as input and returns two dimensional array of shape '(n,m)'.

We have also set nopython argument to True to force strict no python mode for compilation.

In [29]:
from numba import guvectorize, int64

@guvectorize([(int64[:], int64[:], int64[:,:])], '(n),(m)->(n,m)', nopython=True)
def dot_product(x, y, res):
    for i in range(x.shape[0]):
        for j in range(y.shape[0]):
            res[i,j] = x[i] * y[j]

Below we have first created two arrays that will be used to calculate dot product. Then, we have calculated the dot product using numpy dot() function and recorded the time taken by it. We have then called our dot product function three times and recorded the time taken by it as well. We can notice from the results that our function runs faster than numpy's dot() function in this test (around 1.7 ms compared to 6.4 ms). Please make a NOTE that this comparison uses integer arrays of one particular size, and the difference may change with other data types and array sizes.

In [33]:
arr = np.arange(1000).reshape(-1,1)
y = np.arange(500).reshape(1,-1)

arr1 = np.arange(1000)
y1 = np.arange(500)

%time out1 = np.dot(arr, y)

%time out2 = dot_product(arr1, y1)

%time out3 = dot_product(arr1, y1)

%time out4 = dot_product(arr1, y1)
CPU times: user 6.32 ms, sys: 0 ns, total: 6.32 ms
Wall time: 6.36 ms
CPU times: user 1.83 ms, sys: 0 ns, total: 1.83 ms
Wall time: 1.84 ms
CPU times: user 1.67 ms, sys: 0 ns, total: 1.67 ms
Wall time: 1.69 ms
CPU times: user 1.66 ms, sys: 0 ns, total: 1.66 ms
Wall time: 1.69 ms
In [25]:
out1[:5,:5]
Out[25]:
array([[ 0,  0,  0,  0,  0],
       [ 0,  1,  2,  3,  4],
       [ 0,  2,  4,  6,  8],
       [ 0,  3,  6,  9, 12],
       [ 0,  4,  8, 12, 16]])
In [26]:
out2[:5,:5]
Out[26]:
array([[ 0,  0,  0,  0,  0],
       [ 0,  1,  2,  3,  4],
       [ 0,  2,  4,  6,  8],
       [ 0,  3,  6,  9, 12],
       [ 0,  4,  8, 12, 16]])

Example 4: Convolution

In this example, we have created a python function that performs convolution operation on an input array using a given input kernel. We have then @guvectorize decorated that function to create Numba version of it. Then, we have called normal python function and numba decorated function to perform convolution operation on input arrays. We have recorded the time taken by both. We have also compared the performance of numba decorated function with convolve() function available from scipy.

Below we have first created an array of shape (10,10) and convolved kernel of shape (3,3) on it using convolve() function available from scipy. We have also recorded the execution time of the function for future reference.

In [41]:
from scipy import ndimage

arr = np.arange(100).reshape(10,10)

kernel = np.array([[1,1,1],
                   [1,0,1],
                   [1,1,1]
                  ])

%time out = ndimage.convolve(arr, kernel)

out
CPU times: user 543 µs, sys: 0 ns, total: 543 µs
Wall time: 557 µs
Out[41]:
array([[ 33,  38,  46,  54,  62,  70,  78,  86,  94,  99],
       [ 83,  88,  96, 104, 112, 120, 128, 136, 144, 149],
       [163, 168, 176, 184, 192, 200, 208, 216, 224, 229],
       [243, 248, 256, 264, 272, 280, 288, 296, 304, 309],
       [323, 328, 336, 344, 352, 360, 368, 376, 384, 389],
       [403, 408, 416, 424, 432, 440, 448, 456, 464, 469],
       [483, 488, 496, 504, 512, 520, 528, 536, 544, 549],
       [563, 568, 576, 584, 592, 600, 608, 616, 624, 629],
       [643, 648, 656, 664, 672, 680, 688, 696, 704, 709],
       [693, 698, 706, 714, 722, 730, 738, 746, 754, 759]])

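In the below cell, we have declared a python function that applies the input kernel on a given 2D input array. Then, in the next cell, we have applied our previously declared kernel on the previously created 2D array. We have also recorded the time taken by this execution. Please make a NOTE that this function sets border cells to 0, whereas scipy's convolve() function uses mode='reflect' by default to fill in values beyond the edges. Hence, the two outputs match only for the interior cells.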

In [47]:
def sum_all_neighbors(arr, kernel, out):
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            elem1 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i-1,j-1] * kernel[0,0]
            elem2 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i-1,j] * kernel[0,1]
            elem3 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i-1,j+1] * kernel[0,2]
            elem4 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i,j-1] * kernel[1,0]
            elem5 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i,j] * kernel[1,1]
            elem6 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i,j+1] * kernel[1,2]
            elem7 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i+1,j-1] * kernel[2,0]
            elem8 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i+1,j] * kernel[2,1]
            elem9 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i+1,j+1] * kernel[2,2]


            out[i,j] = elem1+elem2+elem3+elem4+elem5+elem6+elem7+elem8+elem9
In [48]:
out = np.empty_like(arr)

%time sum_all_neighbors(arr, kernel, out)

out
CPU times: user 287 µs, sys: 45 µs, total: 332 µs
Wall time: 335 µs
Out[48]:
array([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0,  88,  96, 104, 112, 120, 128, 136, 144,   0],
       [  0, 168, 176, 184, 192, 200, 208, 216, 224,   0],
       [  0, 248, 256, 264, 272, 280, 288, 296, 304,   0],
       [  0, 328, 336, 344, 352, 360, 368, 376, 384,   0],
       [  0, 408, 416, 424, 432, 440, 448, 456, 464,   0],
       [  0, 488, 496, 504, 512, 520, 528, 536, 544,   0],
       [  0, 568, 576, 584, 592, 600, 608, 616, 624,   0],
       [  0, 648, 656, 664, 672, 680, 688, 696, 704,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0]])
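The output above can be cross-checked with a short NumPy-only sketch that adds up shifted slices of the array instead of looping over every cell. The helper name neighbor_sum_slices is hypothetical and the approach assumes a 3x3 kernel with border cells left at 0, as in the function above.

```python
import numpy as np

def neighbor_sum_slices(arr, kernel):
    """Hypothetical cross-check: 3x3 weighted neighbor sum with zero borders."""
    rows, cols = arr.shape
    out = np.zeros_like(arr)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            # View of the array shifted by (di, dj), restricted to interior cells.
            shifted = arr[1 + di:rows - 1 + di, 1 + dj:cols - 1 + dj]
            out[1:-1, 1:-1] += kernel[di + 1, dj + 1] * shifted
    return out

arr = np.arange(100).reshape(10, 10)
kernel = np.array([[1, 1, 1],
                   [1, 0, 1],
                   [1, 1, 1]])

check = neighbor_sum_slices(arr, kernel)
print(check[1, 1], check[1, 8], check[8, 8])  # 88 144 704
```

The printed values are the same as the corresponding interior cells of the output above.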

In the below cell, we have redefined our python function to perform convolution operation but this time we have decorated it with @guvectorize decorator. We have also provided data type signature and shape symbolic signature to the decorator. We have given data type signatures to work on integers. The symbolic signature '(m,m),(n,n)->(m,m)' suggests that the function takes arrays of shape (m,m) and (n,n) as input and returns an array of shape (m,m).

Then in the next cell, we have executed our @guvectorize decorated function three times and recorded the time of each execution.

We can notice from the results that our @guvectorize decorated function runs faster compared to scipy.ndimage.convolve() function and python function.

In [49]:
from numba import guvectorize, int64

@guvectorize([(int64[:,:], int64[:,:], int64[:,:])], '(m,m),(n,n)->(m,m)', nopython=True)
def sum_all_neighbors_numba(arr, kernel, out):
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            elem1 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i-1,j-1] * kernel[0,0]
            elem2 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i-1,j] * kernel[0,1]
            elem3 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i-1,j+1] * kernel[0,2]
            elem4 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i,j-1] * kernel[1,0]
            elem5 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i,j] * kernel[1,1]
            elem6 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i,j+1] * kernel[1,2]
            elem7 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i+1,j-1] * kernel[2,0]
            elem8 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i+1,j] * kernel[2,1]
            elem9 = 0 if (i-1<0) or (j-1<0) or (i+1>=arr.shape[0]) or (j+1>=arr.shape[1]) else arr[i+1,j+1] * kernel[2,2]

            out[i,j] = elem1+elem2+elem3+elem4+elem5+elem6+elem7+elem8+elem9
In [51]:
out = np.empty_like(arr)

%time sum_all_neighbors_numba(arr, kernel)

%time sum_all_neighbors_numba(arr, kernel)

%time sum_all_neighbors_numba(arr, kernel)

%time sum_all_neighbors_numba(arr, kernel)
CPU times: user 19 µs, sys: 3 µs, total: 22 µs
Wall time: 24.1 µs
CPU times: user 11 µs, sys: 2 µs, total: 13 µs
Wall time: 15 µs
CPU times: user 9 µs, sys: 2 µs, total: 11 µs
Wall time: 11.7 µs
CPU times: user 26 µs, sys: 4 µs, total: 30 µs
Wall time: 32.2 µs
Out[51]:
array([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0,  88,  96, 104, 112, 120, 128, 136, 144,   0],
       [  0, 168, 176, 184, 192, 200, 208, 216, 224,   0],
       [  0, 248, 256, 264, 272, 280, 288, 296, 304,   0],
       [  0, 328, 336, 344, 352, 360, 368, 376, 384,   0],
       [  0, 408, 416, 424, 432, 440, 448, 456, 464,   0],
       [  0, 488, 496, 504, 512, 520, 528, 536, 544,   0],
       [  0, 568, 576, 584, 592, 600, 608, 616, 624,   0],
       [  0, 648, 656, 664, 672, 680, 688, 696, 704,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0]])

In the below cell, we have created a new array of shape (100,100). We have then applied our previously declared kernel on this array using scipy.ndimage.convolve(), python function and @guvectorize decorated function. We have recorded the time taken by each execution. We can notice that our @guvectorize decorated function takes considerably less time compared to the other options.

In [54]:
arr = np.arange(10_000).reshape(100,100)
out = np.empty_like(arr)

%time out = ndimage.convolve(arr, kernel)

%time sum_all_neighbors(arr, kernel, out)

%time t = sum_all_neighbors_numba(arr,kernel)

%time t = sum_all_neighbors_numba(arr,kernel)

%time t = sum_all_neighbors_numba(arr,kernel)
CPU times: user 1.67 ms, sys: 11 µs, total: 1.68 ms
Wall time: 988 µs
CPU times: user 52.4 ms, sys: 0 ns, total: 52.4 ms
Wall time: 53.9 ms
CPU times: user 199 µs, sys: 0 ns, total: 199 µs
Wall time: 114 µs
CPU times: user 94 µs, sys: 0 ns, total: 94 µs
Wall time: 95.8 µs
CPU times: user 204 µs, sys: 0 ns, total: 204 µs
Wall time: 116 µs
In [56]:
out
Out[56]:
array([[    0,     0,     0, ...,     0,     0,     0],
       [    0,   808,   816, ...,  1576,  1584,     0],
       [    0,  1608,  1616, ...,  2376,  2384,     0],
       ...,
       [    0, 77608, 77616, ..., 78376, 78384,     0],
       [    0, 78408, 78416, ..., 79176, 79184,     0],
       [    0,     0,     0, ...,     0,     0,     0]])
In [55]:
t
Out[55]:
array([[    0,     0,     0, ...,     0,     0,     0],
       [    0,   808,   816, ...,  1576,  1584,     0],
       [    0,  1608,  1616, ...,  2376,  2384,     0],
       ...,
       [    0, 77608, 77616, ..., 78376, 78384,     0],
       [    0, 78408, 78416, ..., 79176, 79184,     0],
       [    0,     0,     0, ...,     0,     0,     0]])
Sunny Solanki
