Updated On: Nov-06, 2021 · Tags: numba, vectorize-decorator, ufunc
Numba @vectorize Decorator: Convert Scalar Function to Universal Function (ufunc)


Numba is a Python library that translates a subset of Python code into low-level machine code using the LLVM compiler in order to speed it up. It generally requires few changes to existing code: applying one of the decorators provided by numba (@jit, @vectorize, etc.) usually works very well. Numba is most effective on functions that involve Python loops or numpy arrays. When we decorate an existing function with a numba decorator, it compiles the parts of the function that it can translate to lower-level machine code, and those parts then run faster. In many cases, numba can translate the whole function body to machine instructions. We have already covered another tutorial discussing the numba @jit decorator; please feel free to check it if you are interested in learning about @jit.

In this tutorial, we'll discuss another important decorator provided by numba named @vectorize. The concept behind it is the same as that of numpy's vectorize() function: it translates a function that works on a single scalar input into a function that can work on an array of scalars. NumPy commonly refers to such a function as a ufunc, or universal function. In this tutorial, we'll take a simple function that works on a scalar and convert it to a universal function using both numpy's vectorize() function and numba's @vectorize decorator. We'll then run these modified functions to compare their performance. We'll also rewrite the function as a loop-based version decorated with @jit to check its performance, and we'll compare the performance of the @vectorize decorator with different arguments.
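As a quick illustration of what "ufunc" means, numpy's built-in arithmetic functions such as np.add are already universal functions: they apply a scalar operation elementwise, broadcast scalars against arrays, and expose ufunc methods such as reduce.

```python
import numpy as np

# np.add is a built-in ufunc: the scalar operation "+" applied elementwise.
a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

elementwise = np.add(a, b)   # elementwise over two arrays -> [11, 22, 33]
broadcast = np.add(a, 100)   # a scalar broadcasts against the array -> [101, 102, 103]
total = np.add.reduce(a)     # ufuncs also expose methods like reduce -> 6

print(elementwise, broadcast, total)
```

The functions we create with numba's @vectorize decorator below behave the same way.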

Below we have highlighted important sections of the tutorial to give an overview of the material covered.

Important Sections of Tutorial

  • Example 1 - Function with Single Scalar Input
    • Normal Function with Scalar
    • Numpy Vectorized Function (np.vectorize())
    • Numba JIT Wrapped (jit()) Function
      • First Execution
      • Second Execution
      • Execution with Different Data Type
    • Loop-based and Numba JIT Decorated ((@jit)) Function
    • Numba Vectorize Decorated (@vectorize) Function
    • Numba Vectorize Decorated (@vectorize) and Parallelized Function
    • Numba Vectorize Decorated (@vectorize) and Cached Function
  • Example 2 - Function with 2 Scalar Inputs

We'll start by importing necessary libraries (numba and numpy).

In [1]:
import numba

print("Numba Version : {}".format(numba.__version__))
Numba Version : 0.54.1
In [2]:
import numpy as np

Example 1

In this section, we'll create a simple function that works on a single scalar and vectorize it with numpy and numba. We'll then compare the performance of the various methods.

1.1 Normal Function with Scalar

Below we have created a simple function that takes a single scalar value as input and evaluates the formula x^3 + 3x^2 + 3. We'll vectorize this function to make it work on arrays of numbers using different methods and measure the performance of each.

In [3]:
def cube_formula(x):
    return x**3 + 3*x**2 + 3

cube_formula(5)
Out[3]:
203

1.2 Numpy Vectorized Function

In this section, we have vectorized our cube_formula() function using the np.vectorize() function, which takes any scalar function and returns a callable that applies it elementwise over numpy arrays. Note that numpy's documentation states that np.vectorize() is provided primarily for convenience rather than performance: internally it still executes a Python-level loop over the array.

In [4]:
vectorized_cube_formula = np.vectorize(cube_formula)

vectorized_cube_formula
Out[4]:
<numpy.vectorize at 0x7f05d21b2cf8>

After vectorizing the function, we have created an array of 1M numbers and called our vectorized function on it. We have also recorded the time taken using the %%time cell magic command of jupyter notebook. At last, we have printed a few values of the result.

If you are interested in learning about cell magic commands (like %%time) available in jupyter notebook then please feel free to check our tutorial on the same. It covers the majority of jupyter notebook magic commands.
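Outside of a notebook, where %%time is unavailable, the same measurement can be sketched with the standard library's time.perf_counter (the function timed below is a simple stand-in, not the tutorial's):

```python
import time
import numpy as np

def square_all(a):
    # stand-in workload for timing purposes
    return a * a

arr = np.arange(1_000_000)

start = time.perf_counter()
res = square_all(arr)
elapsed = time.perf_counter() - start

print("took {:.4f} seconds".format(elapsed))
```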

In [5]:
arr = np.arange(1, 1000000, dtype=np.int64)
In [6]:
%%time

res = vectorized_cube_formula(arr)
CPU times: user 508 ms, sys: 54.6 ms, total: 563 ms
Wall time: 562 ms
In [7]:
res[:5]
Out[7]:
array([  7,  23,  57, 115, 203])

1.3 Numba JIT Wrapped Function

In this section, we have taken our cube_formula() function and wrapped it inside the jit() function available through numba. This is exactly the same as decorating the original function with the @jit decorator. The @jit decorator can be applied to any Python function, and numba will try to speed it up by compiling it with the LLVM compiler. It'll try to compile the whole function, but if it cannot, it'll at least speed up the parts that it can translate to lower-level machine code.

Our cube_formula() function is designed in a way that it works on numpy arrays as well, since the arithmetic operators broadcast elementwise.

For detailed information about jit() function or @jit decorator, please feel free to check our tutorial on the same.

In [8]:
from numba import jit

cube_formula_jitted = jit(cube_formula)

First Execution

After jit-wrapping our function, below we have run it on the array of 1M entries that we created earlier. We can notice that just jit-wrapping has improved the speed: it took less time than the numpy vectorized version of the same function, even though this first call also includes numba's one-time compilation cost.

In [9]:
%%time

res = cube_formula_jitted(arr)
CPU times: user 497 ms, sys: 92.3 ms, total: 589 ms
Wall time: 434 ms
In [10]:
res[:5]
Out[10]:
array([  7,  23,  57, 115, 203])

Second Execution

Below we have executed the same function a second time with the same input array. This time it takes far less time, because the machine code compiled during the first call is reused; that first call's cost was dominated by compilation.

In [11]:
%%time

res = cube_formula_jitted(arr)
CPU times: user 2.89 ms, sys: 532 µs, total: 3.42 ms
Wall time: 3.43 ms
In [12]:
res[:5]
Out[12]:
array([  7,  23,  57, 115, 203])

Execution with Different Data Type

In this section, we have changed the data type of the input array from integer to float and then executed the jit-wrapped function on this new array. The first call with the new data type is slower than the previous (cached) run because numba compiles a separate specialization for the float64 signature, but it is still much faster than the numpy vectorized function.

In [13]:
arr = arr.astype(np.float64)
In [14]:
%%time

res = cube_formula_jitted(arr)
CPU times: user 155 ms, sys: 0 ns, total: 155 ms
Wall time: 154 ms
In [15]:
res[:5]
Out[15]:
array([  7.,  23.,  57., 115., 203.])

1.4 Loop-based and Numba JIT Decorated Function

In this section, we have first modified our cube_formula() function to make it work on an array. We treat the input as a sequence of numbers, use a Python loop to go through each element, evaluate the cube formula on it, and record the results in a list. We have decorated this function with @jit to improve its performance.

In [16]:
from numba import jit

@jit(nopython=True)
def cube_formula_jitted(x):
    xs = []
    for i in x:
        xs.append(i**3 + 3*i**2 + 3)

    return xs

First Execution

In this section, we have first converted our original 1M numbers numpy array to an array of integers again. We have then executed our new cube formula function with this array and recorded the time taken by it.

We can notice from the results that the time taken for execution is less compared to both numpy vectorized and jit-wrapped functions. It seems that we have further improved our original cube formula function.

In [17]:
arr = arr.astype(np.int64)
In [18]:
%%time

res = cube_formula_jitted(arr)
CPU times: user 189 ms, sys: 15.8 ms, total: 205 ms
Wall time: 204 ms
In [19]:
res[:5]
Out[19]:
[7, 23, 57, 115, 203]

Second Execution

In this section, we have again executed our jit-decorated function using the same integer array to check whether it takes less time compared to the first run and we can notice from the results that it takes significantly less time compared to the previous run.

In [20]:
%%time

res = cube_formula_jitted(arr)
CPU times: user 28.9 ms, sys: 16.5 ms, total: 45.3 ms
Wall time: 45.2 ms
In [21]:
res[:5]
Out[21]:
[7, 23, 57, 115, 203]

Execution with Different Data Type

In this section, we have first converted our array of integers to an array of floats and then executed our jit-decorated function on it. We can notice from the recorded time that it takes far less time than the numpy vectorized function, though about the same as the jit-wrapped version (again, this first call with float inputs compiles a new specialization).

In [22]:
arr = arr.astype(np.float64)
In [23]:
%%time

res = cube_formula_jitted(arr)
CPU times: user 194 ms, sys: 11.1 ms, total: 205 ms
Wall time: 205 ms
In [24]:
res[:5]
Out[24]:
[7.0, 23.0, 57.0, 115.0, 203.0]

1.5 Numba Vectorize Decorated Function

In this section, we have decorated our cube formula function with the @vectorize decorator. The @vectorize decorator requires us to specify the possible data types of the function's inputs and output; numba then creates a compiled version for each signature. The signatures should be listed in order from narrower data types to wider ones. Below we have highlighted the signature of the @vectorize decorator.

@vectorize([ret_datatype1(input1_datatype1,input2_datatype1,...), ret_datatype2(input1_datatype2,input2_datatype2,...), ...], target='cpu', cache=False)
def func(x):
    return x*x

Apart from the data types, it accepts two other notable arguments.

  • target - This argument accepts one of the three strings below, specifying how to further speed up the code based on available hardware.
    • 'cpu' - The default. Generates code for a single-threaded CPU.
    • 'parallel' - Runs the code in parallel across multiple CPU cores (multi-threaded).
    • 'cuda' - Compiles the function to run on an NVIDIA GPU (requires CUDA).
  • cache - This parameter accepts a boolean specifying whether to cache the compiled machine code on disk so that the function does not need to be recompiled in later sessions.
In [25]:
from numba import vectorize, int64, float32, float64

@vectorize([int64(int64), float32(float32), float64(float64)])
def cube_formula_numba_vec(x):
    return x**3 + 3*x**2 + 3

First Execution

In this section, we have executed our @vectorize-decorated function on our 1M-element array to check its performance. We can notice from the results that it easily outperforms all our previous trials (numpy vectorized, jit-wrapped, and jit-decorated). The speed-up is substantial.

In [26]:
arr = arr.astype(np.int64)
In [27]:
%%time

res = cube_formula_numba_vec(arr)
CPU times: user 8.44 ms, sys: 3.53 ms, total: 12 ms
Wall time: 11.6 ms
In [28]:
res[:5]
Out[28]:
array([  7,  23,  57, 115, 203])

Second Execution

In this section, we have executed our vectorize-decorated function a second time with the same array as input, and we can notice that it takes even less time than the first execution.

In [29]:
%%time

res = cube_formula_numba_vec(arr)
CPU times: user 2.97 ms, sys: 0 ns, total: 2.97 ms
Wall time: 2.63 ms
In [30]:
res[:5]
Out[30]:
array([  7,  23,  57, 115, 203])

Execution with Different Data Type

In this section, we have converted our big array from integer to float and executed our vectorize-decorated function on it. We can notice from the recorded time that it takes less time than all our previous trials.

In [31]:
arr = arr.astype(np.float64)
In [32]:
%%time

res = cube_formula_numba_vec(arr)
CPU times: user 2.18 ms, sys: 321 µs, total: 2.5 ms
Wall time: 2.14 ms
In [33]:
res[:5]
Out[33]:
array([  7.,  23.,  57., 115., 203.])

1.6 Numba Vectorize Decorated and Parallelized Function

In this section, we have decorated our cube formula function again with @vectorize decorator. But this time, we have set target parameter of the decorator to 'parallel' to check whether using multi-threading improves our results further or not.

In [34]:
from numba import vectorize, int64, float32, float64

@vectorize([int64(int64), float32(float32), float64(float64)], target="parallel")
def cube_formula_numba_vec_paralleled(x):
    return x**3 + 3*x**2 + 3

First Execution

In this section, we have executed our vectorize-decorated and parallelized function on our big array of integers. We can notice that the results are almost the same as those of the normal vectorize-decorated version; the 'parallel' target does not seem to have improved things much here. We recommend trying 'parallel' with your own code to check whether it improves performance: for much bigger arrays or more expensive per-element work, it may help, even though it is not visible in this example.

In [35]:
arr = arr.astype(np.int64)
In [36]:
%%time

res = cube_formula_numba_vec_paralleled(arr)
CPU times: user 53.3 ms, sys: 595 µs, total: 53.9 ms
Wall time: 19.5 ms
In [37]:
res[:5]
Out[37]:
array([  7,  23,  57, 115, 203])

Second Execution

In this section, we have executed our vectorize decorated and parallelized function again with the same array to check whether the second run improves performance or not. From the results, we can notice that the time taken is almost the same as the first run hence not much improvement.

In [38]:
%%time

res = cube_formula_numba_vec_paralleled(arr)
CPU times: user 22.3 ms, sys: 36.4 ms, total: 58.6 ms
Wall time: 22.2 ms
In [39]:
res[:5]
Out[39]:
array([  7,  23,  57, 115, 203])

Execution with Different Data Type

In this section, we have executed our vectorize-decorated and parallelized function with our array of floats. We have converted the input array first to a float array from an integer array. We can notice from the results that there is not much improvement over the normal vectorize-decorated function.

In [40]:
arr = arr.astype(np.float64)
In [41]:
%%time

res = cube_formula_numba_vec_paralleled(arr)
CPU times: user 39.5 ms, sys: 0 ns, total: 39.5 ms
Wall time: 14.2 ms
In [42]:
res[:5]
Out[42]:
array([  7.,  23.,  57., 115., 203.])

1.7 Numba Vectorize Decorated and Cached Function

In this section, we have again vectorize-decorated our cube formula function, this time with the cache argument set to True. Note that caching stores the compiled machine code on disk, so it mainly avoids recompilation in later interpreter sessions; it does not memoize results for repeated calls with the same inputs.

In [43]:
from numba import vectorize, int64, float32, float64

@vectorize([int64(int64), float32(float32), float64(float64)], cache=True)
def cube_formula_numba_vec_cached(x):
    return x**3 + 3*x**2 + 3

First Execution

Below we have executed our vectorize decorated and cached function with our array of 1M integers. We can notice from the results that the time taken is almost the same as that of the normal vectorize decorated function.

In [44]:
arr = arr.astype(np.int64)
In [45]:
%%time

res = cube_formula_numba_vec_cached(arr)
CPU times: user 2.35 ms, sys: 0 ns, total: 2.35 ms
Wall time: 2.03 ms
In [46]:
res[:5]
Out[46]:
array([  7,  23,  57, 115, 203])

Second Execution

Below we have executed our function a second time with the same input to check whether there is any improvement. It seems from the results that the performance is almost the same as that of vectorize-decorated with cache set to False.

In [47]:
%%time

res = cube_formula_numba_vec_cached(arr)
CPU times: user 3.05 ms, sys: 7.6 ms, total: 10.6 ms
Wall time: 10.5 ms
In [48]:
res[:5]
Out[48]:
array([  7,  23,  57, 115, 203])

Example 2

In this section, we have created another example demonstrating the use of the @vectorize decorator. We have created a new function that works on two inputs, hence our vectorized functions will take two arrays as input. This example will help us understand how to vectorize functions with multiple inputs.

2.1 Normal Function for Scalars

In this section, we have modified the definition of our cube formula function to accept 2 parameters instead of one. The new parameter replaces the constant that we were previously adding in the formula. The new formula x^3 + 3x^2 + y now requires two parameters, x and y, as input.

In [49]:
def cube_formula(x, y):
    return x**3 + 3*x**2 + y
In [50]:
cube_formula(5,3)
Out[50]:
203

2.2 Numpy Vectorized Function

In this section, we have vectorized our new cube formula function using np.vectorize() function.

After vectorizing, we have created two input arrays of size 1M each and executed our function with these two arrays as input for x and y. We have also recorded the time taken by the function to execute using %%time magic command. We'll be comparing this output with various numba decorated functions to check whether numba decorators (@jit and @vectorize) are speeding up this function or not.

In [51]:
vectorized_cube_formula = np.vectorize(cube_formula)

vectorized_cube_formula
Out[51]:
<numpy.vectorize at 0x7f05d1e82c50>
In [52]:
arr = np.arange(1,1000001, dtype=np.int64)
ys = np.random.randint(1, 10, size=1000000, dtype=np.int64)
In [53]:
%%time

res = vectorized_cube_formula(arr, ys)
CPU times: user 521 ms, sys: 43.2 ms, total: 564 ms
Wall time: 563 ms
In [54]:
res[:5]
Out[54]:
array([  8,  21,  57, 115, 202])

2.3 Numba JIT Wrapped Function

In this section, we have wrapped our cube formula function inside of jit() function provided by numba. Our function is designed in a way that it can work on an array as well.

In [55]:
from numba import jit

cube_formula_jitted = jit(cube_formula)

First Execution

In this section, we have executed our jit-wrapped function with the two input arrays that we created earlier. We can notice from the run time that it took far less time than the numpy vectorized version; by just wrapping our function inside jit(), we get a good speed improvement. We'll try other things in upcoming sections to check whether we can improve the results further.

In [56]:
%%time

res = cube_formula_jitted(arr, ys)
CPU times: user 149 ms, sys: 0 ns, total: 149 ms
Wall time: 156 ms
In [57]:
res[:5]
Out[57]:
array([  8,  21,  57, 115, 202])

Second Execution

In this section, we have executed our jit-wrapped function again with the same arrays as input to check whether there is any improvement in speed if we run the function with the same parameters again and again. From the results recorded, we can notice that it takes quite less time to run compared to the last run.

In [58]:
%%time

res = cube_formula_jitted(arr, ys)
CPU times: user 3.02 ms, sys: 0 ns, total: 3.02 ms
Wall time: 3.02 ms
In [59]:
res[:5]
Out[59]:
array([  8,  21,  57, 115, 202])

Execution with Different Data Type

In this section, we have changed the data type of our input arrays from integer to float. We have then executed our jit-wrapped function with these float arrays to check the time taken. We can notice that the time taken for the float arrays is even less than the first run on the integer arrays.

In [60]:
arr = arr.astype(np.float64)
ys = ys.astype(np.float64)
In [61]:
%%time

res = cube_formula_jitted(arr, ys)
CPU times: user 119 ms, sys: 3.75 ms, total: 123 ms
Wall time: 122 ms
In [62]:
res[:5]
Out[62]:
array([  8.,  21.,  57., 115., 202.])

2.4 Loop-based and Numba JIT Decorated Function

In this section, we have modified our cube formula function to work on arrays: we treat both inputs as arrays, loop through their elements in pairs, evaluate the cube formula, and record the results in a list, which we return at the end. We have decorated the function with the @jit decorator to speed it up.

In [63]:
from numba import jit

@jit(nopython=True)
def cube_formula_jitted(x, y):
    xs = []
    for i,j in zip(x,y):
        xs.append(i**3 + 3*i**2 + j)
    return xs

First Execution

In this section, we have executed our function using the two integer input arrays that we created earlier. We can notice from the recorded time that it takes less time than the numpy vectorized and jit-wrapped functions; we have sped up our function further with these changes.

In [64]:
arr = arr.astype(np.int64)
ys = ys.astype(np.int64)
In [65]:
%%time

res = cube_formula_jitted(arr, ys)
CPU times: user 126 ms, sys: 11.7 ms, total: 138 ms
Wall time: 137 ms
In [66]:
res[:5]
Out[66]:
[8, 21, 57, 115, 202]

Second Execution

In this section, we have executed our jit-decorated function again with the same parameters as input to check whether the second run with the same input takes less time or more to execute. From the results, we can notice that the second run with the same parameters takes quite less time compared to the first run.

In [67]:
%%time

res = cube_formula_jitted(arr, ys)
CPU times: user 24.3 ms, sys: 16.2 ms, total: 40.5 ms
Wall time: 46.2 ms
In [68]:
res[:5]
Out[68]:
[8, 21, 57, 115, 202]

Execution with Different Data Type

In this section, we have executed our jit-decorated function with inputs converted to float data type. We can notice from the results that the time taken is almost the same as that taken by the jit-wrapped function.

In [69]:
arr = arr.astype(np.float64)
ys = ys.astype(np.float64)
In [70]:
%%time

res = cube_formula_jitted(arr, ys)
CPU times: user 147 ms, sys: 11.7 ms, total: 158 ms
Wall time: 157 ms
In [71]:
res[:5]
Out[71]:
[8.0, 21.0, 57.0, 115.0, 202.0]

2.5 Numba Vectorize Decorated Function

In this section, we have decorated our cube formula function with the @vectorize decorator to check whether we can improve performance further. Please take a look at the signatures provided inside the @vectorize decorator: there are two entries inside the parentheses because the function takes two inputs.

In [72]:
from numba import vectorize, int64, float32, float64

@vectorize([int64(int64,int64), float32(float32,float32), float64(float64,float64)])
def cube_formula_numba_vec(x, y):
    return x**3 + 3*x**2 + y

First Execution

In this section, we have executed our vectorize-decorated function with the two integer arrays as input and recorded its run time. We can notice that the time taken is far less than all our previous trials (numpy vectorized, jit-wrapped, and jit-decorated). This is a significant improvement in speed from just decorating our function with @vectorize.

In [73]:
arr = arr.astype(np.int64)
ys = ys.astype(np.int64)
In [74]:
%%time

res = cube_formula_numba_vec(arr,ys)
CPU times: user 10.5 ms, sys: 3.86 ms, total: 14.4 ms
Wall time: 14 ms
In [75]:
res[:5]
Out[75]:
array([  8,  21,  57, 115, 202])

Second Execution

In this section, we have again executed our function with the same inputs to check whether the second run is faster compared to the first run. The results are even better compared to the first run.

In [76]:
%%time

res = cube_formula_numba_vec(arr,ys)
CPU times: user 3.68 ms, sys: 0 ns, total: 3.68 ms
Wall time: 3.32 ms
In [77]:
res[:5]
Out[77]:
array([  8,  21,  57, 115, 202])

Execution with Different Data Type

In this section, we have first modified the data type of our input arrays from integer to float. We have then executed our vectorize-decorated function with these float arrays. We can notice from the recorded time that it took quite less time compared to all our previous trials. The speedup is significant and noticeable.

In [78]:
arr = arr.astype(np.float64)
ys = ys.astype(np.float64)
In [79]:
%%time

res = cube_formula_numba_vec(arr,ys)
CPU times: user 0 ns, sys: 3.48 ms, total: 3.48 ms
Wall time: 3.08 ms
In [80]:
res[:5]
Out[80]:
array([  8.,  21.,  57., 115., 202.])

2.6 Numba Vectorize Decorated and Parallelized Function

In this section, we have decorated our cube formula function with @vectorize decorator but we also have set target parameter to 'parallel' to check whether using multi-threading improves the performance further or not.

In [81]:
from numba import vectorize, int64, float32, float64

@vectorize([int64(int64,int64), float32(float32,float32), float64(float64,float64)], target="parallel")
def cube_formula_numba_vec_paralleled(x, y):
    return x**3 + 3*x**2 + y

First Execution

In this section, we have recorded the time taken by the vectorize-decorated and parallelized function to check whether parallelizing gives any speed-up. The results are almost the same as those of the non-parallelized version. Though the results have not improved in our example, we recommend trying the parallelized version to check whether it improves results in your case. Multi-threading adds some overhead, but with large data that overhead can be worth paying if the parallel version runs faster than single-threaded runs.

In [82]:
arr = arr.astype(np.int64)
ys = ys.astype(np.int64)
In [83]:
%%time

res = cube_formula_numba_vec_paralleled(arr,ys)
CPU times: user 43.1 ms, sys: 283 µs, total: 43.4 ms
Wall time: 15.8 ms
In [84]:
res[:5]
Out[84]:
array([  8,  21,  57, 115, 202])

Second Execution

In this section, we have executed our vectorized function again with the same inputs to check whether there is any improvement in speed up but the results are almost the same as the last run.

In [85]:
%%time

res = cube_formula_numba_vec_paralleled(arr,ys)
CPU times: user 39.3 ms, sys: 3.45 ms, total: 42.8 ms
Wall time: 14.8 ms
In [86]:
res[:5]
Out[86]:
array([  8,  21,  57, 115, 202])

Execution with Different Data Type

In this section, we have run our vectorized and parallelized cube formula function with inputs of float data type. We can notice from the results that the results are almost the same as previous runs without parallelizing.

In [87]:
arr = arr.astype(np.float64)
ys = ys.astype(np.float64)
In [88]:
%%time

res = cube_formula_numba_vec_paralleled(arr,ys)
CPU times: user 33.3 ms, sys: 0 ns, total: 33.3 ms
Wall time: 11.8 ms
In [89]:
res[:5]
Out[89]:
array([  8.,  21.,  57., 115., 202.])

This ends our small tutorial explaining how we can use the numba @vectorize decorator to translate a function working on scalars into one working on arrays. We also discussed the speed-up provided by the @vectorize decorator. Please feel free to let us know your views in the comments section.


Sunny Solanki