Numba is a Python library that translates a subset of Python code into low-level machine code using the **LLVM** compiler in order to speed up existing Python code. It generally does not require many changes to our code: applying one of the decorators (**@jit, @vectorize, etc.**) provided by numba usually works well. Numba works best on functions that involve Python loops or numpy arrays. When we decorate an existing function with a numba decorator, numba compiles the parts of the function that it can translate to machine code, and those compiled parts run faster and speed up the function. In many cases, numba can translate the whole function to machine instructions. We have already covered the numba **@jit** decorator in another tutorial. Please feel free to check it if you are interested in learning about the **@jit** decorator.

In this tutorial, we'll be discussing another important decorator provided by numba named **@vectorize**. The concept behind the vectorize decorator is the same as that of the numpy **vectorize()** function. It translates a function that works on a single scalar input into a function that can work on an array of scalars. NumPy refers to such a function as a **ufunc** or universal function. In our tutorial, we'll take a simple function that works on a scalar and convert it to a universal function using NumPy’s **vectorize()** method and numba's **@vectorize** decorator. We'll then run these modified functions to compare their performance. We'll also rewrite the function as a loop-based function and decorate it with the **@jit** decorator to check its performance. Finally, we'll compare the performance of the **@vectorize** decorator with different arguments.

Below we have highlighted important sections of the tutorial to give an overview of the material covered.

- Example 1 - Function with Single Scalar Input
  - Normal Function with Scalar
  - Numpy Vectorized Function (**np.vectorize()**)
  - Numba JIT Wrapped (**jit()**) Function
    - First Execution
    - Second Execution
    - Execution with Different Data Type
  - Loop-based and Numba JIT Decorated (**@jit**) Function
  - Numba Vectorize Decorated (**@vectorize**) Function
  - Numba Vectorize Decorated (**@vectorize**) and Parallelized Function
  - Numba Vectorize Decorated (**@vectorize**) and Cached Function
- Example 2 - Function with 2 Scalar Inputs

We'll start by importing necessary libraries (**numba and numpy**).

In [1]:

```
import numba
print("Numba Version : {}".format(numba.__version__))
```

In [2]:

```
import numpy as np
```

In this section, we'll create a simple function that works on a single scalar and vectorize it with numpy and numba. We'll then compare the performance of the various methods.

Below we have created a simple function that takes a single scalar value as input and evaluates the formula **x^3 + 3x^2 + 3**. We'll vectorize this function to make it work on arrays of numbers using different methods and measure the performance of each.

In [3]:

```
def cube_formula(x):
    return x**3 + 3*x**2 + 3

cube_formula(5)
```

Out[3]:

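In this section, we have vectorized our **cube_formula()** function using the **np.vectorize()** function. The **np.vectorize()** function takes any function as input and makes it run on numpy arrays. Note that the NumPy documentation describes **np.vectorize()** as a convenience wrapper whose implementation is essentially a Python-level loop over the array, so it should not be expected to run much faster than an explicit Python loop. It serves as our baseline for the numba versions.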

In [4]:

```
vectorized_cube_formula = np.vectorize(cube_formula)
vectorized_cube_formula
```

Out[4]:

After vectorizing the function, we have created an array of **1M** numbers and called our vectorized function on this array. We have also recorded the time taken by the numpy vectorized function using the **%%time** cell magic of jupyter notebook. Finally, we have printed a few values of the results.

If you are interested in learning about cell magic commands (like **%%time**) available in jupyter notebook then please feel free to check our tutorial on the same. It covers the majority of jupyter notebook magic commands.

In [5]:

```
arr = np.arange(1, 1000001, dtype=np.int64)  # 1M values: 1 .. 1,000,000
```

In [6]:

```
%%time
res = vectorized_cube_formula(arr)
```

In [7]:

```
res[:5]
```

Out[7]:

In this section, we have taken our **cube_formula()** function and wrapped it inside the **jit()** function available through **numba**. This is exactly the same as decorating our original function with the **@jit** decorator. The numba **@jit** decorator can be applied to any Python function, and it tries to speed the function up by compiling it with the **LLVM** compiler. It tries to compile the whole function. In older numba versions, if it cannot convert the whole function, it falls back to speeding up only the parts that it can convert to machine code; recent releases instead compile in nopython mode by default and raise an error when compilation fails.

Our **cube_formula()** function is written in a way that it also works when given a numpy array as input.
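
The plain function accepts arrays because it uses only arithmetic operators, which NumPy applies element by element. The sketch below checks that calling the function directly on an array, wrapping it in `np.vectorize()`, and looping in Python all give the same values, and prints rough timings for each. The array size and the `best_time` helper are illustrative choices, not part of the original notebook, and the timings will differ between machines.

```python
import timeit
import numpy as np

def cube_formula(x):
    return x**3 + 3*x**2 + 3

def best_time(fn, repeat=3):
    """Hypothetical helper: best wall-clock time (seconds) of `repeat` single runs."""
    return min(timeit.repeat(fn, number=1, repeat=repeat))

small = np.arange(1, 100001, dtype=np.int64)

direct = cube_formula(small)                          # NumPy elementwise operators
wrapped = np.vectorize(cube_formula)(small)           # one Python call per element
looped = np.array([cube_formula(v) for v in small])   # explicit Python loop

print("direct on array :", best_time(lambda: cube_formula(small)))
print("np.vectorize    :", best_time(lambda: np.vectorize(cube_formula)(small)))
print("python loop     :", best_time(lambda: [cube_formula(v) for v in small]))
```

On most machines the `np.vectorize()` and Python-loop timings should be of a similar order, while the direct array expression is much faster.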

For detailed information about **jit()** function or **@jit** decorator, please feel free to check our tutorial on the same.

In [8]:

```
from numba import jit
cube_formula_jitted = jit(cube_formula)
```

After jit-wrapping our function, we have run it below on the array of **1M** entries that we created earlier. We can notice that just jit-wrapping has improved the speed of the function: it has taken less time than the numpy vectorized version of the same function, even though this first call also includes the time numba spends compiling it.

In [9]:

```
%%time
res = cube_formula_jitted(arr)
```

In [10]:

```
res[:5]
```

Out[10]:

Below we have executed the same function a second time with the same input array. This time it takes considerably less time, because numba reuses the machine code it compiled during the first call.

In [11]:

```
%%time
res = cube_formula_jitted(arr)
```

In [12]:

```
res[:5]
```

Out[12]:

In this section, we have changed the data type of the input array from integer to float and then executed the jit-wrapped function on this new array. Numba compiles a separate version of the function for the new data type on this call. We can notice that the time taken by the jit-wrapped function is still much less than that of the numpy vectorized function.

In [13]:

```
arr = arr.astype(np.float64)
```

In [14]:

```
%%time
res = cube_formula_jitted(arr)
```

In [15]:

```
res[:5]
```

Out[15]:

In this section, we have first modified our **cube_formula()** function to make it work on an array. We treat the input array as a sequence of numbers, loop through each element with a Python loop, evaluate our cube formula on it, and record the results in a separate list. We have decorated this function with **@jit** to improve its performance.

In [16]:

```
from numba import jit
@jit(nopython=True)
def cube_formula_jitted(x):
    xs = []
    for i in x:
        xs.append(i**3 + 3*i**2 + 3)
    return xs
```

In this section, we have first converted our original **1M** numbers numpy array to an array of integers again. We have then executed our new cube formula function with this array and recorded the time taken by it.

We can notice from the results that the time taken for execution is less compared to both numpy vectorized and jit-wrapped functions. It seems that we have further improved our original cube formula function.

In [17]:

```
arr = arr.astype(np.int64)
```

In [18]:

```
%%time
res = cube_formula_jitted(arr)
```

In [19]:

```
res[:5]
```

Out[19]:

In this section, we have again executed our jit-decorated function using the same integer array to check whether it takes less time compared to the first run and we can notice from the results that it takes significantly less time compared to the previous run.

In [20]:

```
%%time
res = cube_formula_jitted(arr)
```

In [21]:

```
res[:5]
```

Out[21]:

In this section, we have first converted our array of integers to an array of floats. We have then executed our jit-decorated function with this array of floats. We can notice from the time taken by it that it takes less time compared to numpy vectorized and jit-wrapped functions.

In [22]:

```
arr = arr.astype(np.float64)
```

In [23]:

```
%%time
res = cube_formula_jitted(arr)
```

In [24]:

```
res[:5]
```

Out[24]:

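In this section, we have decorated our cube formula function with the **@vectorize** decorator. We pass the decorator a list of signatures describing the possible data types of the function's input and output, and it creates a compiled version for each of them. (Numba also allows the list to be omitted, in which case compilation happens lazily when the function is first called.) The signatures should be ordered from the most specific to the least specific data type, for example single-precision floats before double-precision floats, so that type-based dispatch picks the intended version. Below we have highlighted the signature of the **@vectorize** decorator.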

```
@vectorize([ret_datatype1(input1_datatype1, input2_datatype1, ...),
            ret_datatype2(input1_datatype2, input2_datatype2, ...),
            ...],
           target='cpu', cache=False)
def func(x):
    return x*x
```

Apart from datatypes, it accepts two other arguments.

**target** - This argument accepts one of the three strings below, specifying where the compiled code should run based on available resources.

- **'cpu'** - This is the default. The code runs on a single CPU core (single-threaded).
- **'parallel'** - The code runs in parallel on a multi-core (multi-threaded) CPU.
- **'cuda'** - The code runs on a GPU.

**cache** - This parameter accepts a boolean specifying whether numba should save the compiled code to an on-disk cache so that later runs of the program can skip compilation. It does not cache the results of function calls.

In [25]:

```
from numba import vectorize, int64, float32, float64
@vectorize([int64(int64), float32(float32), float64(float64)])
def cube_formula_numba_vec(x):
    return x**3 + 3*x**2 + 3
```

In this section, we have executed our **@vectorize** decorated function on our **1M** element array to check its performance. We can notice from the results that it easily outperforms all our previous trials (numpy vectorized, jit-wrapped, jit-decorated). The speedup is substantial.

In [26]:

```
arr = arr.astype(np.int64)
```

In [27]:

```
%%time
res = cube_formula_numba_vec(arr)
```

In [28]:

```
res[:5]
```

Out[28]:

In this section, we have executed our vectorize-decorated function a second time with the same array as input, and we can notice that it takes even less time than the previous execution.

In [29]:

```
%%time
res = cube_formula_numba_vec(arr)
```

In [30]:

```
res[:5]
```

Out[30]:

In this section, we have converted our big array from integer to float and executed our vectorize-decorated function on it. We can notice from the recorded time that the numba vectorize-decorated function takes much less time than all our previous trials.

In [31]:

```
arr = arr.astype(np.float64)
```

In [32]:

```
%%time
res = cube_formula_numba_vec(arr)
```

In [33]:

```
res[:5]
```

Out[33]:

In this section, we have decorated our cube formula function again with **@vectorize** decorator. But this time, we have set **target** parameter of the decorator to **'parallel'** to check whether using multi-threading improves our results further or not.

In [34]:

```
from numba import vectorize, int64, float32, float64
@vectorize([int64(int64), float32(float32), float64(float64)], target="parallel")
def cube_formula_numba_vec_paralleled(x):
    return x**3 + 3*x**2 + 3
```

In this section, we have executed our vectorize-decorated and parallelized function on our big array of integers. We can notice from the results that the timing is almost the same as that of the normal vectorize-decorated function, so the **'parallel'** target does not seem to have improved results much. We recommend trying the **'parallel'** target with your own code to check whether it improves performance. For much bigger arrays it may help, though the benefit is not visible in this example.

In [35]:

```
arr = arr.astype(np.int64)
```

In [36]:

```
%%time
res = cube_formula_numba_vec_paralleled(arr)
```

In [37]:

```
res[:5]
```

Out[37]:

In this section, we have executed our vectorize decorated and parallelized function again with the same array to check whether the second run improves performance or not. From the results, we can notice that the time taken is almost the same as the first run hence not much improvement.

In [38]:

```
%%time
res = cube_formula_numba_vec_paralleled(arr)
```

In [39]:

```
res[:5]
```

Out[39]:

In this section, we have executed our vectorize-decorated and parallelized function with our array of floats. We have converted the input array first to a float array from an integer array. We can notice from the results that there is not much improvement over the normal vectorize-decorated function.

In [40]:

```
arr = arr.astype(np.float64)
```

In [41]:

```
%%time
res = cube_formula_numba_vec_paralleled(arr)
```

In [42]:

```
res[:5]
```

Out[42]:

In this section, we have again vectorize-decorated our cube formula function. We have also set the **cache** argument of the decorator to **True** to check whether it helps in improving performance or not.

In [43]:

```
from numba import vectorize, int64, float32, float64
@vectorize([int64(int64), float32(float32), float64(float64)], cache=True)
def cube_formula_numba_vec_cached(x):
    return x**3 + 3*x**2 + 3
```

Below we have executed our vectorize decorated and cached function with our array of **1M** integers. We can notice from the results that the time taken is almost the same as that of the normal vectorize decorated function.

In [44]:

```
arr = arr.astype(np.int64)
```

In [45]:

```
%%time
res = cube_formula_numba_vec_cached(arr)
```

In [46]:

```
res[:5]
```

Out[46]:

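Below we have executed our function a second time with the same input to check whether there is any improvement. The results show that the performance is almost the same as that of the vectorize-decorated function with **cache** set to **False**. This is expected, because **cache=True** saves the compiled code to disk to avoid recompilation in later sessions and does not change how fast the compiled code runs.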

In [47]:

```
%%time
res = cube_formula_numba_vec_cached(arr)
```

In [48]:

```
res[:5]
```

Out[48]:

In this section, we have created another example of demonstrating the use of **@vectorize** decorator. We have created a new function that works on two inputs hence our vectorized functions will take two arrays as input. This example will help understand how we can vectorize functions with multiple inputs.

In this section, we have modified the definition of our cube formula function to accept 2 parameters instead of one. We have added one more parameter whose value replaces the constant value which we were adding to our formula. The new formula **x^3 + 3x^2 + y** now requires two parameters **x** and **y** as input.

In [49]:

```
def cube_formula(x, y):
    return x**3 + 3*x**2 + y
```

In [50]:

```
cube_formula(5,3)
```

Out[50]:

In this section, we have vectorized our new cube formula function using **np.vectorize()** function.

After vectorizing, we have created two input arrays of size **1M** each and executed our function with these two arrays as input for **x** and **y**. We have also recorded the time taken by the function to execute using **%%time** magic command. We'll be comparing this output with various numba decorated functions to check whether numba decorators (**@jit and @vectorize**) are speeding up this function or not.

In [51]:

```
vectorized_cube_formula = np.vectorize(cube_formula)
vectorized_cube_formula
```

Out[51]:

In [52]:

```
arr = np.arange(1,1000001, dtype=np.int64)
ys = np.random.randint(1, 10, size=1000000, dtype=np.int64)
```

In [53]:

```
%%time
res = vectorized_cube_formula(arr, ys)
```

In [54]:

```
res[:5]
```

Out[54]:

In this section, we have wrapped our cube formula function inside of **jit()** function provided by numba. Our function is designed in a way that it can work on an array as well.

In [55]:

```
from numba import jit
cube_formula_jitted = jit(cube_formula)
```

In this section, we have executed our jit-wrapped function with the two input arrays that we created earlier. We can notice from the run time that it has taken much less time than the numpy vectorized version. Just wrapping our function inside **jit()** gives a good speed improvement. We'll try other things in upcoming sections to check whether we can improve results further.

In [56]:

```
%%time
res = cube_formula_jitted(arr, ys)
```

In [57]:

```
res[:5]
```

Out[57]:

In this section, we have executed our jit-wrapped function again with the same arrays as input to check whether repeated runs with the same parameters are faster. From the recorded results, we can notice that it takes much less time than the previous run.

In [58]:

```
%%time
res = cube_formula_jitted(arr, ys)
```

In [59]:

```
res[:5]
```

Out[59]:

In this section, we have changed the data type of our input arrays from integer to float. We have then executed our jit-wrapped function with these float arrays to check the time taken. We can notice that the time taken for float arrays is even less than that taken for integer arrays.

In [60]:

```
arr = arr.astype(np.float64)
ys = ys.astype(np.float64)
```

In [61]:

```
%%time
res = cube_formula_jitted(arr, ys)
```

In [62]:

```
res[:5]
```

Out[62]:

In this section, we have modified our cube formula function to work on the input of arrays. We have modified it to consider both inputs as arrays. We then loop through each element of input arrays, evaluate our cube formula and record the result in a different array. At last, we return the array in which results were stored. We have decorated our function with **@jit** decorator to speed it up.

In [63]:

```
from numba import jit
@jit(nopython=True)
def cube_formula_jitted(x, y):
    xs = []
    for i, j in zip(x, y):
        xs.append(i**3 + 3*i**2 + j)
    return xs
```

In this section, we have executed our function using the two input arrays of integers that we created earlier. We can notice from the recorded time that it takes less time than the numpy vectorized and jit-wrapped functions. We have further sped up our function with these changes.

In [64]:

```
arr = arr.astype(np.int64)
ys = ys.astype(np.int64)
```

In [65]:

```
%%time
res = cube_formula_jitted(arr, ys)
```

In [66]:

```
res[:5]
```

Out[66]:

In this section, we have executed our jit-decorated function again with the same parameters as input to check whether the second run with the same input takes less time or more to execute. From the results, we can notice that the second run with the same parameters takes quite less time compared to the first run.

In [67]:

```
%%time
res = cube_formula_jitted(arr, ys)
```

In [68]:

```
res[:5]
```

Out[68]:

In this section, we have executed our jit-decorated function with inputs converted to float data type. We can notice from the results that the time taken is almost the same as that taken by the jit-wrapped function.

In [69]:

```
arr = arr.astype(np.float64)
ys = ys.astype(np.float64)
```

In [70]:

```
%%time
res = cube_formula_jitted(arr, ys)
```

In [71]:

```
res[:5]
```

Out[71]:

In this section, we have decorated our cube formula function with the **@vectorize** decorator to check whether we can further improve performance. Please take a look at the data type signatures provided inside the **@vectorize** decorator. There are two entries inside the parentheses of each signature because the function takes two inputs.

In [72]:

```
from numba import vectorize, int64, float32, float64
@vectorize([int64(int64,int64), float32(float32,float32), float64(float64,float64)])
def cube_formula_numba_vec(x, y):
    return x**3 + 3*x**2 + y
```

In this section, we have executed our vectorize-decorated function with two integer arrays as input and recorded its run time. We can notice that the time taken is much less than in all our previous trials (numpy vectorize, jit-wrapped, and jit-decorated). This is a significant improvement in speed from just decorating our function with the **@vectorize** decorator.

In [73]:

```
arr = arr.astype(np.int64)
ys = ys.astype(np.int64)
```

In [74]:

```
%%time
res = cube_formula_numba_vec(arr,ys)
```

In [75]:

```
res[:5]
```

Out[75]:

In this section, we have again executed our function with the same inputs to check whether the second run is faster compared to the first run. The results are even better compared to the first run.

In [76]:

```
%%time
res = cube_formula_numba_vec(arr,ys)
```

In [77]:

```
res[:5]
```

Out[77]:

In this section, we have first changed the data type of our input arrays from integer to float. We have then executed our vectorize-decorated function with these float arrays. We can notice from the recorded time that it took much less time than all our previous trials. The speedup is significant and noticeable.

In [78]:

```
arr = arr.astype(np.float64)
ys = ys.astype(np.float64)
```

In [79]:

```
%%time
res = cube_formula_numba_vec(arr,ys)
```

In [80]:

```
res[:5]
```

Out[80]:

In this section, we have decorated our cube formula function with **@vectorize** decorator but we also have set **target** parameter to **'parallel'** to check whether using multi-threading improves the performance further or not.

In [81]:

```
from numba import vectorize, int64, float32, float64
@vectorize([int64(int64,int64), float32(float32,float32), float64(float64,float64)], target="parallel")
def cube_formula_numba_vec_paralleled(x, y):
    return x**3 + 3*x**2 + y
```

In this section, we have recorded the time taken by the parallelized vectorize-decorated function to check whether parallelizing gives any speedup. The results are almost the same as those of the non-parallelized version. Though the results have not improved in our example, we recommend trying the parallelized version once to check whether it improves results in your case. Multi-threading adds some overhead to processing, but on large data that overhead is acceptable whenever the parallel version still runs faster than the single-threaded one.

In [82]:

```
arr = arr.astype(np.int64)
ys = ys.astype(np.int64)
```

In [83]:

```
%%time
res = cube_formula_numba_vec_paralleled(arr,ys)
```

In [84]:

```
res[:5]
```

Out[84]:

In this section, we have executed our vectorized function again with the same inputs to check whether there is any improvement in speed up but the results are almost the same as the last run.

In [85]:

```
%%time
res = cube_formula_numba_vec_paralleled(arr,ys)
```

In [86]:

```
res[:5]
```

Out[86]:

In this section, we have run our vectorized and parallelized cube formula function with inputs of float data type. We can notice from the results that the results are almost the same as previous runs without parallelizing.

In [87]:

```
arr = arr.astype(np.float64)
ys = ys.astype(np.float64)
```

In [88]:

```
%%time
res = cube_formula_numba_vec_paralleled(arr,ys)
```

In [89]:

```
res[:5]
```

Out[89]:

This ends our small tutorial explaining how we can use the numba **@vectorize** decorator to translate a function that works on scalars into a function that works on arrays. We also discussed the speedup provided by the **@vectorize** decorator. Please feel free to let us know your views in the comments section.

- Creating NumPy universal functions
- numba - Make Your Python Functions Run Faster Like C/C++
- Numba @stencil Decorator: Guide to Improve Performance of Code involving Stencil Kernels
- Numba @guvectorize Decorator: Generalized Universal Functions
- How to Speed up Code involving Pandas DataFrame using Numba?
