**Numba** is one of the most commonly used libraries nowadays to speed-up python code. It can speed up your existing python code by a big margin by simply decorating your existing functions with numba decorators. **Numba** provides various decorators to speed up the python code. Below is a list of some of the decorators on which we have already covered tutorials explaining decorators in depth.

**@jit &@njit**- Decorators to speed up almost any python function.**@vectorize**- Decorator to speed up numpy-like universal functions.**@guvectorize**- Decorator which is an extended version of**@vectorize**decorator.**@stencil**- Decorator which speeds up function performing stencil kernel operations like convolution, correlation, etc.

As a part of this tutorial, we'll be concentrating on **@stencil** kernel. The stencil kernel is a kind of numerical computation where each element of the array is updated according to some fixed pattern. This fixed pattern can be performing operations on each array element based on neighbors of the same array or different array. One common example of a stencil kernel is a convolution operation where we convolve a fixed dimension kernel on each element of the array. **Numba** let us speed up these functions which perform stencil operations on input arrays using **@stencil** decorator.

Below we have highlighted important sections of the tutorial.

- Example 1: Simple Example with 1-Dimensional Array
- Example 2: Working with 2-Dimensional Array
- Example 3: Constant Value as One of the Input
- Example 4: Working with More than One Arrays as Input
- Example 5: Using neighborhood Argument for Rolling Operations
- Example 6: Specifying Constant Value for Border Cases
- Example 7: Using Standard Indexing on Specified Arrays
- Example 8: Specifying Output Array Explicitly

Below we have imported **Numba** and printed the version of it. We have also imported **@stencil** decorator.

In [1]:

```
import numba
print("Numba Version : {}".format(numba.__version__))
```

In [2]:

```
import numpy as np
```

In [3]:

```
from numba import stencil
```

As a part of our first example, we'll explain how we can create a stencil kernel that works with **1D** array. We'll then decorate it with **Numba @stencil** decorator to speed up the execution.

Below we have created a function that takes **1D** array as input and creates a new **1D** array where all elements are replaced by the addition of the element and its neighbors. Our function takes as input 2 **1D** arrays. The first array will have data and 2nd array will be empty which will be filled with the result. We have put an if-else condition to handle boundary conditions.

In [3]:

```
def conv_op(a, b):
for i in range(a.shape[0]):
if i-1 < 0 or i+1 >= a.shape[0]:
b[i] = 0
else:
b[i] = a[i-1] + a[i] + a[i+1]
```

In the below cell, we have created a **1D** array of **1M** numbers and applied our function to it. We have then recorded the time taken by function to execute using **%time** magic command of jupyter notebook. It records the time of a single python statement. We have then printed the first few elements of the resulted array as well.

If you want to know about other jupyter notebook magic commands like **%time** then please feel free to check our tutorial on the same which covers the majority of the commands available.

In [4]:

```
input_arr = np.arange(1_000_000)
output_arr = np.empty_like(input_arr)
%time conv_op(input_arr,output_arr)
```

In [5]:

```
output_arr[:10]
```

Out[5]:

In the below cell, we have re-created our function again using **@stencil** decorator. We have already imported **stencil** from **Numba** earlier. We have removed if-else conditions needed to handle boundary conditions because now boundary conditions will be handled by **@stencil** decorator itself. We also don't need to provide a second array as **Numba** will internally create a new array and return it. We only need to provide logic to apply stencil kernel. Internally, **@stencil** decorator will loop through all elements of the input array and apply the stencil kernel logic on all of them. It'll then return the final array with the result. In the below function, **'a[-1], a[0] and a[1]'** elements refers to **'a[i-1],a[i] and a[i+1]'** elements where **'i'** will change from 0 till end index of the array. So for each element, we'll add its previous and next elements to it. When handling boundary conditions for first and last elements where there is no previous and next element, it'll evaluate it to 0.

Then, in the next cell, we have executed our **@stencil** decorated function 3 times and recorded the time taken by it each time. We have also stored the output array and printed it for comparison with the result of the original python function without a decorator. We can notice from the time that it takes quite less time to execute **@stencil** decorated function compared to the python function without it.

When designing a function that applies stencil kernel to the input array, we just need to include kernel logic that will be applied to the input array. We don't need to worry about boundary conditions. We can also provide more than one array to stencil function which we'll explain in our upcoming examples.

In [13]:

```
@stencil
def conv_op(a):
return a[-1] + a[0] + a[1]
```

In [57]:

```
%time output_arr = conv_op(input_arr)
%time output_arr = conv_op(input_arr)
%time output_arr = conv_op(input_arr)
```

In [59]:

```
output_arr[:10]
```

Out[59]:

In this example, we have explained how we can create a stencil kernel function that works on **2D** array. Below we have created a python function that loops through all elements of the input **2D** array. It then adds the elements that come in the same column but one row before and after it and elements that come in the same row but one column before and after it. It adds elements that are on a plus sign formed in an array on the given element. Our function adds results to another array given as input to the function.

In the next cell, we have first created an array of **(1000,1000)** integers and executed our python function on it. We have also recorded the time taken by the function.

In [86]:

```
def conv_op(a, b):
for i in range(a.shape[0]):
for j in range(a.shape[1]):
if i+1 == a.shape[0] or j+1 == a.shape[1]:
b[i,j] = 0
elif i-1 < 0 or j-1 < 0:
b[i,j] = 0
else:
b[i,j] = a[i, j+1] + a[i+1, j] + a[i, j-1] + a[i-1, j]
```

In [87]:

```
input_arr = np.arange(1000_000).reshape((1000, 1000))
input_arr[:5,:5]
```

Out[87]:

In [89]:

```
output_arr = np.empty_like(input_arr)
%time conv_op(input_arr, output_arr)
output_arr[:5,:5]
```

Out[89]:

In the below cell, we have recreated the function but using **Numba @stencil** decorator. We can now remove boundary conditions check and loop as it'll be handled by **Numba** internally.

After creating the function, we have executed it three times with the array we created in the previous cell. We also recorded the time taken by function to compare it with the normal python function we created earlier. We can notice from the results that **@stencil** decorated function performs quite better compared to our normal python function.

In [90]:

```
from numba import stencil
@stencil
def conv_op(a):
return a[0, 1] + a[1, 0] + a[0, -1] + a[-1, 0]
```

In [92]:

```
%time output_arr = conv_op(input_arr)
%time output_arr = conv_op(input_arr)
%time output_arr = conv_op(input_arr)
output_arr[:5,:5]
```

Out[92]:

Both of our previous examples demonstrate usage of **@stencil** decorator took single array as input and worked on it. We can also provide another array of scalar as input to the function. In this example, we'll explain how we can give scalar as input to function.

Below we have first created a simple python function that takes three arguments as input. The first argument is an array, the second argument is scalar and the third argument is again an array of the same shape as the first argument. We then loop through each element of the first array and add its neighbors to it like our first example. This time after adding neighbors, we are also multiplying it with a scalar. We are storing results in another input array.

In the next cell, we have created an array of **1M** integers and initialized scalar value to be given as input to function. Then, we have executed our function with this array and scalar value recording time taken for the execution.

In [188]:

```
def conv_op(a, b, c):
for i in range(a.shape[0]):
if i-1 < 0 or i+1 >= a.shape[0]:
c[i] = 0.0
else:
c[i] = b * (a[i-1] + a[i] + a[i+1])
```

In [189]:

```
input_arr1 = np.arange(1000_000)
input2 = 0.25
```

In [190]:

```
output_arr = np.empty_like(input_arr1)
%time conv_op(input_arr1, input2, output_arr)
output_arr[:10]
```

Out[190]:

In the below cell, we have recreated the function using **@stencil** decorator. Our function now takes two inputs where the first argument is an array and the second argument is a scalar value.

Then, in the next cell, we have executed our function three times on an array from the previous cell and recorded the time taken by it. We can notice that it outperforms the original python function without a decorator by a big margin.

In [191]:

```
@stencil
def conv_op(a, b):
return b * (a[-1] + a[0] + a[+1])
```

In [193]:

```
%time conv_op(input_arr1, input2)
%time conv_op(input_arr1, input2)
%time conv_op(input_arr1, input2)
output_arr[:10]
```

Out[193]:

All our previous examples worked on single input arrays but we can create a stencil kernel function that works on more than one array. In this example, we'll create an example that works on two input arrays.

Below we have created a python function that takes as input three input **1D** arrays of the same size. It then loops through each element of the first and second array, takes the average of elements at the same index, previous index, and next index. It then adds all averaged elements and stores results in the third array. The logic is almost the same as our first example with the only change being that it's taking an average of two array elements this time.

Then in the next cell, we have created two **1D** arrays of **1M** elements and execute our function on them. We have also recorded the time taken for execution.

In [4]:

```
def conv_op(a, b, c):
for i in range(a.shape[0]):
if i-1 < 0 or i+1 >= a.shape[0]:
c[i] = 0.0
else:
c[i] = (a[i-1] + b[i-1])/2 + (a[i]+b[i])/2 + (a[i+1]+b[i+1])/2
```

In [5]:

```
input_arr1 = np.arange(1000_000)
input_arr2 = np.ones_like(input_arr1)
```

In [6]:

```
output_arr = np.empty_like(input_arr1, dtype=np.float32)
%time conv_op(input_arr1, input_arr2, output_arr)
output_arr[:10]
```

Out[6]:

In the below cell, we have reimplemented the function using **@stencil** decorator. We have created a function that takes two **1D** arrays as input. We have then executed the function three times and recorded the time taken for execution. We can notice from the time recorded that it takes quite less time for **@stencil** decorated function to execute compared to the normal python function without it.

In [197]:

```
@stencil
def conv_op(a, b):
return (a[-1]+b[-1])/2 + (a[0]+b[0])/2 + (a[1]+b[1])/2
```

In [198]:

```
input_arr1 = np.arange(1000_000)
input_arr2 = np.ones_like(input_arr1)
%time output_arr = conv_op(input_arr1, input_arr2)
%time output_arr = conv_op(input_arr1, input_arr2)
%time output_arr = conv_op(input_arr1, input_arr2)
output_arr[:10]
```

Out[198]:

In this example, we'll explain how we can use **neighborhood** argument of **@stencil** decorator to work on many elements of the input array. This can be useful in situations when we want to perform rolling window functions like rolling average over 5 days, rolling standard deviation over 10 days, etc.

Below we have created a function that takes as input two **1D** arrays. It then loops through elements of the first array one by one. It takes an average of elements and four previous elements before it. We have given another array of the same shape as the first array in which results will be recorded.

In [247]:

```
def rolling_mean(a, b):
for i in range(a.shape[0]):
idx1, idx2, idx3, idx4, idx5 = i-4,i-3,i-2,i-1,i
elem1 = np.nan if idx1 < 0 else a[idx1]
elem2 = np.nan if idx2 < 0 else a[idx2]
elem3 = np.nan if idx3 < 0 else a[idx3]
elem4 = np.nan if idx4 < 0 else a[idx4]
elem5 = a[idx5]
b[i] = (elem1+elem2+elem3+elem4+elem5) / 5
```

In [248]:

```
input_arr = np.linspace(0, 500, num=1_000_000)
output_arr = np.empty_like(input_arr)
```

In [249]:

```
%time rolling_mean(input_arr, output_arr)
output_arr[:10]
```

Out[249]:

In the below cell, we have recreated the python function using **@stencil** decorator. We have provided argument **neighborhood** to **@stencil** decorator with the value of **(-4,0)** which suggests indexing to go from -4 to 0. Inside of the function, we are looping from -4 to 0 and adding elements of the array. We are then dividing the total by 5 to calculate the average. This function will be executed for each array element taking an average of the current element and four previous elements.

In the next cell, we have executed our **@stencil** decorated function three times with an array created earlier and recorded time for each execution. We can notice that it takes quite less time for **@stencil** decorated function to execute.

In [250]:

```
@stencil(neighborhood=((-4,0), ))
def rolling_mean(a):
tot = 0.0
for i in range(-4,1):
tot += a[i]
return tot / 5
```

In [251]:

```
%time output_arr = rolling_mean(input_arr)
%time output_arr = rolling_mean(input_arr)
%time output_arr = rolling_mean(input_arr)
output_arr[:10]
```

Out[251]:

Till now, for all our examples, we were not able to specify boundary values for boundary conditions. It was set to 0 internally by **Numba** but we can specify constant value for border cases using **func_or_mode** and **cval** parameters of **@stencil** decorators. In this example, we'll explain how to use them.

Below we have recreated our function from the first example, where we are setting boundary values to **-1** instead of **0** which we were setting till now. We can notice from the results when we run the below function.

In [272]:

```
def conv_op(a, b):
for i in range(a.shape[0]):
if i-1 < 0 or i+1 >= a.shape[0]:
b[i] = -1
else:
b[i] = a[i-1] + a[i] + a[i+1]
```

In [273]:

```
input_arr = np.arange(1_000_000)
output_arr = np.empty_like(input_arr)
%time conv_op(input_arr, output_arr)
output_arr
```

Out[273]:

In the below cell, we have created our python function using **@stencil** decorator. We have provided a two-parameter to stencil kernel. The value of **func_or_mode** parameter is set to **'constant'** and **cval** parameter is set to **-1**. This will inform **numba** to use this constant value for boundary conditions.

In [274]:

```
@stencil(func_or_mode="constant", cval=-1)
def conv_op(a):
return a[-1] + a[0] + a[1]
```

In [275]:

```
%time output_arr = conv_op(input_arr)
%time output_arr = conv_op(input_arr)
%time output_arr = conv_op(input_arr)
output_arr
```

Out[275]:

As we noticed through all our examples till now, **@stencil** decorator internally uses relative indexing to loop through all elements and apply stencil kernel function to them. But there can be situations where we have given more than one array as input and we want to use standard indexing on some input arrays rather than using relative indexing for all. By default, **@stencil** decorator uses relative indexing for all input arrays. In this example, we'll explain how we can use standard indexing with **@stencil** decorator.

In the below cell, we have created a function that takes three arrays as input. The first and last arrays are of the same size whereas the second array has three elements. We have modified our function from the first example. We are looping through each element of the first array and multiplying the current element with the second element of the second array, a previous element with the first element of the second array, and the next element with the third element of the second array. We are storing the result in the third array. In this case, the second array is not following the same indexing as the first array.

In the next cell, we have executed the function on the input array of **1M** numbers and the second array of three elements. We have also recorded the time taken by the function.

In [310]:

```
def conv_op(a, b, c):
for i in range(a.shape[0]):
if i-1 < 0 or i+1 >= a.shape[0]:
c[i] = 0
else:
c[i] = (b[0] * a[i-1]) + (b[1] * a[i]) + (b[2] * a[i+1])
```

In [312]:

```
input_arr = np.arange(1_000_000)
weights = np.array([0.25, 0.5, 0.75])
output_arr = np.empty_like(input_arr, dtype=np.float64) ## Please make a note of data type change here.
%time conv_op(input_arr, weights, output_arr)
output_arr[:10]
```

Out[312]:

In the below cell, we have recreated our earlier python function using **@stencil** decorator. This time we have provided parameter **standard_indexing** to **@stencil** decorator with value **('b',)** telling it to use standard indexing for array **b** which is the second input array to function.

We have then executed our **@stencil** decorated function three times and recorded the time taken for each execution. We can notice from the results that it performs quite better compared to the normal loop-based python function.

In [313]:

```
@stencil(standard_indexing=("b",))
def conv_op(a,b):
return b[0] * a[-1] + b[1] * a[0] + b[2] * a[1]
```

In [314]:

```
input_arr = np.arange(1_000_000)
weights = np.array([0.25, 0.5, 0.75])
%time output_arr = conv_op(input_arr, weights)
%time output_arr = conv_op(input_arr, weights)
%time output_arr = conv_op(input_arr, weights)
output_arr[:10]
```

Out[314]:

Till now, all our **@stencil** decorated functions returned the output array after performing stencil kernel on the input array. We can also create an array by ourselves and ask the results to be stored in it. We need to provide output array to **out** argument when calling **@stencil** decorated function and it'll add results to that array rather than returning it. Below we have explained with a simple example how we can use **out** argument.

In [294]:

```
@stencil
def conv_op(a):
return a[-1] + a[0] + a[1]
```

In [295]:

```
input_arr = np.arange(1_000_000)
output_arr = np.empty_like(input_arr)
%time conv_op(input_arr, out=output_arr)
output_arr[:10]
```

Out[295]:

This ends our small tutorial explaining how we can use **@stencil** decorator of **Numba** to perform stencil kernel on a given input array. Please feel free to let us know your views in the comments section.

**Thank You** for visiting our website. If you like our work, please support us so that we can keep on creating new tutorials/blogs on interesting topics (like AI, ML, Data Science, Python, Digital Marketing, SEO, etc.) that can help people learn new things faster. You can support us by clicking on the **Coffee** button at the bottom right corner. We would appreciate even if you can give a thumbs-up to our article in the comments section below.

If you want to

- provide some suggestions on topic
- share your views
- include some details in tutorial
- suggest some new topics on which we should create tutorials/blogs

Sunny Solanki

Numba @guvectorize Decorator: Generalized Universal Functions

Simple Guide to Understand Pandas Multi-Level / Hierarchical Index

xarray (Dataset) : Multi-Dimensional Labelled Arrays

Pandas query(): Query Pandas DataFrame using Python Expressions