python, optimisation

Overview of Numba:

  • According to the official Numba site, it is a high-performance JIT compiler for Python. It translates Python code into fast machine code which, once compiled, can be re-run again and again quickly.
  • Numba can only translate a certain subset of Python code, mainly loops and code involving NumPy, to faster machine code. Not everything runs faster with Numba; one needs a basic understanding of what can be parallelized and what cannot to make efficient use of it.
  • Numba provides a list of decorators which, when applied to functions, produce faster versions of those functions.
  • When a function decorated with a Numba decorator is called for the first time, it is compiled first to generate faster machine code, so the first call takes roughly as long as plain Python code, or possibly longer in the worst case. Once the code is compiled, subsequent calls are much faster because the compiled version is reused.
  • Numba reads the Python bytecode of the decorated function, converts its input arguments and other data used inside the function to Numba data types, optimizes various parts, and converts it to machine code using the LLVM library.
  • If a function is designed to work with various data types (a generic function), Numba will take time to compile the function each time it is called with a data type it has not seen before, because it creates a separate compiled specialization of the same generic function for each combination of data types.
  • Numba can also help improve performance of code running on GPUs. It supports Nvidia GPUs and AMD ROC GPUs. One needs to install graphics drivers for the particular GPU. Nvidia also requires the cudatoolkit library (conda install cudatoolkit); AMD requires the roctools library (conda install -c numba roctools).

Note: Python is an interpreted language and Numba is a JIT compiler for Python. Compiled code generally runs faster than interpreted code.

Compatibility

  • Numba is compatible with Python 2.7 and 3.5 or later, and NumPy versions 1.7 to 1.15.

Installing Numba:

  • pip install numba
  • conda install numba

Importing & Using Numba:

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import numba
import pandas as pd
from numba import jit

%matplotlib inline
In [2]:
for lib in dir():
    lib  = eval(lib)
    if '__version__' in dir(lib): print(lib.__name__ + ' : '+ lib.__version__)
numpy : 1.14.3
numba : 0.40.0
pandas : 0.23.0

Running code without numba

In [3]:
def get_squares():
    return [ pow(i,2) for i in range(int(1e7))]

%time squares = get_squares()
%time squares = get_squares()
%time squares = get_squares()
Wall time: 6.26 s
Wall time: 6.46 s
Wall time: 6.34 s

Running code with numba decorator

  • Below we introduce the first Numba decorator, @jit, which clearly improves performance compared to plain Python code.
  • If adding Numba decorators during testing does not seem to improve performance, it is better to remove the @jit decorator, fall back to pure Python, and look for other ways to improve performance. Using @jit on functions which cannot be converted by Numba can even worsen performance: the function still takes time to compile on the first call but yields no speedup, so the compilation time is pure overhead.
  • Numba generally does not run much faster with list comprehensions (though the case below seems a bit of an exception), and it is suggested to convert functions using comprehensions back to loop-based versions for faster performance.
In [4]:
@jit
def get_squares():
    return [ pow(i,2) for i in range(int(1e7))]

%time squares = get_squares()
%time squares = get_squares()
%time squares = get_squares()
Wall time: 2.06 s
Wall time: 1.31 s
Wall time: 1.22 s
In [5]:
@jit
def get_squares():
    squares = []
    for i in range(int(1e7)):
        squares.append(pow(i,2))
    return squares

%time squares = get_squares()
%time squares = get_squares()
%time squares = get_squares()
Wall time: 1.57 s
Wall time: 1.36 s
Wall time: 1.17 s

Numba does not improve code involving pandas dataframes even though pandas is built on NumPy

  • The example below has a commented line which uses the pandas apply function to apply a function to all cells of the dataframe; if uncommented, it will cause a failure with the @jit decorator.
  • The examples below show that using Numba on code involving only pandas will not improve performance. It can even backfire and take extra time on the first run, as seen below, because Numba tries to convert the code, fails, and finally falls back to pure Python.
In [6]:
def work_on_dataframe():
    data = {'Col1': range(1000), 'Col2': range(1000), 'Col3': range(1000)}
    df = pd.DataFrame(data=data)
    #df_square = df.apply(lambda x: x*x*x)
    df['Col1'] = (df.Col1 * 100)
    df['Col2'] = (df.Col1 * df.Col3)
    df = df.where((df > 100) & (df < 10000))
    df = df.dropna(how='any')
    return df
    
%time df = work_on_dataframe()
%time df = work_on_dataframe()
%time df = work_on_dataframe()
Wall time: 173 ms
Wall time: 0 ns
Wall time: 0 ns
In [7]:
@jit
def work_on_dataframe():
    data = {'Col1': range(1000), 'Col2': range(1000), 'Col3': range(1000)}
    df = pd.DataFrame(data=data)
    #df_square = df.apply(lambda x: x*x*x)
    df['Col1'] = (df.Col1 * 100)
    df['Col2'] = (df.Col1 * df.Col3)
    df = df.where((df > 100) & (df < 10000))
    df = df.dropna(how='any')
    return df
    
%time df = work_on_dataframe()
%time df = work_on_dataframe()
%time df = work_on_dataframe()
Wall time: 337 ms
Wall time: 0 ns
Wall time: 15.6 ms
In [8]:
def calculate_sum():
    arr = np.arange(1e7)
    s = np.sum(arr)
    return s
%time s = calculate_sum()
%time s = calculate_sum()
%time s = calculate_sum()
Wall time: 109 ms
Wall time: 78 ms
Wall time: 104 ms
In [9]:
@jit
def calculate_sum():
    arr = np.arange(1e7)
    s = np.sum(arr)
    return s
%time s = calculate_sum()
%time s = calculate_sum()
%time s = calculate_sum()
Wall time: 450 ms
Wall time: 78 ms
Wall time: 78 ms
In [10]:
@jit
def calculate_sum():
    s = 0
    for i in np.arange(1e7):
        s += i
    return s
%time s = calculate_sum()
%time s = calculate_sum()
%time s = calculate_sum()
Wall time: 173 ms
Wall time: 78 ms
Wall time: 78 ms

nopython attribute of the @jit decorator

  • The @jit compiler works in 2 modes:

    1. nopython
    2. object
  • nopython mode is generally preferred over object mode and is much faster when it can be used.

  • If you know that your whole function can be converted by Numba, it is preferred to use nopython mode.
  • If your function is designed so that some parts can be converted by Numba while others must run in pure Python, it is preferred to run in object mode.
  • Users can first test whether their function runs in nopython mode; if it works, use that mode, otherwise fall back to object mode. If the function fails in nopython mode, it will require object mode.
In [12]:
def calculate_all_permutations():
    perms = []
    for i in range(int(1e4)):
        for j in range(int(1e3)):
            perms.append((i,j))

%time perms = calculate_all_permutations()
%time perms = calculate_all_permutations()
%time perms = calculate_all_permutations()
Wall time: 3.31 s
Wall time: 3.3 s
Wall time: 3.25 s
In [15]:
@jit
def calculate_all_permutations():
    perms = []
    for i in range(int(1e4)):
        for j in range(int(1e3)):
            perms.append((i,j))

%time perms = calculate_all_permutations()
%time perms = calculate_all_permutations()
%time perms = calculate_all_permutations()
Wall time: 1.18 s
Wall time: 1.13 s
Wall time: 1.02 s
In [16]:
@jit(nopython=True)
def calculate_all_permutations():
    perms = []
    for i in range(int(1e4)):
        for j in range(int(1e3)):
            perms.append((i,j))

%time perms = calculate_all_permutations()
%time perms = calculate_all_permutations()
%time perms = calculate_all_permutations()
Wall time: 1.14 s
Wall time: 1.02 s
Wall time: 995 ms
Sunny Solanki