Numba is a JIT (just-in-time) compiler for Python. It translates Python code into fast machine code which, once compiled, can be re-run again and again at that faster speed. Numba can only translate a certain subset of Python code, mainly loops and code involving numpy, into faster machine code. Not everything will run faster using numba, and one needs a basic understanding of what can be sped up (or parallelized) and what cannot in order to use numba efficiently. Numba provides a list of decorators which, when applied to functions, produce faster compiled versions of those functions.

Numba can also target Nvidia GPUs and AMD ROC GPUs. One needs to install the graphics drivers for the particular GPU. For Nvidia it also requires the cudatoolkit library (conda install cudatoolkit), and for AMD the roctools library (conda install -c numba roctools). A small GPU sketch follows below.

Note: Python is an interpreter-based language and numba is a JIT compiler for Python; compiled execution is generally faster than interpreted execution.
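As a rough illustration of the GPU path, here is a minimal sketch of a CUDA kernel written with numba's cuda.jit decorator. It assumes an Nvidia GPU with working drivers and cudatoolkit installed; the kernel name add_one and the array size are placeholders chosen for illustration, not part of the original notebook.

import numpy as np
from numba import cuda

@cuda.jit
def add_one(arr):
    # Each GPU thread handles one element of the array.
    i = cuda.grid(1)
    if i < arr.size:
        arr[i] += 1.0

arr = np.zeros(32, dtype=np.float32)
add_one[1, 32](arr)   # launch 1 block of 32 threads
print(arr[:5])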
Numba works with Python 2.7 and 3.5 or later, and Numpy versions 1.7 to 1.15.

Numba can be installed with either pip or conda:

pip install numba
conda install numba
import numpy as np
import matplotlib.pyplot as plt
import numba
import pandas as pd
from numba import jit
%matplotlib inline
# Print the version of each imported library that exposes __version__.
for lib in dir():
    lib = eval(lib)
    if '__version__' in dir(lib): print(lib.__name__ + ' : ' + lib.__version__)
def get_squares():
    return [ pow(i,2) for i in range(int(1e7))]
%time squares = get_squares()
%time squares = get_squares()
%time squares = get_squares()
Below, the same function is decorated with @jit, which clearly improves performance compared to the plain Python code.

If numba cannot compile a function, it is better to remove the @jit decorator, fall back to pure Python, and look for other ways to improve performance. Using @jit on a function that numba cannot convert can even worsen performance: the first call still pays the cost of attempting compilation, yet no speed-up follows, so that one-time overhead is added for nothing. Numba generally does not speed up list comprehensions much (though the case below seems a bit of an exception), and it is suggested to rewrite comprehension-based functions as explicit loops for better performance.
def get_squares():
    return [ pow(i,2) for i in range(int(1e7))]
%time squares = get_squares()
%time squares = get_squares()
%time squares = get_squares()
@jit
def get_squares():
    squares = []
    for i in range(int(1e7)):
        squares.append(pow(i,2))
    return squares
%time squares = get_squares()
%time squares = get_squares()
%time squares = get_squares()
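As a further variant, preallocating a numpy array and filling it in a loop usually plays to numba's strengths better than building a Python list. The sketch below is illustrative only; the function name get_squares_np is not part of the original benchmarks, and it reuses the np and jit imports from above.

@jit
def get_squares_np():
    # Preallocate a numpy array and fill it in a plain loop,
    # which numba can compile to machine code.
    n = int(1e7)
    squares = np.empty(n, dtype=np.int64)
    for i in range(n):
        squares[i] = i * i
    return squares
%time squares = get_squares_np()
%time squares = get_squares_np()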
Note the commented-out apply call below, which applies a function to every cell of a pandas dataframe; if uncommented, it causes a failure with the @jit decorator. Using numba on code that involves only pandas does not improve performance. It can even backfire and take longer on the first run, as seen below, because numba tries to compile the code for speed, fails, and finally falls back to pure Python.

def work_on_dataframe():
    data = {'Col1': range(1000), 'Col2': range(1000), 'Col3': range(1000)}
    df = pd.DataFrame(data=data)
    #df_square = df.apply(lambda x: x*x*x)
    df['Col1'] = (df.Col1 * 100)
    df['Col2'] = (df.Col1 * df.Col3)
    df = df.where((df > 100) & (df < 10000))
    df = df.dropna(how='any')
    return df
%time df = work_on_dataframe()
%time df = work_on_dataframe()
%time df = work_on_dataframe()
@jit
def work_on_dataframe():
    data = {'Col1': range(1000), 'Col2': range(1000), 'Col3': range(1000)}
    df = pd.DataFrame(data=data)
    #df_square = df.apply(lambda x: x*x*x)
    df['Col1'] = (df.Col1 * 100)
    df['Col2'] = (df.Col1 * df.Col3)
    df = df.where((df > 100) & (df < 10000))
    df = df.dropna(how='any')
    return df
%time df = work_on_dataframe()
%time df = work_on_dataframe()
%time df = work_on_dataframe()
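One common workaround, sketched below, is to pull the underlying numpy arrays out of the dataframe (for example with .values) and do the numerical work in a @jit-compiled function, keeping the pandas calls outside. The function name scale_and_multiply is an illustrative placeholder, not part of the original notebook, and the snippet reuses the pd and jit imports from above.

@jit
def scale_and_multiply(col1, col3):
    # Pure numpy-array math, which numba can compile.
    new_col1 = col1 * 100
    new_col2 = new_col1 * col3
    return new_col1, new_col2

data = {'Col1': range(1000), 'Col2': range(1000), 'Col3': range(1000)}
df = pd.DataFrame(data=data)
col1, col2 = scale_and_multiply(df['Col1'].values, df['Col3'].values)
df['Col1'], df['Col2'] = col1, col2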
def calculate_sum():
    arr = np.arange(1e7)
    s = np.sum(arr)
    return s
%time s = calculate_sum()
%time s = calculate_sum()
%time s = calculate_sum()
@jit
def calculate_sum():
    arr = np.arange(1e7)
    s = np.sum(arr)
    return s
%time s = calculate_sum()
%time s = calculate_sum()
%time s = calculate_sum()
@jit
def calculate_sum():
    s = 0
    for i in np.arange(1e7):
        s += i
    return s
%time s = calculate_sum()
%time s = calculate_sum()
%time s = calculate_sum()
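Loops like the one above can sometimes be spread across CPU cores with numba's parallel option and prange. Below is a minimal sketch of that; it is not part of the original timings, and whether it actually helps depends on the machine and the workload.

from numba import njit, prange

@njit(parallel=True)
def calculate_sum_parallel():
    s = 0.0
    # prange lets numba split the loop iterations across CPU threads.
    for i in prange(int(1e7)):
        s += i
    return s
%time s = calculate_sum_parallel()
%time s = calculate_sum_parallel()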
nopython attribute to @jit decorator

@jit compilation works in 2 modes:

nopython mode
object mode

nopython mode is generally preferred over object mode and is much faster when it can be used. In nopython mode numba compiles the function so that it runs entirely without involving the Python interpreter, whereas in object mode numba handles all values as Python objects and offers little speed-up over plain Python. By default @jit tries nopython mode first and uses it if it works; otherwise it falls back to object mode. If a function fails in nopython mode, it has to run in object mode.

def calculate_all_permutations():
    perms = []
    for i in range(int(1e4)):
        for j in range(int(1e3)):
            perms.append((i,j))
    return perms
%time perms = calculate_all_permutations()
%time perms = calculate_all_permutations()
%time perms = calculate_all_permutations()
@jit
def calculate_all_permutations():
    perms = []
    for i in range(int(1e4)):
        for j in range(int(1e3)):
            perms.append((i,j))
    return perms
%time perms = calculate_all_permutations()
%time perms = calculate_all_permutations()
%time perms = calculate_all_permutations()
@jit(nopython=True)
def calculate_all_permutations():
    perms = []
    for i in range(int(1e4)):
        for j in range(int(1e3)):
            perms.append((i,j))
    return perms
%time perms = calculate_all_permutations()
%time perms = calculate_all_permutations()
%time perms = calculate_all_permutations()
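If a function genuinely cannot be compiled in nopython mode, object mode can be requested explicitly with forceobj=True. Below is a minimal sketch of that using the same permutations function; it is an illustration only and was not part of the original notebook.

@jit(forceobj=True)
def calculate_all_permutations_obj():
    # Object mode: numba handles values as Python objects,
    # so expect little or no speed-up over plain Python.
    perms = []
    for i in range(int(1e4)):
        for j in range(int(1e3)):
            perms.append((i,j))
    return perms
%time perms = calculate_all_permutations_obj()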