cProfile
& profile
¶The profiling of code is a process where you find out which particular part of the code is taking how much time. This can be useful when optimizing code which is taking more time than expected. The profiling can be of time or even it can be of memory as well sometimes. There are profilers which let us know memory usage at each statement of code. We'll be concentrating on profilers for measuring the time of various parts of code as a part of this tutorial. The python has two profilers for this purpose.
Both of the above-mentioned profilers provide deterministic profiling
of python code. The deterministic profiling monitors each statement, function call, exceptions, etc. The other profiling technique is called statistical profiling
technique which randomly samples instruction being executed and records time to determine where more time is being spent. As statistical profiling does not monitor each statement, function calls, etc and randomly samples which part to monitor, it adds less overhead of profiling a task.
As a part of this tutorial, we'll be explaining how to use cProfile
and profile
modules to profile Python code. We'll also cover the pstats
module which can be used to format the output of profilers. We'll be explaining the usage of both libraries with few examples. If you are interested in learning about other profilers available in python then please feel free to go through the references section to check out tutorials on other profilers.
cProfile
¶import cProfile
run()
function¶The cProfile has a function named run()
which accepts any python statement. It'll pass that python statement to exec()
function to execute and generate profiling results as well.
Below we are simply running a loop 10000 time and generating a random numbers between 1-10 each time. We keep all random numbers in a list and then sum them up. We have passed this statement to the run()
method of cProfile to see profiling results of where time is getting spent.
import random
cProfile.run("sum([random.randint(1,10) for i in range(10000)])")
We can understand from the above output that a total of 55957 function calls was made. The Ordered by: standard name
indicates that the table displayed is sorted by string name printed at last.
The meaning of each column in the table is explained below:
tottime
divided by ncalls
.cumtime
divided by ncalls
.The run()
function also provides another parameter named filename
to which we can provide filename where the output of profiling will be saved. We'll load this file later using the pstats
module.
cProfile.run("sum([random.randint(1,10) for i in range(10000)])", filename="out.prof")
Profile()
Constructor¶The run()
method explained in the previous example has the limitation that we need to pass a single statement to it which needs to be profiled. If we want more control over which part of the code should be profiled then Profile()
constructor is a better option. It has a function named enable()
and disable()
which lets us enable and disable profiler so that only a particular part of the code can be profiled. This can help us avoid unnecessarily profile whole code when we need to profile only certain parts of code.
Below we have created two function slow_add()
and fast_add()
which takes two numbers and returns their sums. The slow_add()
function has a sleep time of half-second so that it takes more time to perform sum. We have then created the main_func()
function where we are looping through numbers from 0-9
and calling slow_add()
if the number is even else calling fast_add()
passing the same number two times as parameters. We are adding the output of function calls to the arr
list and returning it as well. We have declared our profiler outside of main_func()
and we are enabling it before the execution of the loop and disabling after the execution of the loop. This will prevent the profiling of other parts of the function.
import time
def slow_add(a, b):
time.sleep(0.5)
return a+b
def fast_add(a, b):
return a+b
prof = cProfile.Profile()
def main_func():
arr = []
prof.enable()
for i in range(10):
if i%2==0:
arr.append(slow_add(i,i))
else:
arr.append(fast_add(i,i))
prof.disable()
return arr
main_func()
Once profiling is completed, we can call the print_stats()
function to print profiling statistics.
prof.print_stats()
The Profiler
object also has a method named dump_stats()
which lets us dump profiling results to a file.
prof.dump_stats("main_func.prof")
From python 3.8, we can even use the Profile()
object as a context manager. Below we have recreated code of main_func()
but this time we have declared Profile()
as context manager using with
statement. We have then returned the profile object as well which can be used to print the results of profiling.
def main_func():
arr = []
with cProfile.Profile() as prof:
for i in range(10):
if i%2==0:
arr.append(slow_add(i,i))
else:
arr.append(fast_add(i,i))
return arr, prof
out, prof = main_func()
print(out)
prof.print_stats()
Output:
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
26 function calls in 2.504 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
5 0.000 0.000 2.504 0.501 <ipython-input-178-2cfef41b561a>:3(slow_add)
5 0.000 0.000 0.000 0.000 <ipython-input-178-2cfef41b561a>:7(fast_add)
5 2.503 0.501 2.503 0.501 {built-in method time.sleep}
10 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
The python also lets us execute profiler from the command line directly on the python file. We have put the code of this example into a file named addition.py
as shown below.
addition.py
import time
def slow_add(a, b):
time.sleep(0.5)
return a+b
def fast_add(a, b):
return a+b
def main_func():
arr = []
for i in range(10):
if i%2==0:
arr.append(slow_add(i,i))
else:
arr.append(fast_add(i,i))
return arr
if __name__ == '__main__':
main_func()
Once we have put code to a python file, we can use the below command in the command prompt or Unix shell to profile that python file.
Command to Profile File:
python -m cProfile addition.py
Output
29 function calls in 2.504 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 2.504 2.504 addition.py:1(<module>)
1 0.000 0.000 2.504 2.504 addition.py:10(main_func)
5 0.000 0.000 2.504 0.501 addition.py:3(slow_add)
5 0.000 0.000 0.000 0.000 addition.py:7(fast_add)
1 0.000 0.000 2.504 2.504 {built-in method builtins.exec}
5 2.504 0.501 2.504 0.501 {built-in method time.sleep}
10 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
We can even save the output of profiling to file when performing profiling from the command line as explained below.
python -m cProfile -o prof_out.prof addition.py
We can also sort the output of profiling based on a particular column of the table by giving that column name with -s
parameter.
pstats
¶The python also provides the pstats
module which lets us modify/filter results of profiling. It lets us load the profiler results file using it's Stats()
constructor. We can pass the Stats()
constructor a single file or the list of files containing profiling results.
Below we have loaded the prof_out.prof
file created during the previous example. We can print profiling results from the Stats
object by calling the print_stats()
method on it. The print_stats()
method accepts a list of arguments that can be used to filter results. It has been explained below with simple examples.
import pstats
p = pstats.Stats("prof_out.prof")
p.print_stats();
The pstats
module also provides us with various methods that let us modify the output of profiling. One important method that it provides it sort_stats()
which accepts column name from profiling results and then sorts data according to the column name. The list of column names is stored in pstats.SortKey
.
Below we have sorted the results of profiling based on cumulative time and then have taken the first 3 results from it when printing.
p.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(3);
Below we have called sort_stats()
again but this time we are sorting profiling results based on total time and cumulative time.
p.sort_stats(pstats.SortKey.TIME, pstats.SortKey.CUMULATIVE).print_stats();
The below example demonstrates that we can pass a string to print_stats()
and it'll filter profiling results which have that string present in it.
p = pstats.Stats("prof_out.prof")
p.print_stats("sleep");
The below example demonstrates another usage of print_stats()
where we have passed float number to filter a total number of results to half. It'll multiply a total number of rows by that float number and print only that many rows.
p = pstats.Stats("prof_out.prof")
p.print_stats(0.5);
Another important method provided by Stats
object is strip_dirs()
. It'll remove all leading pathname information from filename
in the last column of profiling results. This can help us if the result has long path information present which is truncating the output to the next line exceeding monitor limit.
p = pstats.Stats("prof_out.prof")
p.strip_dirs().print_stats();
We can even reverse the order of profiling results by calling the reverse_order()
method on Stats
object.
p = pstats.Stats("prof_out.prof")
p.sort_stats(pstats.SortKey.CUMULATIVE).reverse_order().print_stats();
The Stats
object has two other methods which can be very useful to understand which function is calling which other function. The print_callers()
provides us results that specify which function was called by which other function. The print_callees()
function results which informs us about a function called which other functions.
p = pstats.Stats("prof_out.prof")
p.print_callers();
p.print_callees();
Our fifth example explains the profiling of recursive function. We have written a recursive function below which returns nth Fibonacci number. We have profiled this function by calling it with argument 25. We can see from the profiled output that there were 150052 total function calls and 4 primitive calls. We can even notice from ncalls
of the first line that Fibonacci
function has 150049 total calls and 1 primitive call.
def Fibonacci(n):
if n<=0:
print("Incorrect input")
elif n==1:
return 0
elif n==2:
return 1
else:
return Fibonacci(n-1)+Fibonacci(n-2)
cProfile.run("Fibonacci(25)")
profile
¶The second profiler that we'll explain as a part of this tutorial is profile
. We'll import it to get started.
import profile
run()
¶The profile
also has the same methods as that of cProfile
. Our first example of profile
is exactly the same as that of cProfile
. We have used the run()
method of profile
to profile python statement which sums up 10000 random integers between 0-9.
import random
profile.run("sum([random.randint(1,10) for i in range(10000)])")
The output of profile
is formatted exactly the same way as that of the output of cProfile
.
Profile()
Constructor¶Our second example explains how to use the Profile()
constructor to profile code. We can call its print_stats()
method once profiling is complete to see results. We can even dump the output of profiling to a file using the dump_stats()
method.
import time
def slow_add(a, b):
time.sleep(0.5)
return a+b
def fast_add(a, b):
return a+b
def main_func():
arr = []
for i in range(10):
if i%2==0:
arr.append(slow_add(i,i))
else:
arr.append(fast_add(i,i))
return arr
prof = profile.Profile()
prof.run("main_func()")
prof.print_stats()
prof.dump_stats("prof_out2.prof")
Calibrated
Profiling¶The profiler available through the profile
module subtracts a small constant from each event during profiling hence removing extra overhead added to the profiling process by the profiler. The default value of this constant is 0. The process of removing this overhead of profiling is commonly referred to as calibration.
We can find out a better constant for a particular system by executing the below code. Below code executes python statements 10000 statements with a profiler and without a profiler. It then finds our overhead as a float. It performs this process five times and adds overhead found each time to a python list. We then take an average to find out final float which can be used as a constant that will be used by the profiler to remove extra overhead added by the profiler.
pr = profile.Profile()
cal = []
for i in range(5):
cal.append(pr.calibrate(10000, verbose=1))
bias = sum(cal)/5
bias
Once we have found out bias constant that we would like to remove for each event when using profiling, we can set it in profiler by setting the bias
parameter when creating the Profile
object as explained below.
We can then again run code with this calibrated profiler to see how it affects the measurements.
prof = profile.Profile(bias=bias)
prof.run("main_func()")
prof.print_stats()
We can see from the above results that time taken when using a calibrated profiler is a little less than when used without calibrating.
The jupyter notebook provides us with line and cell magic commands which can be used for profiling in notebooks. We can use %prun
to perform profiling of one line code and %%prun
to perform profiling of whole cell code.
Below we are first using %prun
to profile function call Fibonacci(25)
.
%prun Fibonacci(25)
150052 function calls (4 primitive calls) in 0.035 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
150049/1 0.035 0.000 0.035 0.035 <ipython-input-114-b552bd3bf56e>:1(Fibonacci)
1 0.000 0.000 0.035 0.035 {built-in method builtins.exec}
1 0.000 0.000 0.035 0.035 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Below we are profiling code from our previous main_func()
function using %%prun
cell magic command.
%%prun
arr = []
for i in range(10):
if i%2==0:
arr.append(slow_add(i,i))
else:
arr.append(fast_add(i,i))
print(arr)
63 function calls in 2.504 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
5 2.504 0.501 2.504 0.501 {built-in method time.sleep}
3 0.001 0.000 0.001 0.000 socket.py:337(send)
1 0.000 0.000 2.504 2.504 <string>:1(<module>)
1 0.000 0.000 2.504 2.504 {built-in method builtins.exec}
5 0.000 0.000 2.504 0.501 <ipython-input-28-f6d5027c3479>:3(slow_add)
3 0.000 0.000 0.001 0.000 iostream.py:197(schedule)
2 0.000 0.000 0.001 0.000 iostream.py:386(write)
1 0.000 0.000 0.001 0.001 {built-in method builtins.print}
3 0.000 0.000 0.000 0.000 threading.py:1080(is_alive)
3 0.000 0.000 0.000 0.000 threading.py:1038(_wait_for_tstate_lock)
10 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
3 0.000 0.000 0.000 0.000 {method 'acquire' of '_thread.lock' objects}
2 0.000 0.000 0.000 0.000 {built-in method posix.getpid}
2 0.000 0.000 0.000 0.000 iostream.py:310(_is_master_process)
5 0.000 0.000 0.000 0.000 <ipython-input-28-f6d5027c3479>:7(fast_add)
2 0.000 0.000 0.000 0.000 iostream.py:323(_schedule_flush)
3 0.000 0.000 0.000 0.000 iostream.py:93(_event_pipe)
3 0.000 0.000 0.000 0.000 {method 'append' of 'collections.deque' objects}
2 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
3 0.000 0.000 0.000 0.000 threading.py:507(is_set)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
The profile
profiler can be used from the command line/shell exactly the same way as that of cProfile
. Below we are first profiling the addition.py
file created in earlier examples.
Command to Profile File:
python -m profile addition.py
Output
30 function calls in 0.003 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
10 0.000 0.000 0.000 0.000 :0(append)
1 0.000 0.000 0.001 0.001 :0(exec)
1 0.003 0.003 0.003 0.003 :0(setprofile)
5 0.000 0.000 0.000 0.000 :0(sleep)
1 0.000 0.000 0.000 0.000 addition.py:1(<module>)
1 0.000 0.000 0.000 0.000 addition.py:10(main_func)
5 0.000 0.000 0.000 0.000 addition.py:3(slow_add)
5 0.000 0.000 0.000 0.000 addition.py:7(fast_add)
1 0.000 0.000 0.003 0.003 profile:0(<code object <module> at 0x7f17e69a2270, file "addition.py", line 1>)
0 0.000 0.000 profile:0(profiler)
We can even save the output of profiling to file when performing profiling from the command line as explained below.
python -m profile -o prof_out3.prof addition.py
This ends our small tutorial explaining how to profile Python code using cProfile
and profile
profilers. Please feel free to let us know your views in the comments section.