Pyinstrument is a statistical python profiler which records call stack every 1 ms rather than recording the whole trace. This is done in order to avoid profiling overhead which can increase a lot if some functions are getting called many times and not taking that much time to complete. This kind of behavior can even distort results by highlighting part of the code/function which might not be slow but getting called many times completing faster. In a way it kind of tries to avoid profiling noise by removing profiling information of faster parts of code. This kind of behavior also has a drawback that some of the function calls which ran quite fast will not be recorded but they are already fast. Pyinstrument uses an OS feature called signals
to ask OS to send a signal and handle signals using a python signal handler (PyEval_SetProfile
) for recording every 1 ms. Pyinstrument has other limitations due to its design that it can't profile the multi-threaded code. It'll only be recording the main thread. As Pyinstrument only takes recordings every 1 ms seconds, it runs faster than python profilers like cProfile, profile, etc. Apart from this, Pyinstrument records wall time (not CPU time) which includes all database call wait, network packet wait, file i/o wait, etc. The main reason for measuring wall-clock time given by the author of Pyinstrument is that Python is most commonly used as glue language connecting different services designed in different languages and users are interested in knowing whether slowness is due to Python or some other services. Pyinstrument also gives us a trace of only functions used in our script rather than flooding output with all internal function calls of python and other C function calls. Pyinstrument also lets developer format profiling output in different formats like HTML, JSON, text, or user defined class for rendering output.
As a part of this tutorial, we'll explain with simple examples of how we can use Pyinstrument to profile our python code. We'll explain the usage of profiler in a two way:
Profiler
instance in Python code itself.As a part of this section, we'll explain how we can profile script from the command line using various options available from Pyinstrument. We'll try to cover all possible options available through various examples.
Our first example involves profiling results generation for a simple python script using Pyinstrument. We have created a simple python script which loops 1000000 times generating two random numbers, adding them to generate a sum, and then adding that sum to the list before returning the list. We have kept this script in a file named pyinstrument_ex1.py
in a folder named profiling_examples
. We'll be reusing the script in the future examples as well.
CODE
import random
def add(a, b):
return a+b
def get_sum_of_list():
final_list = []
for i in range(1000000):
rand1 = random.randint(1,100)
rand2 = random.randint(1,100)
out = add(rand1, rand2)
final_list.append(out)
return final_list
if __name__ == "__main__":
l = get_sum_of_list()
# print(l)
Below we have profiled script using Pyinstrument without default options. Please make a note that we are using the !
symbol which lets us execute a shell command from Jupyter Notebook. This command can be run exactly the same way from a shell.
!pyinstrument profiling_examples/pyinstrument_ex1.py
The output of profiling generated by Pyinstrument is formatted in a tree-like structure. It'll be showing total time at the top and then the time taken by sub-functions will be shown one level down the tree. It also has color-coded time taken by various functions. The [self]
in output refers to the time taken by function excluding other functions called in that function. FOr example, if we have taken randrange
then two nodes below it are [self]
and _randbelow
where [self]
refers to the time taken by randrange
excluding time taken by _randbelow
. The same goes for get_sum_of_list
function which has four children (randint(), [self], add(), list.append())
. It might happen that some of the functions of the script do not get recorded by Pyinstrument because it can be too fast and Pyinstrument takes sampling every 0.001 seconds.
Pyinstrument also stores profiling results of the last execution and gives us ID which we can use to reprint the results as well as formating that results in a different formats. Below we have explained how we can reload previous profiling results using the --load-prev
option.
!pyinstrument --load-prev 2020-11-12T16-39-21
We'll be explaining how we can save the output of profiling to a file as a part of our third example. We have designed a new python script that has three functions which generate an average of 100000 random numbers between 1-100 each taking a different time. We have introduced artificial time in each function so that it looks like each time different time. We are then calling each function and printing their results. We have stored code of script in a file named pyinstrument_ex2.py
kept in a folder named profiling_examples
. We'll be using the same script in our future examples as well.
CODE
import time
import random
def very_slow_random_generator():
time.sleep(5)
arr = [random.randint(1,100) for i in range(100000)]
return sum(arr) / len(arr)
def slow_random_generator():
time.sleep(2)
arr = [random.randint(1,100) for i in range(100000)]
return sum(arr) / len(arr)
def fast_random_generator():
time.sleep(1)
arr = [random.randint(1,100) for i in range(100000)]
return sum(arr) / len(arr)
def main_func():
result = fast_random_generator()
print(result)
result = slow_random_generator()
print(result)
result = very_slow_random_generator()
print(result)
main_func()
Pyinstrument provides us with an option named -o
or --outfile
which can let us save profiling output to file.
-o FILE_NAME
--output==FILE_NAME
Below we have stored profiling results to a file named pyinstrument_ex2_out.out
. We are then also displaying file contents.
!pyinstrument -o pyinstrument_ex2_out.out profiling_examples/pyinstrument_ex2.py
!cat pyinstrument_ex2_out.out
In our fourth example, we'll explain how we can format output in an HTML and JSON format. Pyinstrument lets us format our profiling results using the -r
or --renderer
option.
-r RENDERED
--rendered=RENDERER
Below we have formatted output in an HTML format. It'll store results in a temporary file and open it in a new tab in a web browser.
!pyinstrument -r html profiling_examples/pyinstrument_ex2.py
Below we have stored HTML output in a file.
!pyinstrument -r html -o pyinstrument_ex1.html profiling_examples/pyinstrument_ex2.py
Below we have formatted output in a JSON format for our first script from the first example. We have then included code that loads content from file and prints it as well.
!pyinstrument -r json -o pyinstrument_ex1.json profiling_examples/pyinstrument_ex1.py
import json
trace = json.load(open("pyinstrument_ex1.json"))
print(json.dumps(trace, indent=2))
Pyinstrument also provides an option named --no-color
which lets us instruct Pyinstrument to not use color coding when printing profiling results.
!pyinstrument --no-color profiling_examples/pyinstrument_ex2.py
Pyinstrument provides us with an option named -t
or --timeline
which will order profiling results in an order in which they are executed. If we don't use these options then profiling results will be ordered by the time taken by functions. We can notice from the output of this execution that it first shows fast_random_generator
then slow_random_generator
and at last very_slow_random_generator
whereas this order was reverse in all previous profiling results.
!pyinstrument -t profiling_examples/pyinstrument_ex2.py
We can inform Pyinstrument to show us profiling results of all default libraries used in a code by using the --show-all
option. Below we have first compiled code normally and then used the --show-all
option in the next cell to show default library cells.
!pyinstrument -t profiling_examples/pyinstrument_ex2.py
!pyinstrument --load-prev 2020-11-12T17-05-28 --show-all
Pyinstrument lets us hide files for which we don't want to include profiling results by using --hide=EXPR
and --hide-regex=REGEX
options.
--hide=EXPR
- Here EXPR is a glob path matching expression commonly used in UNIX shell.--hide-regex=REGEX
- Here REGEX is a regular expression which will match paths.Below we have first normally profiled our pyinstrument_ex2.py
script. We have then removed the hidden frame from profiling output using options.
!pyinstrument profiling_examples/pyinstrument_ex2.py
!pyinstrument --load-prev 2020-11-12T17-12-13 --hide="*/random/*"
!pyinstrument --load-prev 2020-11-12T17-12-13 --hide-regex="random"
It sometimes happens that --hide
or --hide-regex
is removing all files matching that pattern and we want to include some of the files then we can override it using the below-mentioned option. These options will again show that file results.
--show=EXPR
--show-regex=REGEX
Pyinstrument can be used to profile a python module as well using the -m
option. Below we have a profiled random module.
!pyinstrument -m random
Profiler
instance in Python code itself.¶As a part of this section, we'll explain with a few examples of how we can use the Profile
instance to monitor a particular part of the code if we don't want to monitor the total code.
Profile
Instance¶Our first example demonstrates how we can profile a python code using the Profile
instance. We first need to create an instance of Profile
. We then need to cover part of the code which we want to profile between start()
and stop()
methods of profile instance. Below we have again produced the same code that we had used in the previous section. We have covered part where we are calling get_sum_of_list()
method between start()
and stop()
methods.
import pyinstrument
import random
profiler = pyinstrument.Profiler(interval=0.01) ## Profiler
def add(a, b):
return a+b
def get_sum_of_list():
final_list = []
for i in range(1000000):
rand1 = random.randint(1,100)
rand2 = random.randint(1,100)
out = add(rand1, rand2)
final_list.append(out)
return final_list
profiler.start() ## Start Profiling
l = get_sum_of_list()
profiler.stop(); ## Stop Profiling
output_text()
¶The profiler instance has a method named output_text()
which can let us print results in a text format. It has options to show Unicode text, colored text, show all profiled frames, and time lined output. We have first printed uncolored text output followed by colored text output.
print(profiler.output_text(unicode=False, color=False, show_all=False, timeline=False))
print(profiler.output_text(color=True))
We'll be explaining how we can format output in HTML and JSON format as a part of this example. We have reproduced code that we had used in the previous section. We have also created a profiler instance and then covered code where we are calling the main function between start()
and stop()
methods for profiling.
import pyinstrument
import time
import random
def very_slow_random_generator():
time.sleep(5)
arr = [random.randint(1,100) for i in range(100000)]
return sum(arr) / len(arr)
def slow_random_generator():
time.sleep(2)
arr = [random.randint(1,100) for i in range(100000)]
return sum(arr) / len(arr)
def fast_random_generator():
time.sleep(1)
arr = [random.randint(1,100) for i in range(100000)]
return sum(arr) / len(arr)
def main_func():
result = fast_random_generator()
print(result)
result = slow_random_generator()
print(result)
result = very_slow_random_generator()
print(result)
profiler = pyinstrument.Profiler(interval=0.01) ## Profiler
profiler.start()
main_func()
profiler.stop();
output_html()
¶The profiler instance has a method named output_html()
which will generate output as HTML string. We have stored output to a file named pyinstrument_ex2_2.html
file.
output = profiler.output_html()
fp = open("pyinstrument_ex2_2.html", "w")
fp.write(output)
fp.close();
open_in_browser()
¶The open_in_browser()
method saves HTML output to a temporary file and then opens it in a new tab in the web browser.
profiler.open_in_browser()
output()
¶The output()
method lets us format output in an HTML or JSON format by using a different renderers. Below we have generated JSON formatted output and saved it to a file. We have then reloaded JSON formatted output and printed it to standard output.
from pyinstrument.renderers import JSONRenderer, HTMLRenderer
json_output = profiler.output(JSONRenderer(show_all=False, timeline=False))
fp = open("pyinstrument_ex2.json", "w")
fp.write(json_output)
fp.close();
import json
trace = json.load(open("pyinstrument_ex2.json"))
print(json.dumps(trace, indent=2))
As a part of our third example, we'll simply demonstrate how we can order profiling results in a timelined manner to show the order in which they were executed. We have rewritten the code below from the previous example again.
import pyinstrument
import time
import random
def very_slow_random_generator():
time.sleep(5)
arr = [random.randint(1,100) for i in range(100000)]
return sum(arr) / len(arr)
def slow_random_generator():
time.sleep(2)
arr = [random.randint(1,100) for i in range(100000)]
return sum(arr) / len(arr)
def fast_random_generator():
time.sleep(1)
arr = [random.randint(1,100) for i in range(100000)]
return sum(arr) / len(arr)
def main_func():
result = fast_random_generator()
print(result)
result = slow_random_generator()
print(result)
result = very_slow_random_generator()
print(result)
profiler = pyinstrument.Profiler(interval=0.01) ## Profiler
profiler.start()
main_func()
profiler.stop();
Below we have printed results in a text format using the timeline
parameter as True. We have also explained the show_all
parameter by setting it to True and then False later on.
print(profiler.output_text(color=True, show_all=False, timeline=True))
print(profiler.output_text(color=True, show_all=True, timeline=True))
This ends our simple and small tutorial explaining how to use pyinstrument to profile Python code. Please feel free to let us know your views in the comments section.