Updated On : Nov-20,2020 Tags profiler, py-spy, statistical-profiler
py-spy - Sampling Profiler for Python Code

py-spy - Sampling Profiler for Python Code

Python has a bunch of good code profiling libraries like cProfile, pprofile, line_profiler, profile, etc. All of this profiler requires us to modify our Python code in some way to profile it and even adds some profiling overhead as they run in the same thread as the main program. The py-spy is a python profiler that performs profiling by sampling running python code and it runs in a separate process than our original code. This results in not impacting the performance of our original python script. The py-spy is designed in Rust and is a very good solution for profiling running python code. It can be very useful for the process running in the production as it'll let us profile it without adding any line of code as well as it'll run in a process of its own hence won't add overhead.

The py-spy finds out call stack of python code by looking at global PyInterpreterState to get all threads running in the python interpreter and then iterates over PyFrameObjects in each thread to get total call trace. It pauses the program for a small amount of time each time sampling for the profiling results. Below we have given some information on some useful options available with py-spy.

  • The py-spy also lets us profile python extensions written in other languages like C, C++, Cython, etc. It lets us do that using the --native option when profiling.
  • The py-spy works well with multi-threading code as well. It also lets us profile sub-processes started by our original python program by using --subprocess flag when profiling.
  • The py-spy can also detect Global Interpret Lock (GIL) activity while profiling. It can let us include traces of threads that are holding GIL by using --gil option when profiling.
  • The py-spy provides an option named --nonblocking which lets us prevent it from pausing our program for sampling to let it run without interruption. It'll then try to get profiling data from the running python process.
  • As py-spy is a sampling-based profiler, it'll only include traces for running threads and exclude traces of threads that are sleeping/idle. It tries to get information about idle threads from the underlying OS but has some limitations. The Py-spy provides us with options named --idle which will force it to include trace for threads which it considers idle.

Please make a note that py-spy might need to be run with root/administrator access on systems that don't let it access processes running the system for profiling.

The py-spy has three main commands each of which has few options for showing profiling results in a different way.

  • record - It'll record profiling output and save it to a file in the format specified.
  • dump - It'll dump profiling results to standard output.
  • top - It'll display profiling results as a table of functions consuming CPU.

This ends our small introduction of py-spy. We'll now explain the usage of py-spy with few examples.

1. record

In this section, we'll explain with a different example of how we can use the record command of py-spy to record the profiling of a script or process.

Below we have given a general command format that we can use with the record option of py-spy.

  • py-spy record [OPTIONS] --output <filename> --pid <pid> [python_program] <arg1> <arg2> ...

Example 1: Simple Sampling

Our first example simply explains how we can record profiling results using py-spy. We'll be profiling the below script using py-spy. The script loops 1000000 times generating two random numbers, adding them, and then adding sum to a list before returning a list. We have saved a code to a file named example_2.py in the profiling_examples folder.

CODE

import random

def add(a, b):
    return a+b


def get_sum_of_list():
    final_list = []
    for i in range(1000000):
        rand1 = random.randint(1,100)
        rand2 = random.randint(1,100)
        out = add(rand1, rand2)
        final_list.append(out)
    return final_list


if __name__ == "__main__":
    l = get_sum_of_list()
#    print(l)

Below we have profiled the script using py-spy record options. The py-spy record option expects us to give the file name to which the output of profiling will be stored. Please make a note that by default py-spy record command generates a flame graph of profiling results. Few of our examples start with the ! symbol because we have executed scripts from jupyter notebook which lets us execute the shell command from the notebook.

In [35]:
!py-spy record --output py_spy_1.svg python profiling_examples/example_2.py
py-spy> Sampling process 100 times a second. Press Control-C to exit.


py-spy> Stopped sampling because process exitted
py-spy> Wrote flamegraph data to 'py_spy_1.svg'. Samples: 167 Errors: 0

py-spy - Sampling Profiler for Python Code

Example 2: Sampling Process using PID

As a part of our second example, we'll explain how we can use py-spy to profile running python program by tracking its process id. We have written a script that has three functions that generate 100000 in the range 1-100 and then take an average of it before returning. All functions take different time for it because we have artificially introduced time so that it takes longer. We'll be monitoring the process using py-spy which runs this code. We have stored code to a file named a_random_generator.py in the profiling_examples folder.

CODE

import time
import random

def very_slow_random_generator():
    time.sleep(15)
    arr = [random.randint(1,100) for i in range(100000)]
    return sum(arr) / len(arr)

def slow_random_generator():
    time.sleep(10)
    arr = [random.randint(1,100) for i in range(100000)]
    return sum(arr) / len(arr)

def fast_random_generator():
    time.sleep(5)
    arr = [random.randint(1,100) for i in range(100000)]
    return sum(arr) / len(arr)

def main_func():
    result = fast_random_generator()
    print(result)

    result = slow_random_generator()
    print(result)

    result = very_slow_random_generator()
    print(result)

main_func()

Execution Steps

The process of executing step and generating profiling results by monitoring process will require 3 shells on Linux.

Shell 1 Command:

In the first shell, we'll execute the python script as mentioned below.

  • python profiling_examples/a_random_generator.py

OUTPUT:

50.46953
50.55925
50.58661

Shell 2 Command:

In the second shell, we'll execute the below command to get the process id of the script which we'll use for profiling.

  • ps aux | grep a_random_generator

OUTPUT:

sunny    11077  0.5  0.3  29256 11892 pts/1    S+   11:00   0:00 python profiling_examples/a_random_generator.py
sunny    11079  0.0  0.0  14428  1008 pts/3    S+   11:00   0:00 grep --color=auto a_random_generator

Shell 3 Command:

In the third shell, we'll execute the below-mentioned command to generate the profiling result of the process. The py-spy provides us with option --pid <process_id> or -p <process_id> which can be used to monitor process.

  • sudo env "PATH=$PATH" py-spy record --output py_spy_2.svg --pid 11077

OUTPUT:

py-spy> Sampling process 100 times a second. Press Control-C to exit.


py-spy> Stopped sampling because process exitted
py-spy> Wrote flamegraph data to 'py_spy_2.svg'. Samples: 2248 Errors: 0

Please make a note that the output file will also be created with root access.

PROFILING OUTPUT:

py-spy - Sampling Profiler for Python Code

Example 3: Formatting Output in Different Format

The py-spy lets us format output generated in three formats as mentioned below using -f <format> or --format <format> option.

  • flamegraph
  • speedoscope
  • raw text

We can format output to be raw text using --format raw option and it'll store the output in a file in raw text format. We have below generated raw text output for our code from example 1.

In [43]:
!py-spy record --output py_spy_3.out --format raw python profiling_examples/example_2.py
py-spy> Sampling process 100 times a second. Press Control-C to exit.


py-spy> Stopped sampling because process exitted
py-spy> Wrote raw flamegraph data to 'py_spy_3.out'. Samples: 162 Errors: 0
py-spy> You can use the flamegraph.pl script from https://github.com/brendangregg/flamegraph to generate a SVG

OUTPUT:

<module> (example_2.py:18);get_sum_of_list (example_2.py:10);randint (random.py:222);randrange (random.py:198);_randbelow (random.py:233) 2
<module> (example_2.py:18);get_sum_of_list (example_2.py:11);randint (random.py:222);randrange (random.py:193) 3
<module> (example_2.py:18);get_sum_of_list (example_2.py:12) 2
<module> (example_2.py:18);get_sum_of_list (example_2.py:11);randint (random.py:222);randrange (random.py:196) 4
<module> (example_2.py:18);get_sum_of_list (example_2.py:11);randint (random.py:222);randrange (random.py:187) 2
<module> (example_2.py:18);get_sum_of_list (example_2.py:12);add (example_2.py:3) 2
<module> (example_2.py:18);get_sum_of_list (example_2.py:11);randint (random.py:222);randrange (random.py:198);_randbelow (random.py:236) 2
<module> (example_2.py:18);get_sum_of_list (example_2.py:10);randint (random.py:222);randrange (random.py:198);_randbelow (random.py:235) 2
<module> (example_2.py:18);get_sum_of_list (example_2.py:10);randint (random.py:222);randrange (random.py:198);_randbelow (random.py:237) 2
<module> (example_2.py:18);get_sum_of_list (example_2.py:11);randint (random.py:222);randrange (random.py:198);_randbelow (random.py:234) 5
<module> (example_2.py:18);get_sum_of_list (example_2.py:10);randint (random.py:222);randrange (random.py:198);_randbelow (random.py:236) 1
<module> (example_2.py:18);get_sum_of_list (example_2.py:10);randint (random.py:222);randrange (random.py:197) 2
<module> (example_2.py:18);get_sum_of_list (example_2.py:10);randint (random.py:222);randrange (random.py:198);_randbelow (random.py:234) 4
<module> (example_2.py:18);get_sum_of_list (example_2.py:10);randint (random.py:222);randrange (random.py:198);_randbelow (random.py:228) 5
<module> (example_2.py:18);get_sum_of_list (example_2.py:11);randint (random.py:222);randrange (random.py:198);_randbelow (random.py:233) 5
<module> (example_2.py:18);get_sum_of_list (example_2.py:9) 3
<module> (example_2.py:18);get_sum_of_list (example_2.py:11);randint (random.py:222);randrange (random.py:198);_randbelow (random.py:229) 2
<module> (example_2.py:18);get_sum_of_list (example_2.py:11) 8
<module> (example_2.py:18);get_sum_of_list (example_2.py:10);randint (random.py:222);randrange (random.py:194) 1
<module> (example_2.py:18);get_sum_of_list (example_2.py:10);randint (random.py:222);randrange (random.py:184) 2
<module> (example_2.py:18);get_sum_of_list (example_2.py:11);randint (random.py:222);randrange (random.py:198) 10
<module> (example_2.py:18);get_sum_of_list (example_2.py:10);randint (random.py:222);randrange (random.py:187) 4
<module> (example_2.py:18);get_sum_of_list (example_2.py:11);randint (random.py:222);randrange (random.py:198);_randbelow (random.py:237) 1
<module> (example_2.py:18);get_sum_of_list (example_2.py:10);randint (random.py:222);randrange (random.py:198);_randbelow (random.py:232) 6
<module> (example_2.py:18);get_sum_of_list (example_2.py:11);randint (random.py:222) 11
<module> (example_2.py:18);get_sum_of_list (example_2.py:11);randint (random.py:222);randrange (random.py:198);_randbelow (random.py:235) 1
<module> (example_2.py:18);get_sum_of_list (example_2.py:11);randint (random.py:222);randrange (random.py:185) 2
<module> (example_2.py:18);get_sum_of_list (example_2.py:10);randint (random.py:222);randrange (random.py:193) 5
<module> (example_2.py:18);get_sum_of_list (example_2.py:10);randint (random.py:222);randrange (random.py:185) 1
 1
<module> (example_2.py:18);get_sum_of_list (example_2.py:13) 7
_find_and_load (<frozen importlib._bootstrap>:983);_find_and_load_unlocked (<frozen importlib._bootstrap>:967);_load_unlocked (<frozen importlib._bootstrap>:677);exec_module (<frozen importlib._bootstrap_external>:728);_call_with_frames_removed (<frozen importlib._bootstrap>:219);<module> (site.py:586);main (site.py:573);addsitepackages (site.py:349);addsitedir (site.py:207);addpackage (site.py:168);<module> (<string>:1);find_spec (<frozen importlib._bootstrap_external>:1280);_get_spec (<frozen importlib._bootstrap_external>:1252);find_spec (<frozen importlib._bootstrap_external>:1368);_fill_cache (<frozen importlib._bootstrap_external>:1416) 1
<module> (example_2.py:18);get_sum_of_list (example_2.py:10);randint (random.py:222);randrange (random.py:198);_randbelow (random.py:229) 2
<module> (example_2.py:18);get_sum_of_list (example_2.py:11);randint (random.py:222);randrange (random.py:198);_randbelow (random.py:228) 2
<module> (example_2.py:18);get_sum_of_list (example_2.py:10) 7
<module> (example_2.py:18);get_sum_of_list (example_2.py:11);randint (random.py:222);randrange (random.py:174) 1
<module> (example_2.py:18);get_sum_of_list (example_2.py:11);randint (random.py:222);randrange (random.py:197) 3
<module> (example_2.py:18);get_sum_of_list (example_2.py:12);add (example_2.py:4) 2
<module> (example_2.py:18);get_sum_of_list (example_2.py:10);randint (random.py:222);randrange (random.py:198) 7
<module> (example_2.py:18);get_sum_of_list (example_2.py:11);randint (random.py:222);randrange (random.py:198);_randbelow (random.py:232) 4
<module> (example_2.py:18);get_sum_of_list (example_2.py:10);randint (random.py:222) 12
<module> (example_2.py:18);get_sum_of_list (example_2.py:11);randint (random.py:222);randrange (random.py:184) 9
<module> (example_2.py:18);get_sum_of_list (example_2.py:11);randint (random.py:222);randrange (random.py:194) 4

Below we have generated output in a format speed score for our random numbers generator script from the previous example. The output will be stored in a file and when we upload the file to a https://www.speedscope.app/, it'll generate speed scope there which we can analyze for performance.

In [42]:
!py-spy record --output py_spy_4.out --format speedscope python profiling_examples/a_random_generator.py
py-spy> Sampling process 100 times a second. Press Control-C to exit.

50.54668
50.62407
50.51211

py-spy> Stopped sampling because process exitted
py-spy> Wrote speedscope file to 'py_spy_4.out'. Samples: 3089 Errors: 0
py-spy> Visit https://www.speedscope.app/ to view
Error: No child process (os error 10)

PROFILING OUTPUT:

py-spy - Sampling Profiler for Python Code

Example 4: Change Sampling Rate

The default sampling rate of py-spy is 100 samples per second. The py-spy provides us with options -r <rate> or --rate <rate> which can let us change the default sampling rate. We have executed our script from example 1 again but this time with a sampling rate of 1000. We can increase this number to force py-spy to sample fewer frames if we want to only report functions that are really taking a lot of time (in seconds maybe).

In [45]:
!py-spy record --output py_spy_5.svg --rate 1000 python profiling_examples/example_2.py
py-spy> Sampling process 1000 times a second. Press Control-C to exit.


py-spy> Stopped sampling because process exitted
py-spy> Wrote flamegraph data to 'py_spy_5.svg'. Samples: 1616 Errors: 0
Error: No child process (os error 10)

PROFILING OUTPUT:

py-spy - Sampling Profiler for Python Code

Example 5: Aggregate Profiling Results By Function

The output generated by a py-spy is according to the sampling it performed which means that if a function is running longer then it'll have more than one entry in the profiling results. We can merge all these entries into one entry by using -F or --function option which will aggregate all entries of the function.

Below we have profiled our code from example 1 again but this time we have aggregated function calls.

In [46]:
!py-spy record --output py_spy_6.svg --function python profiling_examples/example_2.py
py-spy> Sampling process 100 times a second. Press Control-C to exit.


py-spy> Stopped sampling because process exitted
py-spy> Wrote flamegraph data to 'py_spy_6.svg'. Samples: 158 Errors: 0

PROFILING OUTPUT:

py-spy - Sampling Profiler for Python Code

Example 6: Multi-Threading Example

As a part of our sixth example, we'll explain how we can inform py-spy to keep information about threads in multi-threading example. We have written a simple script that creates a thread pool of 3 threads. Each thread calls a function that generates a random number in the range 1-100 and returns it taking the time of 2 seconds. We have submitted these 3 threads to the thread pool and retrieving results once they complete. We have kept the code written below to the file named multi_threaded_example.py in the profiling_examples folder.

The py-spy provides us with an option named --threads or -t which instructs thread id to be added in the profiling results.

The below code expects that developer has knowledge of the python module named concurrent.futures. Please feel free to check our tutorial on the same if you are interested in learning it.

CODE

import time
import random
from concurrent.futures import ThreadPoolExecutor


def slow_random_gen():
    time.sleep(2)
    val = random.randint(1,100)
    return val

def main_func():
    ppe = ThreadPoolExecutor(max_workers=3)
    futures = []

    futures.append(ppe.submit(slow_random_gen))
    futures.append(ppe.submit(slow_random_gen))
    futures.append(ppe.submit(slow_random_gen))

    print([future.result() for future in futures])

if __name__ == '__main__':
    main_func()

Below we have generated a flame graph of the script.

In [50]:
!py-spy record --output py_spy_7.svg --threads python profiling_examples/multi_threaded_example.py
py-spy> Sampling process 100 times a second. Press Control-C to exit.

[61, 96, 10]

py-spy> Stopped sampling because process exitted
py-spy> Wrote flamegraph data to 'py_spy_7.svg'. Samples: 243 Errors: 0

PROFILING OUTPUT:

py-spy - Sampling Profiler for Python Code

Example 7: Multi-Processing Example

The py-spy also provides us with the option to monitor subprocesses created by our main process using --subprocesses or -s option. Below we have written a simple script which creates three sub-processes from our original process using the process pool of concurrent.futures module. Each process calls a random numbers generator function which generates an array of size 1000 x 1000 of random numbers in the range 1-100 and then returns the average of the array taking a different time. We have kept the code in a file named multi_processing_example.py in the folder profiling_examples. This code also expects background knowledge of concurrent.futures module.

CODE

import time
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def very_slow_random_generator():
    time.sleep(5)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

def slow_random_generator():
    time.sleep(2)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

def fast_random_generator():
    time.sleep(1)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

def main_func():
    ppe = ProcessPoolExecutor(max_workers=3)
    futures = []

    futures.append(ppe.submit(fast_random_generator))
    futures.append(ppe.submit(slow_random_generator))
    futures.append(ppe.submit(very_slow_random_generator))

    print([future.result() for future in futures])

if __name__ == '__main__':
    main_func()

Below we have profiled the script using --subprocesses option.

In [51]:
!py-spy record --output py_spy_8.svg --subprocesses python profiling_examples/multi_processing_example.py
py-spy> Sampling process 100 times a second. Press Control-C to exit.

[50.047314, 50.047314, 50.047314]

py-spy> Stopped sampling because process exitted
py-spy> Wrote flamegraph data to 'py_spy_8.svg'. Samples: 503 Errors: 0
Error: No child process (os error 10)

PROFILING OUTPUT:

py-spy - Sampling Profiler for Python Code

Example 8: Tracking Idle Threads

The py-spy does not monitor idle threads by default but we can force it to do so using --idle or -i option. Below we have executed our script from a multi-threading example with --idle option.

  • sudo env "PATH=$PATH" py-spy record --output py_spy_9.svg --idle python profiling_examples/multi_threaded_example.py

PROFILING OUTPUT:

py-spy - Sampling Profiler for Python Code

Example 9: Profiling Without Blocking Running Program

The py-spy generally profiles results by pausing the running program for every sampling which can add little overhead. We can force py-spy to not block the running program and profile results using --nonblocking option. Please make a note that results might not be very accurate with this option.

In [55]:
!py-spy record --output py_spy_10.svg --nonblocking python profiling_examples/example_2.py
py-spy> Sampling process 100 times a second. Press Control-C to exit.


py-spy> Stopped sampling because process exitted
py-spy> Wrote flamegraph data to 'py_spy_10.svg'. Samples: 172 Errors: 0

PROFILING OUTPUT:

py-spy - Sampling Profiler for Python Code

Other Useful Options

Below we have given a few other useful options available with record command of py-spy.

  • -d <duration> or --duration <duration> - This options lets us specify duration for which we want to profile results in seconds. If we don't give this option then it'll keep profiling the program as long as it’s running. We can force py-spy to profile results for few seconds using this option.
  • -n or --native - This option lets py-spy collect traces from the Python module written in C, C++, or Cython.
  • -g or --gil - This option forces py-spy to trace only threads that are holding GIL.

record Command Help Doc

We can access the doc of py-spy record using the below command.

In [27]:
!py-spy record --help
py-spy-record
Records stack trace information to a flamegraph, speedscope or raw file

USAGE:
    py-spy record [OPTIONS] --output <filename> --pid <pid> [python_program]...

OPTIONS:
    -p, --pid <pid>              PID of a running python program to spy on
    -o, --output <filename>      Output filename
    -f, --format <format>
            Output file format [default: flamegraph]  [possible values:
            flamegraph, raw, speedscope]
    -d, --duration <duration>
            The number of seconds to sample for [default: unlimited]

    -r, --rate <rate>
            The number of samples to collect per second [default: 100]

    -s, --subprocesses           Profile subprocesses of the original process
    -F, --function
            Aggregate samples by function name instead of by line number

    -g, --gil
            Only include traces that are holding on to the GIL

    -t, --threads                Show thread ids in the output
    -i, --idle                   Include stack traces for idle threads
    -n, --native
            Collect stack traces from native extensions written in Cython, C or
            C++
        --nonblocking
            Don't pause the python process when collecting samples. Setting this
            option will reduce the perfomance impact of sampling, but may lead
            to inaccurate results
    -h, --help                   Prints help information
    -V, --version                Prints version information

ARGS:
    <python_program>...    commandline of a python program to run

2. dump

The dump command lets us profile one frame of running a python program and prints results to standard output. Below is the common format of using py-spy with dump command.

  • py-spy dump [OPTIONS] --pid <pid>

Example 1: Simply Dumping Profiling Results

In this example, we have explained simply how we can profile running python programs using the dump command of py-spy. We have used three different shells for this like our earlier example on the profiling process.

Shell 1 Command:

In this shell, we'll run our random generator python program from earlier.

  • python profiling_examples/a_random_generator.py

OUTPUT:

50.62938
50.40982
50.38321

Shell 2 Command:

In the second shell, we'll try to get the process id of running the python program.

  • ps aux | grep a_random_generator

OUTPUT:

sunny    13924  0.5  0.3  29256 11952 pts/1    S+   12:55   0:00 python profiling_examples/a_random_generator.py
sunny    13926  0.0  0.0  14428  1092 pts/3    S+   12:55   0:00 grep --color=auto a_random_generator

Shell 3 Command:

In the third shell, we'll profile the running python programs using process id. Please make a note that we have used sudo as in Linux, root access is required to access process data.

  • sudo env "PATH=$PATH" py-spy dump --pid 13924

OUTPUT:

Process 13924: python profiling_examples/a_random_generator.py
Python v3.7.3 (/home/sunny/anaconda3/bin/python3.7)

Thread 13924 (idle)
    very_slow_random_generator (a_random_generator.py:5)
    main_func (a_random_generator.py:26)
    <module> (a_random_generator.py:29)

Example 2: Formatting Output as JSON

Our second example is exactly the same as our previous example with the only difference that we have generated output in a JSON format using --json option.

Shell 1 Command:

  • python profiling_examples/a_random_generator.py

OUTPUT:

50.62938
50.40982
50.38321

Shell 2 Command:

  • ps aux | grep a_random_generator

OUTPUT:

sunny    13999  1.0  0.3  29256 11892 pts/1    S+   12:59   0:00 python profiling_examples/a_random_generator.py
sunny    14001  0.0  0.0  14428  1008 pts/3    S+   12:59   0:00 grep --color=auto a_random_generator

Shell 3 Command:

  • sudo env "PATH=$PATH" py-spy dump --json --pid 13999

OUTPUT:

[
  {
    "pid": 13999,
    "thread_id": 140435551061824,
    "os_thread_id": 13999,
    "active": false,
    "owns_gil": false,
    "frames": [
      {
        "name": "very_slow_random_generator",
        "filename": "profiling_examples/a_random_generator.py",
        "module": null,
        "short_filename": "a_random_generator.py",
        "line": 5,
        "locals": null
      },
      {
        "name": "main_func",
        "filename": "profiling_examples/a_random_generator.py",
        "module": null,
        "short_filename": "a_random_generator.py",
        "line": 26,
        "locals": null
      },
      {
        "name": "<module>",
        "filename": "profiling_examples/a_random_generator.py",
        "module": null,
        "short_filename": "a_random_generator.py",
        "line": 29,
        "locals": null
      }
    ]
  }
]

Other Useful Options

Below are other useful options available with dump command of py-spy.

  • -l or --locals - It shows local variables for each frame in the profiling results.
  • -n or --native - It lets us profile native extensions written in C, C++, or Cython.
  • --nonblocking - It let us profile python program without interrupting running program as explained in record command example (Example 9).

dump Command Help Doc

We can access helper documents as well with the below command.

In [28]:
!py-spy dump --help
py-spy-dump
Dumps stack traces for a target program to stdout

USAGE:
    py-spy dump [OPTIONS] --pid <pid>

OPTIONS:
    -p, --pid <pid>      PID of a running python program to spy on
    -l, --locals         Show local variables for each frame
    -j, --json           Format output as JSON
    -n, --native         Collect stack traces from native extensions written in
                         Cython, C or C++
        --nonblocking    Don't pause the python process when collecting samples.
                         Setting this option will reduce the perfomance impact
                         of sampling, but may lead to inaccurate results
    -h, --help           Prints help information
    -V, --version        Prints version information

3. top

The top command of py-spy lets us print profiling results of CPU consumption to standard output in a table-like format. It’s almost like the top command of Unix which accepts process id and prints stack trace of it. Below we have given a common format of executing the top command of py-spy.

  • py-spy top [OPTIONS] --pid <pid> [python_program] <arg1> <arg2> ...

Example 1: Simple Profiling Example

As a part of our first example, we'll explain how we can use the top command with a simple python script. We have profiled the script from our first example again.

  • py-spy top python profiling_examples/example_2.py

OUTPUT:

Collecting samples from 'python profiling_examples/example_2.py' (python v3.7.3)
Total Samples 100
GIL: 100.00%, Active: 100.00%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
 15.56%  15.56%   0.150s    0.150s   _randbelow (random.py:234)
  7.78%  43.33%   0.100s    0.450s   randrange (random.py:198)
  7.78%  78.89%   0.080s    0.800s   randint (random.py:222)
  7.78%   7.78%   0.080s    0.080s   randrange (random.py:184)
  5.56%  42.22%   0.060s    0.420s   get_sum_of_list (example_2.py:11)
  6.67%   6.67%   0.060s    0.060s   _randbelow (random.py:232)
  5.56%   5.56%   0.060s    0.060s   randrange (random.py:194)
  4.44%   4.44%   0.050s    0.050s   _randbelow (random.py:235)
  5.56%   5.56%   0.050s    0.050s   randrange (random.py:193)
  5.56%  50.00%   0.050s    0.510s   get_sum_of_list (example_2.py:10)
  3.33%   3.33%   0.030s    0.030s   randrange (random.py:196)
  3.33%   3.33%   0.030s    0.030s   _randbelow (random.py:228)
  3.33%   4.44%   0.030s    0.040s   get_sum_of_list (example_2.py:12)
  2.22%   2.22%   0.020s    0.020s   _randbelow (random.py:229)
  2.22%   2.22%   0.020s    0.020s   get_sum_of_list (example_2.py:13)
  2.22%   2.22%   0.020s    0.020s   randrange (random.py:187)
  2.22%   2.22%   0.020s    0.020s   randrange (random.py:197)
  2.22%   2.22%   0.020s    0.020s   randint (random.py:218)
  1.11%   1.11%   0.010s    0.010s   get_sum_of_list (example_2.py:9)
  0.00%   0.00%   0.010s    0.010s   _randbelow (random.py:236)
  1.11%   1.11%   0.010s    0.010s   _randbelow (random.py:224)
  1.11%   1.11%   0.010s    0.010s   _randbelow (random.py:237)
  1.11%   1.11%   0.010s    0.010s   randrange (random.py:185)
  1.11%   1.11%   0.010s    0.010s   _randbelow (random.py:233)
  1.11%   1.11%   0.010s    0.010s   add (example_2.py:4)
  0.00% 100.00%   0.000s     1.00s   <module> (example_2.py:18)


Press Control-C to quit, or ? for help.

process 14274 ended

Example 2: Profile Process

The top command also lets us profile the running process. We have followed the same steps as we had followed earlier to profile the random numbers generator script. We have again started three shells and executed commands in each as explained below.

Shell 1 Command:

In this shell, we'll execute our python script which will run for around 30 seconds.

  • python profiling_examples/a_random_generator.py

OUTPUT:

50.62938
50.40982
50.38321

Shell 2 Command:

In the second shell, we'll try to get the process id of the running process.

  • ps aux | grep a_random_generator

OUTPUT:

sunny    14105  0.5  0.3  29256 12040 pts/1    S+   13:08   0:00 python profiling_examples/a_random_generator.py
sunny    14107  0.0  0.0  14428  1008 pts/3    S+   13:08   0:00 grep --color=auto a_random_generator

Shell 3 Command:

In the third shell, we'll use the top command of py-spy to monitor the process. Please make a note that we are using sudo as we'll need to run this command with root access.

  • sudo env "PATH=$PATH" py-spy top --pid 14105

OUTPUT:

Collecting samples from 'python profiling_examples/a_random_generator.py' (python v3.7.3)
Total Samples 2400
GIL: 0.00%, Active: 6.00%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
  6.00%   6.00%   0.510s    0.510s   very_slow_random_generator (a_random_generator.py:5)
  0.00%   0.00%   0.340s    0.340s   slow_random_generator (a_random_generator.py:10)
  0.00%   0.00%   0.020s    0.050s   randrange (random.py:198)
  0.00%   0.00%   0.020s    0.020s   randrange (random.py:197)
  0.00%   0.00%   0.020s    0.020s   _randbelow (random.py:235)
  0.00%   0.00%   0.020s    0.120s   <listcomp> (a_random_generator.py:11)
  0.00%   0.00%   0.010s    0.010s   _randbelow (random.py:232)
  0.00%   0.00%   0.010s    0.010s   randrange (random.py:184)
  0.00%   0.00%   0.010s    0.100s   randint (random.py:222)
  0.00%   0.00%   0.010s    0.010s   randrange (random.py:196)
  0.00%   0.00%   0.000s    0.460s   main_func (a_random_generator.py:23)
  0.00%   6.00%   0.000s    0.970s   <module> (a_random_generator.py:29)
  0.00%   6.00%   0.000s    0.510s   main_func (a_random_generator.py:26)
  0.00%   0.00%   0.000s    0.120s   slow_random_generator (a_random_generator.py:11)



Press Control-C to quit, or ? for help.

process 14105 ended

Example 3: Multi-Processing Example

As apart of our third example, we are explaining the usage of top command for multi-processing example. We are again reusing the script from our previous example (Example 7). We have used --subprocesses option which instructs py-spy to monitor all sub-processes generated by our main program.

  • py-spy top --subprocesses python profiling_examples/multi_processing_example.py

OUTPUT:

Total Samples 500
GIL: 0.00%, Active: 1.00%, Threads: 6, Processes 4

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
  0.00%   0.00%   0.030s    0.030s   slow_random_generator (multi_processing_example.py:13)
  1.00%   1.00%   0.010s    0.010s   __enter__ (multiprocessing/synchronize.py:95)
  0.00%   0.00%   0.010s    0.010s   namedtuple (collections/__init__.py:464)
  0.00%   0.00%   0.010s    0.010s   <module> (shutil.py:11)
  0.00%   0.00%   0.010s    0.010s   create_module (<frozen importlib._bootstrap_external>:1043)
  0.00%   0.00%   0.010s    0.010s   _compile_bytecode (<frozen importlib._bootstrap_external>:525)
  0.00%   1.00%   0.000s    0.040s   <module> (multi_processing_example.py:34)
  0.00%   0.00%   0.000s    0.010s   <module> (numpy/lib/_datasource.py:42)
  0.00%   1.00%   0.000s    0.040s   run (multiprocessing/process.py:99)
  0.00%   0.00%   0.000s    0.010s   <module> (numpy/lib/npyio.py:15)
  0.00%   0.00%   0.000s    0.010s   <module> (hmac.py:15)
  0.00%   0.00%   0.000s    0.010s   <module> (numpy/testing/__init__.py:12)
  0.00%   0.00%   0.000s    0.010s   __get_openssl_constructor (hashlib.py:119)
  0.00%   0.00%   0.000s    0.040s   _load_unlocked (<frozen importlib._bootstrap>:677)
  0.00%   0.00%   0.000s    0.040s   exec_module (<frozen importlib._bootstrap_external>:728)
  0.00%   0.00%   0.000s    0.010s   <module> (numpy/core/_internal.py:11)
  0.00%   0.00%   0.000s    0.010s   <module> (numpy/__init__.py:150)
  0.00%   1.00%   0.000s    0.040s   _Popen (multiprocessing/context.py:277)
  0.00%   0.00%   0.000s    0.040s   _find_and_load (<frozen importlib._bootstrap>:983)
  0.00%   1.00%   0.000s    0.040s   _launch (multiprocessing/popen_fork.py:74)
  0.00%   0.00%   0.000s    0.030s   _handle_fromlist (<frozen importlib._bootstrap>:1035)
  0.00%   0.00%   0.000s    0.010s   <module> (numpy/__init__.py:142)
  0.00%   0.00%   0.000s    0.010s   <module> (numpy/__init__.py:145)
  0.00%   1.00%   0.000s    0.010s   _process_worker (concurrent/futures/process.py:226)
  0.00%   0.00%   0.000s    0.040s   _find_and_load_unlocked (<frozen importlib._bootstrap>:967)
  0.00%   1.00%   0.000s    0.040s   _start_queue_management_thread (concurrent/futures/process.py:569)
  0.00%   1.00%   0.000s    0.040s   _adjust_process_count (concurrent/futures/process.py:593)
  0.00%   0.00%   0.000s    0.010s   _load_unlocked (<frozen importlib._bootstrap>:670)
  0.00%   0.00%   0.000s    0.010s   <module> (platform.py:919)
  0.00%   0.00%   0.000s    0.010s   __get_builtin_constructor (hashlib.py:94)
  0.00%   0.00%   0.000s    0.010s   exec_module (<frozen importlib._bootstrap_external>:1050)
  0.00%   0.00%   0.000s    0.010s   <module> (numpy/lib/__init__.py:24)
  0.00%   0.00%   0.000s    0.010s   <module> (numpy/random/_pickle.py:1)
  0.00%   0.00%   0.000s    0.010s   <module> (secrets.py:19)
  0.00%   0.00%   0.000s    0.010s   exec_module (<frozen importlib._bootstrap_external>:724)
  0.00%   0.00%   0.000s    0.010s   <module> (numpy/__init__.py:187)
  0.00%   0.00%   0.000s    0.010s   <module> (numpy/testing/_private/utils.py:17)
  0.00%   0.00%   0.000s    0.010s   get_code (<frozen importlib._bootstrap_external>:857)
  0.00%   0.00%   0.000s    0.040s   _call_with_frames_removed (<frozen importlib._bootstrap>:219)
  0.00%   1.00%   0.000s    0.040s   _bootstrap (multiprocessing/process.py:297)
  0.00%   1.00%   0.000s    0.040s   submit (concurrent/futures/process.py:615)
  0.00%   0.00%   0.000s    0.010s   <module> (numpy/random/__init__.py:181)
  0.00%   1.00%   0.000s    0.040s   main_func (multi_processing_example.py:27)
  0.00%   0.00%   0.000s    0.010s   <module> (hashlib.py:244)
  0.00%   0.00%   0.000s    0.040s   <module> (multi_processing_example.py:2)
  0.00%   0.00%   0.000s    0.010s   module_from_spec (<frozen importlib._bootstrap>:583)
  0.00%   1.00%   0.000s    0.040s   __init__ (multiprocessing/popen_fork.py:20)
  0.00%   1.00%   0.000s    0.010s   get (multiprocessing/queues.py:93)
  0.00%   0.00%   0.000s    0.010s   <module> (numpy/core/__init__.py:100)

Press Control-C to quit, or ? for help.

Other Useful Options

  • -r <rate> or --rate <rate> - It lets us change the rate at which sampling happens. Example 4 of the record option explains how we can use this option.
  • -n or --native - It lets us profile native extensions written in C, C++, or Cython.
  • --nonblocking - It let us profile python program without interrupting running program as explained in record command example (Example 9).

top Command Help Doc

We can access the helper doc as mentioned below.

In [33]:
!py-spy top --help
py-spy-top
Displays a top like view of functions consuming CPU

USAGE:
    py-spy top [OPTIONS] --pid <pid> [python_program]...

OPTIONS:
    -p, --pid <pid>       PID of a running python program to spy on
    -r, --rate <rate>     The number of samples to collect per second [default:
                          100]
    -s, --subprocesses    Profile subprocesses of the original process
    -n, --native          Collect stack traces from native extensions written in
                          Cython, C or C++
        --nonblocking     Don't pause the python process when collecting
                          samples. Setting this option will reduce the
                          perfomance impact of sampling, but may lead to
                          inaccurate results
    -h, --help            Prints help information
    -V, --version         Prints version information

ARGS:
    <python_program>...    commandline of a python program to run


Sunny Solanki  Sunny Solanki