How to Profile Memory Usage in Python using memory_profiler?

With the rise in the primary memory of computer systems, we generally do not run out of memory, which is why we usually do not monitor primary memory usage. But with the growth of data over time, it sometimes happens that we do run out of memory and need to know how much memory each part of the code uses, so that we can understand how to avoid it in the future. Python has become the go-to language for analyzing big datasets. It has libraries like cProfile, profile, and line_profiler that can analyze the running time of various parts of the code, but they do not provide any information about memory usage.

Python has several libraries that report memory usage, such as memory_profiler, memprof, and guppy/hpy. In this tutorial, we'll concentrate on how to use memory_profiler to analyze the memory usage of Python code.

  • memory_profiler: It's written entirely in Python and monitors the process running the Python code as well as the line-by-line memory usage of that code. It works much like line_profiler, which profiles time, and it's built on top of Python's psutil module.

It can be easily installed using pip/conda.

  • pip install memory_profiler

In [1]:
import memory_profiler

Example 1: @profile

As part of our first example, we'll explain how to decorate a function with memory_profiler's @profile decorator and then record the memory usage of that function by running the script through the profiler.

Below we have created a simple script with one function that generates two lists of 100,000 random numbers between 1 and 10. It then adds the two lists element-wise into a third list, sums up the third list, and prints the total. We have decorated the function with the @profile decorator to monitor its memory usage.

example1.py

from memory_profiler import profile

@profile
def main_func():
    import random
    arr1 = [random.randint(1,10) for i in range(100000)]
    arr2 = [random.randint(1,10) for i in range(100000)]
    arr3 = [arr1[i]+arr2[i] for i in range(100000)]
    del arr1
    del arr2
    tot = sum(arr3)
    del arr3
    print(tot)

if __name__ == "__main__":
    main_func()

We can now run the command below from the command prompt in order to analyze the memory usage of the function.

  • python -m memory_profiler example1.py
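Because example1.py imports profile from memory_profiler explicitly, running the script directly should also print the same line-by-line report:

  • python example1.py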

Output

1098330

Filename: example1.py

Line #    Mem usage    Increment   Line Contents
================================================
     3     37.0 MiB     37.0 MiB   @profile
     4                             def main_func():
     5     37.0 MiB      0.0 MiB       import random
     6     37.6 MiB      0.3 MiB       arr1 = [random.randint(1,10) for i in range(100000)]
     7     38.4 MiB      0.3 MiB       arr2 = [random.randint(1,10) for i in range(100000)]
     8     39.9 MiB      0.5 MiB       arr3 = [arr1[i]+arr2[i] for i in range(100000)]
     9     39.9 MiB      0.0 MiB       del arr1
    10     38.0 MiB      0.0 MiB       del arr2
    11     38.0 MiB      0.0 MiB       tot = sum(arr3)
    12     37.2 MiB      0.0 MiB       del arr3
    13     37.2 MiB      0.0 MiB       print(tot)

The output generated by the memory profiler is fairly self-explanatory. It shows two columns, Mem usage and Increment, next to each line of the function decorated with @profile. We can see that memory usage starts at some baseline, increases as the lists are created, and decreases as they are deallocated. The total memory usage at any point is displayed in the Mem usage column, and the change in memory caused by a particular statement is displayed in the Increment column. This gives us a good idea of how much memory is used in total and how much a particular variable contributes, which helps us make better decisions.

Below we have modified the example1.py file by removing the del statements that deleted the unused variables, and then profiled the code again.

example1_modified.py

from memory_profiler import profile

@profile
def main_func():
    import random
    arr1 = [random.randint(1,10) for i in range(100000)]
    arr2 = [random.randint(1,10) for i in range(100000)]
    arr3 = [arr1[i]+arr2[i] for i in range(100000)]
    tot = sum(arr3)
    print(tot)

if __name__ == "__main__":
    main_func()

We'll be running the example1_modified.py file through the profiler exactly the same way as the previous file.

  • python -m memory_profiler example1_modified.py

Output

1098330

Filename: example1_modified.py

Line #    Mem usage    Increment   Line Contents
================================================
     3     37.1 MiB     37.1 MiB   @profile
     4                             def main_func():
     5     37.1 MiB      0.0 MiB       import random
     6     37.9 MiB      0.3 MiB       arr1 = [random.randint(1,10) for i in range(100000)]
     7     38.4 MiB      0.3 MiB       arr2 = [random.randint(1,10) for i in range(100000)]
     8     40.0 MiB      0.5 MiB       arr3 = [arr1[i]+arr2[i] for i in range(100000)]
     9     40.0 MiB      0.0 MiB       tot = sum(arr3)
    10     40.0 MiB      0.0 MiB       print(tot)

We can clearly see from the output of the modified file that it uses more memory, as we are not deallocating the memory used by arr1, arr2, and arr3 once we are done with them. Unused variables like these can pile up over time and fill memory with data that is no longer needed. We can use memory_profiler to find such code.

We can even modify the precision of the memory usage displayed in the two columns by passing the precision parameter to the @profile decorator.

example1_modified.py

from memory_profiler import profile

@profile(precision=4)
def main_func():
    import random
    arr1 = [random.randint(1,10) for i in range(100000)]
    arr2 = [random.randint(1,10) for i in range(100000)]
    arr3 = [arr1[i]+arr2[i] for i in range(100000)]
    tot = sum(arr3)
    print(tot)

if __name__ == "__main__":
    main_func()

Output

1100229
Filename: example1_modified.py

Line #    Mem usage    Increment   Line Contents
================================================
     3  37.0469 MiB  37.0469 MiB   @profile(precision=4)
     4                             def main_func():
     5  37.0469 MiB   0.0000 MiB       import random
     6  37.6680 MiB   0.2578 MiB       arr1 = [random.randint(1,10) for i in range(100000)]
     7  38.4414 MiB   0.2578 MiB       arr2 = [random.randint(1,10) for i in range(100000)]
     8  39.9570 MiB   0.5156 MiB       arr3 = [arr1[i]+arr2[i] for i in range(100000)]
     9  39.9570 MiB   0.0000 MiB       tot = sum(arr3)
    10  39.9570 MiB   0.0000 MiB       print(tot)

Example 2: mprof

When you install memory_profiler, it also gives us access to the mprof executable from the command line/shell. mprof reports memory usage over time during the execution of a script. This can be very useful for understanding which part of the script uses more memory and when the memory usage rises. mprof samples memory usage at regular timestamps while the script is running and stores the samples in a .dat file. It also provides plotting functionality that plots memory usage as a function of time using matplotlib.

Below we explain how to use the mprof command to get more insight into memory usage over time.

We have created a simple script below with three functions, each containing the same code for generating a random array of size 1000x1000. Each function sleeps for some amount of time, generates a random array, takes the mean of the array, and returns it. We'll use this script to explain the usage of mprof.

random_number_generator.py

import time
import numpy as np
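
# NOTE: no "from memory_profiler import profile" is needed in this script; running it
# through "mprof run" makes the @profile decorator available automatically and records
# function entry/exit timestamps (the brackets shown in the plot).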

@profile
def very_slow_random_generator():
    time.sleep(5)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

@profile
def slow_random_generator():
    time.sleep(2)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

@profile
def fast_random_generator():
    time.sleep(1)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

def main_func():
    fast_random_generator()
    slow_random_generator()
    very_slow_random_generator()

if __name__ == '__main__':
    main_func()

We can run the above script through the mprof command as explained below.

  • mprof run random_number_generator.py

The above command executes the script and generates a new file named mprofile_20201006095840.dat.

We can now plot the recorded memory usage using the command below.

  • mprof plot

The above command takes the latest .dat file generated and plots it using matplotlib. We can also pass the name of a previously generated file, as in mprof plot <FILENAME>, to plot that file instead. We can also give the plot a title of our choice using the -t option, as shown below.

  • mprof plot -t 'Random Number Generator Memory Footprints'
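For instance, to re-plot a specific run with a custom title, the file name and the -t option can be combined (using the .dat file generated above):

  • mprof plot mprofile_20201006095840.dat -t 'Random Number Generator Memory Footprints'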

The plot for our random_number_generator.py file looks like this:

[Plot: memory usage over time for random_number_generator.py, titled 'Random Number Generator Memory Footprints']

NOTE

Please make a note of the brackets shown in the plot highlighting function start and end. We won't get these brackets if we don't decorate functions using `@profile`, though we'll still get the graph of memory usage over time.

Below are some other useful commands available with mprof:

  • mprof list - It'll list all .dat files generated by mprof.
  • mprof clean - It'll clean all .dat files.
  • mprof rm - It can be useful to remove a particular .dat file (see the example below).
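For example, to remove only the run recorded earlier:

  • mprof rm mprofile_20201006095840.dat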

Example 3: mprof with multiprocessing

The mprof command also supports monitoring memory usage in the context of multiprocessing. It provides two options for this:

  • --include-children - It monitors memory usage across the process and all of its child processes and shows their combined usage as a single line in the chart.
  • --multiprocess - It records the memory usage of each child process separately and plots one line per sub-process over time.

Below we have modified our Python code from the previous examples to show the usage of these options. We create a process pool and submit to it the three functions that generate averages of random numbers. We'll monitor memory usage in each process using mprof.

multi_processing_example.py

import time
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def very_slow_random_generator():
    time.sleep(5)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

def slow_random_generator():
    time.sleep(2)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

def fast_random_generator():
    time.sleep(1)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

def main_func():
    ppe = ProcessPoolExecutor(max_workers=3)
    futures = []

    futures.append(ppe.submit(fast_random_generator))
    futures.append(ppe.submit(slow_random_generator))
    futures.append(ppe.submit(very_slow_random_generator))

    print([future.result() for future in futures])

if __name__ == '__main__':
    main_func()

We'll execute the below command to generate a memory usage file for multi_processing_example.py.

  • mprof run --multiprocess multi_processing_example.py
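Similarly, to sum the memory of all child processes into a single line (the --include-children option described above), we could run:

  • mprof run --include-children multi_processing_example.py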

We'll then plot it using the below command.

  • mprof plot

Output Plot

[Plot: memory usage over time for multi_processing_example.py, with one line per sub-process]

Example 4: memory_usage()

memory_profiler provides another important function named memory_usage(), which can be called from Python to check the memory usage of a process, statement, or function over time. We need to provide it with the statement/function (along with its parameters) and the interval at which to measure memory usage.

We can call the memory_usage() function with -1 as the first parameter, and it'll monitor the memory usage of the current process. We have set the interval parameter to 0.2, which means memory usage is measured every 0.2 seconds, and timeout to 1 second, which means measurement stops after 1 second and the results are returned. Below we are monitoring the memory usage of the current process, which is the process running the Jupyter notebook IPython kernel.

In [2]:
from memory_profiler import memory_usage
In [3]:
mem_usage = memory_usage(-1, interval=.2, timeout=1)
In [4]:
mem_usage
Out[4]:
[46.0859375, 46.0859375, 46.0859375, 46.0859375, 46.0859375]

We can also pass a function to memory_usage(), as shown below. We have redefined the very_slow_random_generator function from our previous examples and are calling it with a size of 10000, measuring at an interval of 1 second. We have also set timestamps to True so that the timestamps at which memory usage was recorded are returned as well.

In [5]:
import time
import numpy as np

def very_slow_random_generator(sz=1000):
    time.sleep(5)
    arr1 = np.random.randint(1,100, size=(sz, sz))
    avg = arr1.mean()
    return avg
In [6]:
mem_usage = memory_usage((very_slow_random_generator, (10000,), ), timestamps=True, interval=1)
mem_usage
Out[6]:
[(59.11328125, 1601967777.2364237),
 (59.24609375, 1601967777.2937305),
 (59.48828125, 1601967778.295808),
 (59.48828125, 1601967779.2976193),
 (59.48828125, 1601967780.2993605),
 (59.48828125, 1601967781.3007677),
 (64.390625, 1601967782.3020203),
 (440.4921875, 1601967783.3033137),
 (568.3671875, 1601967784.3045638),
 (632.14453125, 1601967785.305819),
 (705.11328125, 1601967786.3070426),
 (736.3203125, 1601967787.3082716),
 (810.85546875, 1601967788.3100257),
 (584.8984375, 1601967789.3114278),
 (53.375, 1601967789.447128)]
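memory_usage() also accepts a max_usage parameter; setting it to True returns just the peak memory rather than the whole series (depending on the memory_profiler version, the result may be a single float or a one-element list). A minimal sketch:

mem_peak = memory_usage((very_slow_random_generator, (1000,)), max_usage=True)
mem_peak   # peak memory in MiB while the function ran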

The memory_usage() function also lets us measure memory usage in a multiprocessing environment, like the mprof command, but directly from code rather than from the command prompt/shell. It provides both the include_children and multiprocess options that were available with the mprof command.

We have reused the multiprocessing code from our previous example and used memory_usage() to measure its memory usage, trying both the include_children and multiprocess parameters.

In [7]:
import time
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def very_slow_random_generator():
    time.sleep(5)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

def slow_random_generator():
    time.sleep(2)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

def fast_random_generator():
    time.sleep(1)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

def main_func():
    ppe = ProcessPoolExecutor(max_workers=3)
    futures = []

    futures.append(ppe.submit(fast_random_generator))
    futures.append(ppe.submit(slow_random_generator))
    futures.append(ppe.submit(very_slow_random_generator))

    print([future.result() for future in futures])

mem_usage = memory_usage((main_func,), interval=1, include_children=True)
mem_usage
[49.975029, 49.975029, 49.975029]
Out[7]:
[54.828125,
 98.09765625,
 227.92578125,
 227.92578125,
 233.98046875,
 233.98046875,
 233.98046875,
 99.0546875]
In [8]:
mem_usage = memory_usage((main_func,), interval=1, multiprocess=True)
mem_usage
[49.975029, 49.975029, 49.975029]
Out[8]:
[55.74609375,
 55.74609375,
 56.96484375,
 56.96484375,
 56.96484375,
 56.96484375,
 56.96484375,
 56.96484375]

Example 5: Streaming Output to Log File

memory_profiler also lets us redirect the profiling output to a log file. This can be useful when we have many functions to profile and the profiler output would flood standard output. In such a scenario, it's better to write the profiling results to a log file. We can simply pass a file handle to the stream parameter of the @profile decorator, and the profiling results for that function will be written to that file.

Below we have regenerated our previous example with streaming profiling results to the report.log file.

import time
import numpy as np
from memory_profiler import profile

fp = open("report.log", "w+")

@profile(stream = fp)
def very_slow_random_generator():
    time.sleep(5)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

@profile(stream = fp)
def slow_random_generator():
    time.sleep(2)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

@profile(stream = fp)
def fast_random_generator():
    time.sleep(1)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

@profile(stream = fp)
def main_func():
    fast_random_generator()
    slow_random_generator()
    very_slow_random_generator()

if __name__ == '__main__':
    main_func()

We can run the above file with the command below; the profiling output won't be printed to standard output but will be directed to the log file instead.

  • python -m memory_profiler random_number_generator.py

Below we can see the contents of the report.log file.

In [9]:
!cat report.log
Filename: random_number_generator.py

Line #    Mem usage    Increment   Line Contents
================================================
    21     50.4 MiB     50.4 MiB   @profile(stream = fp)
    22                             def fast_random_generator():
    23     50.4 MiB      0.0 MiB       time.sleep(1)
    24     58.0 MiB      7.6 MiB       arr1 = np.random.randint(1,100, size=(1000,1000))
    25     58.2 MiB      0.3 MiB       avg = arr1.mean()
    26     58.2 MiB      0.0 MiB       return avg


Filename: random_number_generator.py

Line #    Mem usage    Increment   Line Contents
================================================
    14     50.7 MiB     50.7 MiB   @profile(stream = fp)
    15                             def slow_random_generator():
    16     50.7 MiB      0.0 MiB       time.sleep(2)
    17     58.2 MiB      7.5 MiB       arr1 = np.random.randint(1,100, size=(1000,1000))
    18     58.2 MiB      0.0 MiB       avg = arr1.mean()
    19     58.2 MiB      0.0 MiB       return avg


Filename: random_number_generator.py

Line #    Mem usage    Increment   Line Contents
================================================
     7     58.2 MiB     58.2 MiB   @profile(stream = fp)
     8                             def very_slow_random_generator():
     9     58.2 MiB      0.0 MiB       time.sleep(5)
    10     58.2 MiB      0.0 MiB       arr1 = np.random.randint(1,100, size=(1000,1000))
    11     58.2 MiB      0.0 MiB       avg = arr1.mean()
    12     58.2 MiB      0.0 MiB       return avg


Filename: random_number_generator.py

Line #    Mem usage    Increment   Line Contents
================================================
    28     50.4 MiB     50.4 MiB   @profile(stream = fp)
    29                             def main_func():
    30     50.7 MiB      0.3 MiB       fast_random_generator()
    31     58.2 MiB      7.5 MiB       slow_random_generator()
    32     58.2 MiB      0.0 MiB       very_slow_random_generator()


Example 6: Jupyter Notebook

We can load memory_profiler as an external extension in a Jupyter notebook to measure the memory usage of various functions and code. We can load it with the command below.

In [10]:
%load_ext memory_profiler

memory_profiler provides two line magic commands and two cell magic commands for use in Jupyter notebooks.

  • Line Magic Commands : %mprun & %memit
  • Cell Magic Commands : %%mprun & %%memit

The mprun commands return the same output as calling memory_profiler from the command line. The output opens in a separate pane in the Jupyter notebook.

The memit commands return the peak memory used while running a line of code or a cell.

Below we import the very_slow_random_generator function from the random_number_generator.py file that we created in a previous example and then call the %mprun line magic on it.

In [20]:
from random_number_generator import very_slow_random_generator

%mprun -f very_slow_random_generator very_slow_random_generator()

Line #    Mem usage    Increment   Line Contents
Filename: /home/sunny/anaconda3/lib/python3.7/site-packages/memory_profiler.py

Line #    Mem usage    Increment   Line Contents
================================================
  1110     82.1 MiB     82.1 MiB           @wraps(func)
  1111                                     def wrapper(*args, **kwargs):
  1112     82.1 MiB      0.0 MiB               prof = LineProfiler(backend=backend)
  1113     82.1 MiB      0.0 MiB               val = prof(func)(*args, **kwargs)
  1114     82.1 MiB      0.0 MiB               show_results(prof, stream=stream, precision=precision)
  1115     82.1 MiB      0.0 MiB               return val

Below we use %%mprun as a cell magic with the very_slow_random_generator function; memory profiling is recorded for every call to it within the cell.

In [21]:
%%mprun -f very_slow_random_generator

very_slow_random_generator()

Line #    Mem usage    Increment   Line Contents
Filename: /home/sunny/anaconda3/lib/python3.7/site-packages/memory_profiler.py

Line #    Mem usage    Increment   Line Contents
================================================
  1110     82.1 MiB     82.1 MiB           @wraps(func)
  1111                                     def wrapper(*args, **kwargs):
  1112     82.1 MiB      0.0 MiB               prof = LineProfiler(backend=backend)
  1113     82.1 MiB      0.0 MiB               val = prof(func)(*args, **kwargs)
  1114     82.1 MiB      0.0 MiB               show_results(prof, stream=stream, precision=precision)
  1115     82.1 MiB      0.0 MiB               return val

Below we show how to use the %memit command to measure the peak memory usage of a function call.

In [15]:
%memit very_slow_random_generator()
peak memory: 82.10 MiB, increment: 0.02 MiB
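The %%memit cell magic works the same way but reports the peak memory for the whole cell rather than a single statement. A minimal sketch (it prints a peak memory / increment line just like %memit):

%%memit
arr = np.random.randint(1, 100, size=(1000, 1000))
arr.mean()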
NOTE

Please note that memory_profiler measures memory consumption by querying the underlying operating system kernel, which is a bit different from what the Python interpreter itself tracks. It uses the psutil module to retrieve the memory allocated to the process currently running the code. Also, depending on Python's garbage collection, results might differ between platforms or between different runs of the same code.

This ends our small tutorial explaining various ways to measure memory usage using memory_profiler. Please feel free to let us know your views in the comments section.

Sunny Solanki