memory_profiler

With the rise in the primary memory of computer systems, we rarely run out of memory, which is why we generally do not monitor primary memory usage. But as data volumes grow over time, it can still happen that we run out of memory and need to know which parts of the code are using how much memory, so that we can prevent it from happening again in the future. Python has become the go-to language for analyzing big datasets. It has libraries like cProfile, profile, line_profiler, etc., which can be used to analyze the running time of various parts of code, but they do not provide any information on memory usage.

Python also has libraries that report memory usage, like memory_profiler, memprof, guppy/hpy, etc. We'll be concentrating on how to use memory_profiler to analyze the memory usage of Python code as a part of this tutorial. memory_profiler reports memory usage line by line, much like line_profiler does for running time. It's built on top of the psutil module of Python and can be easily installed using pip/conda.
memory_profiler

import memory_profiler

@profile

As a part of our first example, we'll explain how we can decorate any function with the @profile decorator of memory_profiler and then record the memory usage of that function by running the script along with the profiler.

Below we have created a simple script with one function which generates two lists of 100000 random numbers between 1 and 10. It then adds the elements of both lists index-wise and keeps the results in a third list. It then sums up the third list's elements and returns the total. We have decorated the function with the @profile decorator to monitor its memory usage.
example1.py

from memory_profiler import profile

@profile
def main_func():
    import random
    arr1 = [random.randint(1,10) for i in range(100000)]
    arr2 = [random.randint(1,10) for i in range(100000)]
    arr3 = [arr1[i]+arr2[i] for i in range(100000)]
    del arr1
    del arr2
    tot = sum(arr3)
    del arr3
    print(tot)

if __name__ == "__main__":
    main_func()
We can now run the below-mentioned command from the command prompt in order to analyze the memory usage of the function.
python -m memory_profiler example1.py
Output
1098330
Filename: example1.py
Line # Mem usage Increment Line Contents
================================================
3 37.0 MiB 37.0 MiB @profile
4 def main_func():
5 37.0 MiB 0.0 MiB import random
6 37.6 MiB 0.3 MiB arr1 = [random.randint(1,10) for i in range(100000)]
7 38.4 MiB 0.3 MiB arr2 = [random.randint(1,10) for i in range(100000)]
8 39.9 MiB 0.5 MiB arr3 = [arr1[i]+arr2[i] for i in range(100000)]
9 39.9 MiB 0.0 MiB del arr1
10 38.0 MiB 0.0 MiB del arr2
11 38.0 MiB 0.0 MiB tot = sum(arr3)
12 37.2 MiB 0.0 MiB del arr3
13 37.2 MiB 0.0 MiB print(tot)
The output generated by the memory profiler is self-explanatory. It shows two columns, Mem usage and Increment, next to each line of the function decorated with @profile. We can see that it starts with some baseline memory, then memory increases as arrays are created and decreases as arrays are deallocated. The total memory usage at any time is displayed in the Mem usage column, and the increase in memory caused by the execution of a particular statement is shown in the Increment column. This gives us a good idea of how much memory is in use in total and how much a particular variable is using, for better decision making.
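To build intuition for why each 100000-element list adds well under 1 MiB in the Increment column, we can estimate a list's footprint ourselves with sys.getsizeof. This is only a rough sketch: getsizeof counts just the list's own pointer array, not the element objects, and CPython caches small integers like 1-10, so the elements here cost almost nothing extra.

```python
import sys
import random

arr = [random.randint(1, 10) for _ in range(100000)]

# getsizeof reports the list object itself: a header plus one pointer
# slot per element (8 bytes each on 64-bit builds).
list_bytes = sys.getsizeof(arr)
print(f"list object: {list_bytes / 1024 ** 2:.2f} MiB")

# The elements are small ints (1-10), which CPython interns, so they
# add almost no memory beyond the pointer slots -- consistent with the
# sub-1-MiB increments the profiler reported above.
```

This also explains why deleting one list frees only a fraction of a MiB rather than "100000 integers' worth" of memory.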
Below we have modified the example1.py file by removing the del statements which were deleting unused variables, and then profiled the code again.
example1_modified.py

from memory_profiler import profile

@profile
def main_func():
    import random
    arr1 = [random.randint(1,10) for i in range(100000)]
    arr2 = [random.randint(1,10) for i in range(100000)]
    arr3 = [arr1[i]+arr2[i] for i in range(100000)]
    tot = sum(arr3)
    print(tot)

if __name__ == "__main__":
    main_func()
We'll be running the example1_modified.py file through the profiler in exactly the same way as the previous file.
python -m memory_profiler example1_modified.py
Output
1098330
Filename: example1_modified.py
Line # Mem usage Increment Line Contents
================================================
3 37.1 MiB 37.1 MiB @profile
4 def main_func():
5 37.1 MiB 0.0 MiB import random
6 37.9 MiB 0.3 MiB arr1 = [random.randint(1,10) for i in range(100000)]
7 38.4 MiB 0.3 MiB arr2 = [random.randint(1,10) for i in range(100000)]
8 40.0 MiB 0.5 MiB arr3 = [arr1[i]+arr2[i] for i in range(100000)]
9 40.0 MiB 0.0 MiB tot = sum(arr3)
10 40.0 MiB 0.0 MiB print(tot)
We can clearly see from the output of the modified file that it uses more memory, as we are not deallocating the memory used by arr1, arr2, and arr3 after their usage is done. Unused variables like these can pile up over time and flood memory with data that is no longer needed. We can use memory_profiler to find such code.
We can even modify the precision of the memory usage displayed in the two columns by passing the precision parameter to the @profile decorator.
example1_modified.py

from memory_profiler import profile

@profile(precision=4)
def main_func():
    import random
    arr1 = [random.randint(1,10) for i in range(100000)]
    arr2 = [random.randint(1,10) for i in range(100000)]
    arr3 = [arr1[i]+arr2[i] for i in range(100000)]
    tot = sum(arr3)
    print(tot)

if __name__ == "__main__":
    main_func()
Output
1100229
Filename: example1_modified.py
Line # Mem usage Increment Line Contents
================================================
3 37.0469 MiB 37.0469 MiB @profile(precision=4)
4 def main_func():
5 37.0469 MiB 0.0000 MiB import random
6 37.6680 MiB 0.2578 MiB arr1 = [random.randint(1,10) for i in range(100000)]
7 38.4414 MiB 0.2578 MiB arr2 = [random.randint(1,10) for i in range(100000)]
8 39.9570 MiB 0.5156 MiB arr3 = [arr1[i]+arr2[i] for i in range(100000)]
9 39.9570 MiB 0.0000 MiB tot = sum(arr3)
10 39.9570 MiB 0.0000 MiB print(tot)
mprof

When you install memory_profiler, it also gives us access to the mprof executable from the command line/shell. mprof reports memory usage over time during the execution of a script. This can be very useful for understanding which part of the script takes more memory as well as when the memory usage is rising. mprof records memory usage at regular timestamps while the script is running and stores the samples in a .dat file. It also provides plotting functionality which plots memory usage as a function of time using matplotlib.

We'll explain further how we can use the mprof command to get more insights into memory usage over time.
We have created a simple script below which has three functions, each having the same code for generating a random array of size 1000x1000. Each function sleeps for some amount of time, generates a random array, takes the mean of the array, and returns it. We'll use this script to explain the usage of mprof.
random_number_generator.py

import time
import numpy as np
from memory_profiler import profile

@profile
def very_slow_random_generator():
    time.sleep(5)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

@profile
def slow_random_generator():
    time.sleep(2)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

@profile
def fast_random_generator():
    time.sleep(1)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

def main_func():
    fast_random_generator()
    slow_random_generator()
    very_slow_random_generator()

if __name__ == '__main__':
    main_func()
We can run the above script through the mprof command as shown below.
mprof run random_number_generator.py
The above command will execute the script and generate a new file named mprofile_20201006095840.dat.
We can now call plotting functionality using the below command to plot usage of memory.
mprof plot
The above command will take the latest .dat file generated and plot it using matplotlib. We can also pass the name of a previously generated file as mprof plot <FILENAME> to plot that file instead. We can also give the plot a title of our choice by using the -t option as shown below.
mprof plot -t 'Random Number Generator Memory Footprints'
The plot for our random_number_generator.py
file looks like this:
Please make a note of the brackets shown highlighting each function's start and end. We won't get these brackets if we don't decorate functions using `@profile`. We'll still get the graph of memory usage over time.
Below are some other useful commands available with mprof:

- mprof list - It'll list all .dat files generated by mprof.
- mprof clean - It'll clean all .dat files.
- mprof rm - It can be used to remove any particular .dat file.

mprof with multiprocessing

The mprof command also provides memory usage monitoring in the context of multiprocessing. It provides two options for monitoring memory usage in the case of multiprocessing.

- --include-children - It monitors memory usage across all children of the process and shows their combined usage as one line chart.
- --multiprocess - It generates a separate line chart of memory usage over time for each sub-process.

Below we have modified our Python code from the previous examples to show the usage of these options. We are creating a multiprocessing pool and submitting three functions, each generating the average of random numbers, to it. We'll be monitoring memory usage in each process using mprof.
multi_processing_example.py

import time
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def very_slow_random_generator():
    time.sleep(5)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

def slow_random_generator():
    time.sleep(2)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

def fast_random_generator():
    time.sleep(1)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

def main_func():
    ppe = ProcessPoolExecutor(max_workers=3)
    futures = []
    futures.append(ppe.submit(fast_random_generator))
    futures.append(ppe.submit(slow_random_generator))
    futures.append(ppe.submit(very_slow_random_generator))
    print([future.result() for future in futures])

if __name__ == '__main__':
    main_func()
We'll execute the below command to generate a memory usage file for multi_processing_example.py.
mprof run --multiprocess multi_processing_example.py
We'll then plot it using the below command.
mprof plot
Output Plot
memory_usage()

The memory_profiler provides another important method named memory_usage() which can be called from within Python to check the memory usage of any statement or function over time. We need to provide the statement/function with its parameters and the interval at which to measure memory usage.

We can call the memory_usage() method with -1 as the first parameter and it'll monitor the memory usage of the current process. We have specified the interval parameter as 0.2, which means memory usage is measured every 0.2 seconds, and timeout as 1 second, meaning that measuring stops after 1 second and the results are returned. Below we are monitoring the memory usage of the current process, which is the process running the Jupyter notebook IPython kernel.
from memory_profiler import memory_usage
mem_usage = memory_usage(-1, interval=.2, timeout=1)
mem_usage
We can also pass a function to memory_usage() as explained below. We have redefined the very_slow_random_generator function from our previous examples. We are calling it with an interval of 1 second. We have also set timestamps to True so that it'll also return the timestamps at which memory usage was recorded.
import time
import numpy as np

def very_slow_random_generator(sz=1000):
    time.sleep(5)
    arr1 = np.random.randint(1,100, size=(sz, sz))
    avg = arr1.mean()
    return avg

mem_usage = memory_usage((very_slow_random_generator, (10000,), ),
                         timestamps=True, interval=1)
mem_usage
The memory_usage() function lets us measure memory usage in a multiprocessing environment like the mprof command, but from code directly rather than from the command prompt/shell. It provides both the include_children and multiprocess options which were available with the mprof command.

We have regenerated the code from our previous example on multiprocessing and used memory_usage() to measure memory usage. We have tried both the include_children and multiprocess parameters.
import time
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def very_slow_random_generator():
    time.sleep(5)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

def slow_random_generator():
    time.sleep(2)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

def fast_random_generator():
    time.sleep(1)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

def main_func():
    ppe = ProcessPoolExecutor(max_workers=3)
    futures = []
    futures.append(ppe.submit(fast_random_generator))
    futures.append(ppe.submit(slow_random_generator))
    futures.append(ppe.submit(very_slow_random_generator))
    print([future.result() for future in futures])

mem_usage = memory_usage((main_func,), interval=1, include_children=True)
mem_usage

mem_usage = memory_usage((main_func,), interval=1, multiprocess=True)
mem_usage
The memory_profiler also lets us redirect the output of profiling to a log file. This can be useful when we have too many functions to profile and the profiler's output would flood standard output. In this kind of scenario, it's better to direct profiling results to a log file. We can simply pass a file pointer to the stream parameter of the @profile decorator and it'll redirect the profiling results for that function to the given stream.

Below we have regenerated our previous example, streaming profiling results to the report.log file.
import time
import numpy as np
from memory_profiler import profile

fp = open("report.log", "w+")

@profile(stream=fp)
def very_slow_random_generator():
    time.sleep(5)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

@profile(stream=fp)
def slow_random_generator():
    time.sleep(2)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

@profile(stream=fp)
def fast_random_generator():
    time.sleep(1)
    arr1 = np.random.randint(1,100, size=(1000,1000))
    avg = arr1.mean()
    return avg

@profile(stream=fp)
def main_func():
    fast_random_generator()
    slow_random_generator()
    very_slow_random_generator()

if __name__ == '__main__':
    main_func()
We can run the above file with the below command; the output won't be printed to standard out but will be directed to the log file.
python -m memory_profiler random_number_generator.py
Below we can see the contents of the report.log file.
!cat report.log
We can load memory_profiler as an external extension in a Jupyter notebook to measure the memory usage of various functions and code. We can load memory_profiler in the Jupyter notebook with the below command.
%load_ext memory_profiler
The memory_profiler provides 2 line magic commands and 2 cell magic commands to be used in Jupyter notebooks.
%mprun & %memit
%%mprun & %%memit
The mprun commands return the same output as calling memory_profiler from the command line. The output opens in a separate pane in the Jupyter notebook.

The memit command returns the peak memory used by a line of code in the cell.
Below we are loading the very_slow_random_generator function from the random_number_generator.py file which we created in our previous example. We then call the %mprun command on it.
from random_number_generator import very_slow_random_generator
%mprun -f very_slow_random_generator very_slow_random_generator()
Filename: /home/sunny/anaconda3/lib/python3.7/site-packages/memory_profiler.py
Line # Mem usage Increment Line Contents
================================================
1110 82.1 MiB 82.1 MiB @wraps(func)
1111 def wrapper(*args, **kwargs):
1112 82.1 MiB 0.0 MiB prof = LineProfiler(backend=backend)
1113 82.1 MiB 0.0 MiB val = prof(func)(*args, **kwargs)
1114 82.1 MiB 0.0 MiB show_results(prof, stream=stream, precision=precision)
1115 82.1 MiB 0.0 MiB return val
Below we are using %%mprun as a cell magic with the function very_slow_random_generator; memory profiling will be recorded for every call made to it within the cell.
%%mprun -f very_slow_random_generator
very_slow_random_generator()
Filename: /home/sunny/anaconda3/lib/python3.7/site-packages/memory_profiler.py
Line # Mem usage Increment Line Contents
================================================
1110 82.1 MiB 82.1 MiB @wraps(func)
1111 def wrapper(*args, **kwargs):
1112 82.1 MiB 0.0 MiB prof = LineProfiler(backend=backend)
1113 82.1 MiB 0.0 MiB val = prof(func)(*args, **kwargs)
1114 82.1 MiB 0.0 MiB show_results(prof, stream=stream, precision=precision)
1115 82.1 MiB 0.0 MiB return val
Below we are explaining how we can use the memit command to measure the peak memory usage of a function call.
%memit very_slow_random_generator()
Please make a note that memory_profiler measures memory consumption by querying the underlying operating system kernel, which is a bit different from asking the Python interpreter. It uses the psutil module to retrieve the memory allocated to the process currently running the code. Apart from that, because of Python garbage collection, results might differ between platforms or even between different runs of the same code.
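To see roughly what the profiler reads under the hood, we can query the process's resident set size (RSS) with psutil directly; this OS-level figure is what the Mem usage column is built from. A sketch, assuming psutil is available (it is installed as a dependency of memory_profiler):

```python
import os
import psutil

# memory_profiler's default backend asks the OS, via psutil, for the
# resident set size (RSS) of the current process.
rss_bytes = psutil.Process(os.getpid()).memory_info().rss
print(f"current process RSS: {rss_bytes / 1024 ** 2:.1f} MiB")
```

Because RSS is an OS-level measurement of the whole process, it includes the interpreter itself, which is why even a tiny script starts with tens of MiB in the profiler output.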
This ends our small tutorial explaining various ways to measure memory usage using memory_profiler. Please feel free to let us know your views in the comments section.