Memory management in python refers to the allocation and deallocation of objects. It's handled by Python's Garbage Collection Algorithm. We'll try to understand the basics of how memory management works in Python. Memory management in Python is done automatically by the Interpreter and the developer does not need to take any steps. Though Python still provides a module called
gc for explicitly calling Garbage Collector in some cases to tell it explicitly collect objects rather than waiting for automatic behavior. Memory management also helps write better and efficient code to developers.
Python variables are different from other languages like C/C++/Java in that they are just references to Object. They do not have any data type attached to them. Hence you can assign any type of object to Python variable. Objects can have more than one reference pointing to it. Reference count generally refers to a number of references pointing to object. One can use
id() function provided by Python to find out the memory location of the python variable.
Every Python ojbects maintains 3 things internally.
Type- Type of object
Value- Value of Object
RefCount- Reference Count
Python provides simple objects like a number, string and container objects like list,dict, user-defined classes, etc.
Let’s define a few variables and find out their memory location using
import sys print(sys.version)
3.6.6 |Anaconda, Inc.| (default, Oct 9 2018, 12:34:16) [GCC 7.3.0]
x = 10 y = x t = 10 print(id(x), id(y), id(t))
94008858372928 94008858372928 94008858372928
Notice above that integer object with value 10 is created and variables
x,y,t are all pointing to it. Hence it's reference count is currently 3. Also notice that
id() function returns same value for all 3 references.
z = 'hello' l = 'hello' k = l i = 'Hello World' j = 'Hello World' print(id(z), id(l), id(k), id(i), id(j))
140707529744992 140707529744992 140707529744992 140706718953200 140706719614000
Notice above that first string is creating the same id which means that both are pointing to the same objects. Wheres in case of string with space in it is getting created as two separate objects. One can see it from
id() output above.
Let’s create a few container objects below.
lst = ['hello',10, 'Hello World'] print(id(lst), id(lst), id(lst))
140707529744992 94008858372928 140706719613680
Note above that id of first and 2nd object of the list is the same as ids mentioned above for those objects. But for 3rd object, it created a new copy of that object. Now integer object with value 10 will have 4 references including 3 defined above and one mentioned in the list. The same will be for string
hello which will also have 4 reference count.
['In', 'Out', '_', '__', '___', '__builtin__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', '_dh', '_i', '_i1', '_i2', '_i3', '_i4', '_i5', '_ih', '_ii', '_iii', '_oh', 'exit', 'get_ipython', 'i', 'j', 'k', 'l', 'lst', 'quit', 'sys', 't', 'x', 'y', 'z']
There are 3 ways to decrease the reference count of an object:
Noneto a variable name.
del x x
--------------------------------------------------------------------------- NameError Traceback (most recent call last) <ipython-input-6-1beca76b84e9> in <module>() 1 del x ----> 2 x NameError: name 'x' is not defined
y, t, id(y), id(t)
(10, 10, 94008858372928, 94008858372928)
Note above that even though we deleted reference x from memory. Integer object with value 10 is still accessible through
t both. When we above pointed
x, it pointed to integer object with value 10. Also, note that
id is still same.
When you call
del on objects created from user-defined classes then it'll call their
__del__() method if it's implemented in class.
Let’s try to use
y = None y
t, lst, id(t), id(lst)
(10, 10, 94008858372928, 94008858372928)
Please make a note that calling
del statement on reference name or pointing reference to
None does not actually delete the object. It just removes that reference variable name as a reference to that object which means that object will no longer be accessible through that reference.
del statement or setting variable to
None only decreases the reference count of that object by 1. The pointing reference variable to some other object has the same effect on reference count.
Note: Don't confuse
del statement with
del statement does not call
__del__ method declared in some classes. It just decreases ref count by 1 and that's all.
__del__ method also called
destructor sometimes gets executed when the reference count of that object reaches zero before removing it from memory.
Object's reference count can also decrease when it goes out of scope. See below example where the variable is created inside a function and when function call completes, it's reference count goes to 0 and it becomes inaccessible which can be seen as
NameError in the next cell.
When reference count of object goes to 0, it becomes accessible by Garbage Collector.
def test(): title = 'Hello' print(title) test()
--------------------------------------------------------------------------- NameError Traceback (most recent call last) <ipython-input-11-eee3914a8eb9> in <module>() ----> 1 title NameError: name 'title' is not defined
Objects defined in local scope like inside of functions, loops, classes, etc goes out of scope as soon as the function, the loop ends. But objects defined in global space exists until the program ends and does not become eligible to be collected by Garbage Collector because their reference count does not get 0. Hence think wisely before declaring complex objects in global space else you'll end up using memory even though you might not be using that global object always.
Garbage collection is an algorithm that automatically releases/frees memory from objects which are no longer in use.
Python uses 2 types of Garbage Collection algorithms:
Reference counting refers to counting references to an object and deleting that object once it's reference count reaches 0. There are a few pros and cons to this method. Pros:
Please find below the example for an explanation of cyclical references.
class Node(object): def __init__(self, value): self.value = value def next(self, next): self.next = next root = Node('Root') left = Node('Left') right = Node('Right') root.next(left) left.next(right) right.next(left)
In the above code, if you notice the root node has 1 reference count(variable named root), the left node has 3 reference count(variable named left, root and right) and the right node has 2 reference count (variable named right and left).
Here we have cycled as left is referring to right and right is referring to left.
Now if we try to remove references which are pointed by variable names like root, left and right then their reference count will go down by one which will result in collecting root node. But left and right won't be collected because they still have references to them internally. This way these objects become inaccessible and won't be collected by reference counting garbage collectors.
del root, left,right
Referencing counting alone is not able to collect objects with cyclical references.
Tracing garbage collection uses an algorithm called Mark and Sweep behind the scene.
It has 2 steps:
Mark- We take an object which is reachable and mark all objects which are reachable from that object.
Sweep- When marking is done then the sweep step will remove all other objects which are not reachable.
Generational Garbage collection is a type of Tracing Garbage collection.
To provide a generational garbage collection facility, Python generally keeps a list of every object created as the program is run. Also make a note that only
Container objects whose reference count is greater than 0 will be stored in this list because they keep a reference to other objects not simple objects like int, float, bool, etc. Container objects are responsible for creating cyclical reference issues.
It keeps 3 such lists and all program objects are kept in exactly one of generation list:
generation 0- Newly created objects are kept in this list.
generation 1- Objects moved to this generation when it gets a little old and still getting referenced.
generation 2- Objects which are oldest will be in this generation. It has objects which will be used as the program executes.
When a number of objects in a generation reach some threshold set then it runs garbage collection (Tracing - Mark and Sweep) on that generation and any generation younger than it. So if generation 2 reaches threshold then generation 1 and generation 0 will be swept as well. It makes a list of objects to discard and also detect cycles. If an object does not have outside references as we discussed above in Mark and Sweep, then they'll be put onto the discard list to be collected by the garbage collector. Then the garbage collection cycle will run and empty discard list.
Once the garbage collection cycle has completed discarding objects then whichever objects survived will be moved to one generation higher than the current one because they still have reference and will be used by the program again in the future. One can think that generation 0 will be more prone to reach it's threshold because most of the objects die young (have a little scope like loops, function calls, etc.) which won't be promoted to higher generations more.
When the reference count reaches 0 then you get immediate garbage collection, but when you have cyclical references then we need to wait for generational garbage collection to reach its threshold for generation in which object is residing.