Memory management in python refers to allocation and deallocation of objects. It's handled by Python's Garbage Collection Algorithm. We'll try to understand basics of how memory management works in Python. Memory management in Python is done automatically by Interpreter and developer does not need to take any steps. Though Python still provides module called
gc for explicitly calling Garbage Collector in some cases to tell it explicitly collect objects rather than waiting for automatic behaviour. Memory management also helps write better and efficient code to developer.
Python variables are different from other languages like C/C++/Java in that they are just references to Object. They do not have any data type attached to them. Hence you can assign any type of object to Python variable. Objects can have more than one references pointing to it. Reference count generally refers to number of references pointing to object. One can use
id() function provided by Python to find out memory location of python variable.
Every Python ojbects maintains 3 things internally.
Type- Type of object
Value- Value of Object
RefCount- Reference Count
Python provides simple objects like number, string and container objects like list,dict, user-defined classes etc.
Lets define few variables and find out their memory location using
import sys print(sys.version)
3.6.6 |Anaconda, Inc.| (default, Oct 9 2018, 12:34:16) [GCC 7.3.0]
x = 10 y = x t = 10 print(id(x), id(y), id(t))
94008858372928 94008858372928 94008858372928
Notice above that integer object with value 10 is created and variables
x,y,t are all pointing to it. Hence it's reference count is currently 3. Also notice that
id() function returns same value for all 3 references.
z = 'hello' l = 'hello' k = l i = 'Hello World' j = 'Hello World' print(id(z), id(l), id(k), id(i), id(j))
140707529744992 140707529744992 140707529744992 140706718953200 140706719614000
Notice above that first string is creating same id which means that both are pointing to same objects. Wheres in case of string with space in it is getting created as two separate objects. One can see it from
id() output above.
Lets create few container objects below.
lst = ['hello',10, 'Hello World'] print(id(lst), id(lst), id(lst))
140707529744992 94008858372928 140706719613680
Note above that id of first and 2nd object of list is same as ids mentioned above for that objects. But for 3rd object it created new copy of that object. Now integer object with value 10 will have 4 references including 3 defined above and one mentioned in list. Same will be for string
hello which will also have 4 reference count.
['In', 'Out', '_', '__', '___', '__builtin__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', '_dh', '_i', '_i1', '_i2', '_i3', '_i4', '_i5', '_ih', '_ii', '_iii', '_oh', 'exit', 'get_ipython', 'i', 'j', 'k', 'l', 'lst', 'quit', 'sys', 't', 'x', 'y', 'z']
There are 3 ways to decrease reference count of an object:
Noneto a variable name.
del x x
--------------------------------------------------------------------------- NameError Traceback (most recent call last) <ipython-input-6-1beca76b84e9> in <module>() 1 del x ----> 2 x NameError: name 'x' is not defined
y, t, id(y), id(t)
(10, 10, 94008858372928, 94008858372928)
Note above that eventhough we deleted reference x from memory. Integer object with value 10 is still accessible through
t both. When we above pointed
x, it pointed to integer object with value 10. Also note that
id is still same.
When you call
del on objects created from user-defined classes then it'll call their
__del__() method if it's implemented in class.
Lets try to use
y = None y
t, lst, id(t), id(lst)
(10, 10, 94008858372928, 94008858372928)
Please make a note that callind
del statement on reference name or pointing reference to
None does not actually delete object. It just remove that reference variable name as reference to that object which means that object will no longer be accessible through that reference.
del statement or setting variable to
None only decreases reference count of that object by 1. Pointing reference variable to some other object has same effect on reference count.
Note: Don't confuse
del statement with
del statement does not call
__del__ method declared in some classes. It just decreases ref count by 1 and that's all.
__del__ method also called
destructor sometimes gets executed when reference count of that ojbect reaches zero before removing it from memory.
Object's reference count can also decrease when it goes out of scope. See below example where variable is created inside a function and when function call completes, it's reference count goes to 0 and it becomes in accesible which can be seen as
NameError in next cell.
When reference count of object goes to 0, it becomes accessible by Garbage Collector.
def test(): title = 'Hello' print(title) test()
--------------------------------------------------------------------------- NameError Traceback (most recent call last) <ipython-input-11-eee3914a8eb9> in <module>() ----> 1 title NameError: name 'title' is not defined
Objects defined in local scope like inside of functions, loops, classes etc goes out of scope as soon as function,loop ends. But objects defined in global space exists till prgram ends and does not become eligible to be collected by Garbage Collector because their reference count does not get 0. Hence think wisely before declaring complex objects in global space else you'll end up using memory eventhough you might not be using that global objects always.
Garbage collection is an algorithm which automatically releases/frees memory from objects which are no longer in use.
Python uses 2 types of Garbage Collection algorithms:
Reference counting referes to counting references to an object and deleting that object once it's reference count reaches 0. There are few pros and cons of this method. Pros:
Please find below example for explanation of cyclical references.
class Node(object): def __init__(self, value): self.value = value def next(self, next): self.next = next root = Node('Root') left = Node('Left') right = Node('Right') root.next(left) left.next(right) right.next(left)
In above code, if you notice root node has 1 reference count(variable named root), left node has 3 reference count(variable named left, root and right) and right node has 2 reference count (variable named right and left).
Here we have cycle as left is refering to right and right is referring to left.
Now if we try to remove reference which are pointed by variable names like root, left and ritght then their reference count will go down by one which will result in collecting root node. But left and right won't be collected because they still have references to them internally. This way these object become inaccessible and won't be collected by reference counting garbage collector.
del root, left,right
Referencing counting alone is not able to collect objects with cyclical references.
Tracing garbage collection uses an algorithm called Mark and Sweep behind the scene.
It has 2 steps:
Mark- We take object which is reachable and mark all objects which are reachable from that object.
Sweep- When marking is done then sweep step will remove all other objects which are not reachable.
Generational Garbage collection is type of Tracing Garbage collection.
To provide generational garbage collection facility, Python generally keeps list of every objects created as program is run.Also make a note that only
Container objects whose reference count is greater than 0 will be stored in this list because they keeps reference to other objects not simple objects like int, float,bool etc. Container objects are responsible for creating cyclical references issues.
It keeps 3 such lists and all program objects are kept in exactly one of genration list:
generation 0- Newly created objects are kept in this list.
generation 1- Objects moved to this generation when it gets little old and still getting referenced.
generation 2- Objects which are oldest will be in this generation.It has objects which will be used as program executes.
When a number of objects in a generation reaches some threshold set then it runs garbage collection (Tracing - Mark and Sweep) on that generation and any generation younger than it. So if generation 2 reaches threshold then generation 1 and generation 0 will be swept as well. It makes a list of objects to discard and also detect cycles. If an object does not have outside references like we discussed above in Mark and Sweep, then they'll be put onti discard list to be collected by garbage collector. Then garbage collection cycle will run and empty discard list.
Once garbage collection cycle has completed discarding objects then whichever objects survived will be moved to one generation higher than current one because they still have reference and will be used by program again in future. One can think that generation 0 will be more prone to reach it's threshold because most of the objects die young (have little scope like loops, function calls etc.) which won't be promoted to higher generations more.
When reference count reaches 0 then you get immediate garbage collection, but when you have cyclical references then we need to wait for generational garbage collection to reach it's threshold for generation in which object is residing.