Share @ Google LinkedIn Facebook  garbage-collection, memory-management, tracing
Memory Management in Python (Garbage Collection)

Overview

Memory management in python refers to allocation and deallocation of objects. It's handled by Python's Garbage Collection Algorithm. We'll try to understand basics of how memory management works in Python. Memory management in Python is done automatically by Interpreter and developer does not need to take any steps. Though Python still provides module called gc for explicitly calling Garbage Collector in some cases to tell it explicitly collect objects rather than waiting for automatic behaviour. Memory management also helps write better and efficient code to developer.

To start with memory management, we need to understand how variable works in Python and how objects are stored.

Python variables are different from other languages like C/C++/Java in that they are just references to Object. They do not have any data type attached to them. Hence you can assign any type of object to Python variable. Objects can have more than one references pointing to it. Reference count generally refers to number of references pointing to object. One can use id() function provided by Python to find out memory location of python variable.

Every Python ojbects maintains 3 things internally.

  • Type - Type of object
  • Value - Value of Object
  • RefCount - Reference Count

Python provides simple objects like number, string and container objects like list,dict, user-defined classes etc.

Lets define few variables and find out their memory location using id() function.

In [1]:
import sys
print(sys.version)
3.6.6 |Anaconda, Inc.| (default, Oct  9 2018, 12:34:16)
[GCC 7.3.0]
In [2]:
x = 10
y = x
t = 10
print(id(x), id(y), id(t))
94008858372928 94008858372928 94008858372928

Notice above that integer object with value 10 is created and variables x,y,t are all pointing to it. Hence it's reference count is currently 3. Also notice that id() function returns same value for all 3 references.

In [3]:
z = 'hello'
l = 'hello'
k = l
i = 'Hello World'
j = 'Hello World'
print(id(z), id(l), id(k), id(i), id(j))
140707529744992 140707529744992 140707529744992 140706718953200 140706719614000

Notice above that first string is creating same id which means that both are pointing to same objects. Wheres in case of string with space in it is getting created as two separate objects. One can see it from id() output above. Lets create few container objects below.

In [4]:
lst = ['hello',10, 'Hello World']
print(id(lst[0]), id(lst[1]), id(lst[2]))
140707529744992 94008858372928 140706719613680

Note above that id of first and 2nd object of list is same as ids mentioned above for that objects. But for 3rd object it created new copy of that object. Now integer object with value 10 will have 4 references including 3 defined above and one mentioned in list. Same will be for string hello which will also have 4 reference count.

In [5]:
print(dir())
['In', 'Out', '_', '__', '___', '__builtin__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', '_dh', '_i', '_i1', '_i2', '_i3', '_i4', '_i5', '_ih', '_ii', '_iii', '_oh', 'exit', 'get_ipython', 'i', 'j', 'k', 'l', 'lst', 'quit', 'sys', 't', 'x', 'y', 'z']

How to decrease reference count of an object:

There are 3 ways to decrease reference count of an object:

  • Using del statement
  • Assigning None to a variable name.
  • Pointing reference variable to some other object.
In [6]:
del x
x
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-6-1beca76b84e9> in <module>()
      1 del x
----> 2 x

NameError: name 'x' is not defined
In [7]:
y, t, id(y), id(t)
Out[7]:
(10, 10, 94008858372928, 94008858372928)

Note above that eventhough we deleted reference x from memory. Integer object with value 10 is still accessible through y and t both. When we above pointed y to x, it pointed to integer object with value 10. Also note that id is still same.

When you call del on objects created from user-defined classes then it'll call their __del__() method if it's implemented in class.

Lets try to use None now.

In [8]:
y = None
y
In [9]:
t, lst[1], id(t), id(lst[1])
Out[9]:
(10, 10, 94008858372928, 94008858372928)

Please make a note that callind del statement on reference name or pointing reference to None does not actually delete object. It just remove that reference variable name as reference to that object which means that object will no longer be accessible through that reference.

Calling del statement or setting variable to None only decreases reference count of that object by 1. Pointing reference variable to some other object has same effect on reference count.

Note: Don't confuse del statement with __del__. del statement does not call __del__ method declared in some classes. It just decreases ref count by 1 and that's all. __del__ method also called destructor sometimes gets executed when reference count of that ojbect reaches zero before removing it from memory.

Object's reference count can also decrease when it goes out of scope. See below example where variable is created inside a function and when function call completes, it's reference count goes to 0 and it becomes in accesible which can be seen as NameError in next cell.

When reference count of object goes to 0, it becomes accessible by Garbage Collector.

In [10]:
def test():
    title = 'Hello'
    print(title)

test()
Hello
In [11]:
title
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-11-eee3914a8eb9> in <module>()
----> 1 title

NameError: name 'title' is not defined

Local vs Global Namespace

Objects defined in local scope like inside of functions, loops, classes etc goes out of scope as soon as function,loop ends. But objects defined in global space exists till prgram ends and does not become eligible to be collected by Garbage Collector because their reference count does not get 0. Hence think wisely before declaring complex objects in global space else you'll end up using memory eventhough you might not be using that global objects always.

Garbage Collection

Garbage collection is an algorithm which automatically releases/frees memory from objects which are no longer in use.

Python uses 2 types of Garbage Collection algorithms:

  • Reference Counting
  • Generational

Reference Counting

Reference counting referes to counting references to an object and deleting that object once it's reference count reaches 0. There are few pros and cons of this method. Pros:

  • Easy to implement
  • Immediate object deletion as soon as count reaches 0.
  • Cascading effect where object whose count goes zero might be pointing to other object and their count might go zero as well and results in deleting many objects.

Cons:

  • Memory overhead for maintaining ref counts.
  • Execution overhead as ref count changes on every assignments.
  • It's not thread-safe because if two threads tries to increase/decrease ref count of same object then we end up in unstable state.
  • It can not handle cyclical references.

Cyclical References with Example

Please find below example for explanation of cyclical references.

In [12]:
class Node(object):
    def __init__(self, value):
        self.value = value
    def next(self, next):
        self.next = next

root = Node('Root')
left = Node('Left')
right = Node('Right')
root.next(left)
left.next(right)
right.next(left)

In above code, if you notice root node has 1 reference count(variable named root), left node has 3 reference count(variable named left, root and right) and right node has 2 reference count (variable named right and left).

Here we have cycle as left is refering to right and right is referring to left.

Now if we try to remove reference which are pointed by variable names like root, left and ritght then their reference count will go down by one which will result in collecting root node. But left and right won't be collected because they still have references to them internally. This way these object become inaccessible and won't be collected by reference counting garbage collector.

In [13]:
del root, left,right

Referencing counting alone is not able to collect objects with cyclical references.

Tracing Garbage Collection

Tracing garbage collection uses an algorithm called Mark and Sweep behind the scene.

It has 2 steps:

  • Mark - We take object which is reachable and mark all objects which are reachable from that object.
  • Sweep - When marking is done then sweep step will remove all other objects which are not reachable.

Generational Garbage Collection

Generational Garbage collection is type of Tracing Garbage collection.

To provide generational garbage collection facility, Python generally keeps list of every objects created as program is run.Also make a note that only Container objects whose reference count is greater than 0 will be stored in this list because they keeps reference to other objects not simple objects like int, float,bool etc. Container objects are responsible for creating cyclical references issues.

It keeps 3 such lists and all program objects are kept in exactly one of genration list:

  • generation 0 - Newly created objects are kept in this list.
  • generation 1 - Objects moved to this generation when it gets little old and still getting referenced.
  • generation 2 - Objects which are oldest will be in this generation.It has objects which will be used as program executes.

When a number of objects in a generation reaches some threshold set then it runs garbage collection (Tracing - Mark and Sweep) on that generation and any generation younger than it. So if generation 2 reaches threshold then generation 1 and generation 0 will be swept as well. It makes a list of objects to discard and also detect cycles. If an object does not have outside references like we discussed above in Mark and Sweep, then they'll be put onti discard list to be collected by garbage collector. Then garbage collection cycle will run and empty discard list.

Once garbage collection cycle has completed discarding objects then whichever objects survived will be moved to one generation higher than current one because they still have reference and will be used by program again in future. One can think that generation 0 will be more prone to reach it's threshold because most of the objects die young (have little scope like loops, function calls etc.) which won't be promoted to higher generations more.

Conclusion

When reference count reaches 0 then you get immediate garbage collection, but when you have cyclical references then we need to wait for generational garbage collection to reach it's threshold for generation in which object is residing.

Let other learners know about this article @ Google LinkedIn Facebook
 Sunny Solanki