Share @ LinkedIn Facebook  garbage-collection, memory-management, tracing
Memory Management in Python (Garbage Collection)

Overview

Memory management in python refers to the allocation and deallocation of objects. It's handled by Python's Garbage Collection Algorithm. We'll try to understand the basics of how memory management works in Python. Memory management in Python is done automatically by the Interpreter and the developer does not need to take any steps. Though Python still provides a module called gc for explicitly calling Garbage Collector in some cases to tell it explicitly collect objects rather than waiting for automatic behavior. Memory management also helps write better and efficient code to developers.

To start with memory management, we need to understand how variable works in Python and how objects are stored.

Python variables are different from other languages like C/C++/Java in that they are just references to Object. They do not have any data type attached to them. Hence you can assign any type of object to Python variable. Objects can have more than one reference pointing to it. Reference count generally refers to a number of references pointing to object. One can use id() function provided by Python to find out the memory location of the python variable.

Every Python ojbects maintains 3 things internally.

  • Type - Type of object
  • Value - Value of Object
  • RefCount - Reference Count

Python provides simple objects like a number, string and container objects like list,dict, user-defined classes, etc.

Let’s define a few variables and find out their memory location using id() function.

In [1]:
import sys
print(sys.version)
3.6.6 |Anaconda, Inc.| (default, Oct  9 2018, 12:34:16)
[GCC 7.3.0]
In [2]:
x = 10
y = x
t = 10
print(id(x), id(y), id(t))
94008858372928 94008858372928 94008858372928

Notice above that integer object with value 10 is created and variables x,y,t are all pointing to it. Hence it's reference count is currently 3. Also notice that id() function returns same value for all 3 references.

In [3]:
z = 'hello'
l = 'hello'
k = l
i = 'Hello World'
j = 'Hello World'
print(id(z), id(l), id(k), id(i), id(j))
140707529744992 140707529744992 140707529744992 140706718953200 140706719614000

Notice above that first string is creating the same id which means that both are pointing to the same objects. Wheres in case of string with space in it is getting created as two separate objects. One can see it from id() output above.

Let’s create a few container objects below.

In [4]:
lst = ['hello',10, 'Hello World']
print(id(lst[0]), id(lst[1]), id(lst[2]))
140707529744992 94008858372928 140706719613680

Note above that id of first and 2nd object of the list is the same as ids mentioned above for those objects. But for 3rd object, it created a new copy of that object. Now integer object with value 10 will have 4 references including 3 defined above and one mentioned in the list. The same will be for string hello which will also have 4 reference count.

In [5]:
print(dir())
['In', 'Out', '_', '__', '___', '__builtin__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', '_dh', '_i', '_i1', '_i2', '_i3', '_i4', '_i5', '_ih', '_ii', '_iii', '_oh', 'exit', 'get_ipython', 'i', 'j', 'k', 'l', 'lst', 'quit', 'sys', 't', 'x', 'y', 'z']

How to decrease the reference count of an object:

There are 3 ways to decrease the reference count of an object:

  • Using del statement
  • Assigning None to a variable name.
  • Pointing reference variable to some other object.
In [6]:
del x
x
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-6-1beca76b84e9> in <module>()
      1 del x
----> 2 x

NameError: name 'x' is not defined
In [7]:
y, t, id(y), id(t)
Out[7]:
(10, 10, 94008858372928, 94008858372928)

Note above that even though we deleted reference x from memory. Integer object with value 10 is still accessible through y and t both. When we above pointed y to x, it pointed to integer object with value 10. Also, note that id is still same.

When you call del on objects created from user-defined classes then it'll call their __del__() method if it's implemented in class.

Let’s try to use None now.

In [8]:
y = None
y
In [9]:
t, lst[1], id(t), id(lst[1])
Out[9]:
(10, 10, 94008858372928, 94008858372928)

Please make a note that calling del statement on reference name or pointing reference to None does not actually delete the object. It just removes that reference variable name as a reference to that object which means that object will no longer be accessible through that reference.

Calling del statement or setting variable to None only decreases the reference count of that object by 1. The pointing reference variable to some other object has the same effect on reference count.

Note: Don't confuse del statement with __del__. del statement does not call __del__ method declared in some classes. It just decreases ref count by 1 and that's all. __del__ method also called destructor sometimes gets executed when the reference count of that object reaches zero before removing it from memory.

Object's reference count can also decrease when it goes out of scope. See below example where the variable is created inside a function and when function call completes, it's reference count goes to 0 and it becomes inaccessible which can be seen as NameError in the next cell.

When reference count of object goes to 0, it becomes accessible by Garbage Collector.

In [10]:
def test():
    title = 'Hello'
    print(title)

test()
Hello
In [11]:
title
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-11-eee3914a8eb9> in <module>()
----> 1 title

NameError: name 'title' is not defined

Local vs Global Namespace

Objects defined in local scope like inside of functions, loops, classes, etc goes out of scope as soon as the function, the loop ends. But objects defined in global space exists until the program ends and does not become eligible to be collected by Garbage Collector because their reference count does not get 0. Hence think wisely before declaring complex objects in global space else you'll end up using memory even though you might not be using that global object always.

Garbage Collection

Garbage collection is an algorithm that automatically releases/frees memory from objects which are no longer in use.

Python uses 2 types of Garbage Collection algorithms:

  • Reference Counting
  • Generational

Reference Counting

Reference counting refers to counting references to an object and deleting that object once it's reference count reaches 0. There are a few pros and cons to this method. Pros:

  • Easy to implement
  • Immediate object deletion as soon as count reaches 0.
  • Cascading effect where object whose count goes zero might be pointing to other objects and their count might go zero as well and results in deleting many objects.

Cons:

  • Memory overhead for maintaining ref counts.
  • Execution overhead as ref count changes on every assignment.
  • It's not thread-safe because if two threads try to increase/decrease ref count of the same object then we end up in the unstable state.
  • It can not handle cyclical references.

Cyclical References with Example

Please find below the example for an explanation of cyclical references.

In [12]:
class Node(object):
    def __init__(self, value):
        self.value = value
    def next(self, next):
        self.next = next

root = Node('Root')
left = Node('Left')
right = Node('Right')
root.next(left)
left.next(right)
right.next(left)

In the above code, if you notice the root node has 1 reference count(variable named root), the left node has 3 reference count(variable named left, root and right) and the right node has 2 reference count (variable named right and left).

Here we have cycled as left is referring to right and right is referring to left.

Now if we try to remove references which are pointed by variable names like root, left and right then their reference count will go down by one which will result in collecting root node. But left and right won't be collected because they still have references to them internally. This way these objects become inaccessible and won't be collected by reference counting garbage collectors.

In [13]:
del root, left,right

Referencing counting alone is not able to collect objects with cyclical references.

Tracing Garbage Collection

Tracing garbage collection uses an algorithm called Mark and Sweep behind the scene.

It has 2 steps:

  • Mark - We take an object which is reachable and mark all objects which are reachable from that object.
  • Sweep - When marking is done then the sweep step will remove all other objects which are not reachable.

Generational Garbage Collection

Generational Garbage collection is a type of Tracing Garbage collection.

To provide a generational garbage collection facility, Python generally keeps a list of every object created as the program is run. Also make a note that only Container objects whose reference count is greater than 0 will be stored in this list because they keep a reference to other objects not simple objects like int, float, bool, etc. Container objects are responsible for creating cyclical reference issues.

It keeps 3 such lists and all program objects are kept in exactly one of generation list:

  • generation 0 - Newly created objects are kept in this list.
  • generation 1 - Objects moved to this generation when it gets a little old and still getting referenced.
  • generation 2 - Objects which are oldest will be in this generation. It has objects which will be used as the program executes.

When a number of objects in a generation reach some threshold set then it runs garbage collection (Tracing - Mark and Sweep) on that generation and any generation younger than it. So if generation 2 reaches threshold then generation 1 and generation 0 will be swept as well. It makes a list of objects to discard and also detect cycles. If an object does not have outside references as we discussed above in Mark and Sweep, then they'll be put onto the discard list to be collected by the garbage collector. Then the garbage collection cycle will run and empty discard list.

Once the garbage collection cycle has completed discarding objects then whichever objects survived will be moved to one generation higher than the current one because they still have reference and will be used by the program again in the future. One can think that generation 0 will be more prone to reach it's threshold because most of the objects die young (have a little scope like loops, function calls, etc.) which won't be promoted to higher generations more.

Conclusion

When the reference count reaches 0 then you get immediate garbage collection, but when you have cyclical references then we need to wait for generational garbage collection to reach its threshold for generation in which object is residing.

 Sunny Solanki
Other Python Blogs