Programming Python

Python Gotchas No. 2: Garbage Collection Oddities

Python is a garbage collected language. The garbage collector will collect orphaned objects. These are objects that have no references.
If an object has a __del__ method, it will be called when that object is collected. Note however, that there are no guarantees as to when this will take place. This means that while you should release all owned resources in the __del__ method, you should not depend on it to release these resources when the object “goes out of scope” as in C++.
Specifically note that del x does not call x’s __del__ method, just removes this specific reference to x.
To be explicit about resource management, call the appropriate function yourself in the flow of your program, or better yet, use try-finally or the with statement. (For more information about those, see my presentation about advanced python subjects).

There is another interesting caveat regarding Python’s gc. Consider the following code:

class A(object):
a = A()
b = A()
a.x = b
b.x = a
del a
del b

Did a and b lose their references? Well, they didn’t. They’re still pointing at each other, creating a reference cycle. Happily, Python’s garbage collector can still handle those. However, it won’t be able to handle reference cycles if at least one object in the cycle has a __del__ method.
To understand why, consider the above example, only this time assume A has a __del__ method that calls self.x.release().

Now, which __del__ method should be called first? if it is called, is the other one still valid? Python refuses the temptation to guess, and leaves the cycle be, creating a memory (or resource) leak.
The solution?
1. Avoid data structures with reference cycles. For instance, there are very few instances where you’d need a doubly linked list :)
2. If you do need reference cycles, consider using the weakref module, which allows you to create references that “don’t count” in the eyes of the gc.

Further reading material:
1. The __del__ method.
2. The gc (garbage collector) module.

Programming Python Utility Functions

Python Tidbits

Memory Leaks

With the advent of the Python’s garbage collector, it would seem that Python programs should not leak memory. However, this is not always the case. With just a bit of circular references, you can find yourself leaking quite a bit of memory.

A simple example is a bidirectional data structure, like a 2-way linked list, or a tree where the nodes have a reference to their parents. To really free those, you have to break the cycle manually. Another option is to use a weakref for some of the references. (Usually, I like to use weakref.proxy for the parent links.)

An easy way to test code for memory leaks is to run it in a loop (allocate, do job, deallocate, repeat), and call gc.collect() before and after. If the value that gc.collect() returns after the loop grows with the size of the loop, you might have a memory leak.

I found out about a special case of a potential memory leak just the other day.
Let’s say you want to implement some switch code. A common idiom is to write it like this:

def a(x):
    print 'a', x
def b(x):
    print 'b', x
func = {0: a, 1: b}[x]

This works well enough. However, if you do the same thing in a class instance, things start to get messy:

class A(object):
    def __init__(self):
        self.lookup = {0: self.a, 1: self.b}
    def a(self, x):
        print 'a', x
    def b(self, x):
        print 'b', x
    def do_something(self):
        func = self.lookup[x]

In this simple example, __init__ creates a reference cycle. This is because self.a creates a method object on the spot, with a reference to self.
So, if z is an instance of A, our cycle is: z->lookup->z.a->z.

Take heed.
(Note: Some cycles are collectible with gc.collect(). The simple cycle described here is such a case. It may get ugly though.)

While writing the VM for vial, we needed some way to avoid testing assembly code for undefined values.

For example, after executing a div instruction, CF is undefined. We handle this in the templates by writing CF=UNKNOWN(), and we allow to VM to decide on a value. If you need the VM to just execute code, returning 0 for undefined values is fine. If, however, you are testing code, your actual CPU might return a non-zero value, and your test will fail with no real reason.

To solve this problem, our VM may be told to return a special singleton object for undefined values:

class _UndefinedNumber(object):
    def __eq__(self, other):
        if other is self:
            return True
        return False
    def __ne__(self, other):
        return not self == other
    def binary_action(self, other):
        return self
    def unary_action(self):
        return self
    def __str__(self):
        return "Undefined"
    __repr__ = __str__
    (__add__, __sub__, __mul__,
     __div__, __floordiv__, __mod__,
     __pow__, __and__, __xor__,
     __or__, __lshift__, __rshift__,
     __rlshift__, __rrshift__, __radd__,
     __rsub__, __rmul__, __rdiv__,
     __rfloordiv__, __rmod__, __rpow__,
     __rand__, __rxor__, __ror__) = 24*[binary_action]
    (__neg__, __pos__,
     __abs__, __invert__) = 4*[unary_action]
Undefined = _UndefinedNumber()

Now, we test only the flags that are not Undefined. Note that any operation on an Undefined returns an Undefined. This is done to allow registers to be Undefined as well. This way, if AX gets Undefined, so will AH and EAX.

Here is an example:

In [2]: Undefined + 1
Out[2]: Undefined
In [3]: Undefined * 2
Out[3]: Undefined
In [4]: Undefined / Undefined
Out[4]: Undefined

Note that this constant might behave a bit strangely. For example, Undefined * 0 is Undefined, and Undefined – Undefined is also Undefined. Because of that, I’m not sure using this object in production circumstances is a good idea, so in the meantime, this is just a testing utility.