
LRU cache solution: a case for linked lists in Python

The reason I put up the LRU cache challenge was that I couldn’t think of a good solution to the problem without using linked lists. Adam and Erez pointed this out as well: Adam commented on it, and Erez’s solution to the problem was algorithmically identical to mine.

So how do we solve the challenge? Here are the two solutions I thought of:

  • Use a dict for lookup, and indicate each element’s age by its position in a linked list. This is the solution Erez and I implemented; a sketch of it appears after this list.
  • Keep a ‘last time of use’ indicator for each element. This could be just a regular int, incremented by 1 on each lookup. Keep the elements in a min-heap keyed by that indicator, and when there are too many elements, pop the minimum.
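
To make the first approach concrete, here is a minimal sketch of it. This is not my full implementation – the class and helper names here are just for illustration – but it shows the idea: a dict maps keys to nodes of a doubly linked list, and the list is kept ordered from most to least recently used.

class _Node(object):
    # One cached item; a link in a doubly linked list.
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None

class LRUCache(object):
    # The dict gives O(1) lookup by key; the linked list, ordered
    # from most to least recently used, gives O(1) reordering and
    # eviction. Assumes size >= 1.
    def __init__(self, size):
        self.size = size
        self.lookup = {}    # key -> _Node
        self.head = None    # most recently used
        self.tail = None    # least recently used

    def _unlink(self, node):
        # Remove node from the list, fixing head/tail as needed.
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev

    def _push_front(self, node):
        # Insert node at the front: it is now the most recently used.
        node.prev, node.next = None, self.head
        if self.head is not None:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node

    def __getitem__(self, key):
        node = self.lookup[key]     # raises KeyError on a miss
        self._unlink(node)          # a hit moves the node
        self._push_front(node)      # to the front of the list
        return node.value

    def __setitem__(self, key, value):
        if key in self.lookup:
            node = self.lookup[key]
            node.value = value
            self._unlink(node)      # it is about to move to the front
        else:
            if len(self.lookup) >= self.size:
                oldest = self.tail  # evict the least recently used
                self._unlink(oldest)
                del self.lookup[oldest.key]
            node = _Node(key, value)
            self.lookup[key] = node
        self._push_front(node)

With this layout, a hit costs one dict access plus a few pointer fixups, and eviction just drops the tail node – all O(1), whatever the cache size.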

Generally, I consider the first solution more elegant. It doesn’t rely on an ever-growing integer, so it could work ‘indefinitely’. Of course, the second solution can also be made to work indefinitely, with some upkeep from time to time – for example, periodically renumbering the counters so they stay small. (The added time cost of that upkeep may be amortized over other actions.)

If you can think of some other, more elegant solution, I’ll be happy to hear about it.

So, given that a linked list solution is more elegant, we come to the crux of the problem: what to do in Python? As far as I know, the Python standard library does not contain a linked list implementation. As a result, Python programmers are encouraged to use the list type, which is an array. This is usually just as well: for most intents and purposes, the list type is good enough.

I tried to think a little about other cases where a linked list would be more appropriate, and I didn’t come up with any. If you do come up with such a case, I’ll be happy to hear about it.

After looking for a public implementation, and not finding one that seemed good enough, I decided to go ahead and write my own.

Out of curiosity, I also did a small comparison of runtime speeds between my linked list implementation and the built-in list type. I tried a test where a linked list has an obvious advantage, complexity-wise: removing elements from the middle of the list. The Python list still won until somewhere in the thousands of elements. (Of course, list is implemented in C, and mine is in pure Python.)
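
To give a flavour of that test (this is not the exact benchmark I ran, and the crossover point will vary with the machine and the linked list implementation), here is one way to time the two. The built-in list shifts about half its elements on every middle deletion, while unlinking a linked list node you already hold a reference to is just two pointer fixups:

import timeit

def delete_middles_list(n):
    # Built-in list: each del from the middle shifts ~n/2 elements.
    items = list(range(n))
    while items:
        del items[len(items) // 2]

class _Node(object):
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

def delete_middles_linked(n):
    # Build a doubly linked list, keeping a reference to every node.
    nodes = [_Node(i) for i in range(n)]
    for a, b in zip(nodes, nodes[1:]):
        a.next, b.prev = b, a
    # Unlinking a node you already hold is O(1): two pointer fixups.
    for node in nodes:
        if node.prev is not None:
            node.prev.next = node.next
        if node.next is not None:
            node.next.prev = node.prev

for n in (100, 1000, 10000):
    t_array = timeit.timeit(lambda: delete_middles_list(n), number=10)
    t_linked = timeit.timeit(lambda: delete_middles_linked(n), number=10)
    print(n, t_array, t_linked)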

What is my conclusion from all of this? The same as the conventional wisdom: use the list data type almost always. If you find yourself in need of a linked list, think long and hard (well, not too long) about your problem and solution. There’s a good chance that either the built-in list supports an equivalent solution, or that a regular list will still be faster in most cases. Of course, if you see no other way – do what you think is best.

Your thoughts?


Small Python Challenge No. 2 – LRU Cache

Caching is easy. Consider the cache I used to optimize the recursive spring:

class _NotInDict(object):
    """Sentinel type: tells a cache miss apart from a cached None."""
    pass
_NotInDict = _NotInDict()   # rebind the name to the single sentinel instance

def cached(func):
    """Decorator that memoizes func's results, keyed by its positional args."""
    cache = {}
    def wrapper_func(*args):
        prev_result = cache.get(args, _NotInDict)
        if prev_result is _NotInDict:
            # First time we see these args: compute and remember.
            result = func(*args)
            cache[args] = result
            return result
        return prev_result
    return wrapper_func
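
For example, applying it to the classic recursive Fibonacci (my illustration here, not code from the spring post):

@cached
def fib(n):
    # Each value is computed once; later calls are dict lookups.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(100))     # fast, despite the naive recursion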

This kind of cache is simple and effective (especially for recursions), and may be used in all sorts of situations. However, sometimes you want a size-limited cache. In that case, you have to decide on a criterion for choosing which items to throw away. There are many such criteria in use; for further reading, check out Wikipedia.

For now, though, I’d like to discuss the LRU cache. LRU stands for Least Recently Used: you throw away the items that haven’t been used for the longest time, where ‘time’ is measured in actions rather than seconds. I thought of this type of cache while working on the recursive spring. Since each step in the ‘recursivation’ used two samples of the previous step, caching was an obvious choice, and if I had to limit the size of my cache, LRU would be the type to use, as you could be certain that the older samples would not be needed until the next drawing.

The challenge for the weekend is to write an LRU cache in Python. The cache has to be general – it should support hashable keys and any required cache size. It has to be efficient – both in the size of the cache and in the time a lookup and an update take. Once the standard requirements have been met, the real competition should be about elegance. I wrote a benchmark implementation, which is efficient, but not fast. Once I see some solutions, I’ll talk a bit about mine, which is an interesting case study.