The reason I put up the LRU cache challenge up, was that I couldn’t think of a good solution to the problem without using linked lists. This has been pointed to by Adam and Erez as well. Adam commented on this, and Erez’ solution to the problem was algorithmically identical to mine.
So how to solve the challenge? Here are the two possible solutions I thought about:
- Use a dict for lookup, and each element’s age is indicated by its position in a linked list. This is the solution Erez and I implemented.
- Keep a ‘last time of use’ indicator for each element. This could be just a regular int, incremented by 1 for each lookup. Keep the elements in a min heap, and when there are too many elements, pop them using the minimum heap.
Generally, I consider the first solution more elegant. It doesn’t rely on an integer to work, so it could work ‘indefinitely’. Of course, the second solution can be also made to work indefinitely, with some upkeep from time to time. (The added time cost of the upkeep may be amortized over other actions.)
If you can think of some other, more elegant solution, I’ll be happy to hear about it.
So, given that a linked list solution is more elegant, we come to the crux of the problem: what to do in Python? The Python standard library does not contain a linked list implementation as far as I know. As a result, Python programmers are encouraged to use the list type, which is an array. This is just as well: for most intents and purposes, the list type is good enough.
I tried to think a little about other cases where a linked list was more appropriate, and I didn’t come up with any more such cases. If you come up with any such case, I’ll be happy to hear about it.
After looking for a public implementation, and not finding one that seemed good enough, I decided to go ahead and write my own.
Out of curiousity, I also did a small comparison of runtime speeds between my implementation of a linked list, and the list data type. I tried a test where a linked list has an obvious advantage (complexity wise)- removing elements from the middle of the list. The Python list won up to somewhere in the thousands of elements. (Of course, list is implemented in C, and mine is in pure Python).
What is my conclusion from all of this? The same as the conventional wisdom: use the list data type almost always. If you find yourself in need of a linked list, think long and hard (well, not too long) about your problem and solution. There’s a good chance that either you can use the built-in list with an equivalent solution, or that a regular list will still be faster, for most cases. Of course, if you see no other way – do what you think is best.
Your thoughts?
I tried your LRU implementation but after a while, I get:
File “/home/digulla/packages/scanner/lru.py”, line 145, in get
self.age_list.remove(link)
File “/home/digulla/packages/scanner/lru.py”, line 110, in remove
old_prev.next = old_next
ReferenceError: weakly-referenced object no longer exists
I’m not 100% sure if this is a bug in your code or mine; I’m using the LRUMap in two threads but I’ve added locks in get() and put(). That didn’t help.
When I remove the weakrefs, it works. So I’m wondering: why are you using proxy() in the linked list implementation?
The Python list type is not an array in the sense that you are probably thinking of. It is a complex data type optimized for typical use cases.
I finally uploaded the fixed version. Thanks for the fix Aaron!