Archive for the ‘Algorithms’ Category

Small Python Challenge No. 2 – LRU Cache

Friday, December 28th, 2007

Caching is easy. Consider the cache I used to optimize the recursive spring:

class _NotInDict(object):
    pass
_NotInDict = _NotInDict()
def cached(func):
    cache = {}
    def wrapper_func(*args):
        prev_result = cache.get(args, _NotInDict)
        if prev_result is _NotInDict:
            result = func(*args)
            cache[args] = result
            return result
        return prev_result
    return wrapper_func

This kind of cache is simple and effective (especially for recursions), and may be used in all sorts of situations. However, sometimes you want a size limited cache. In that case you have to decide on the criterion used to decide which items to throw away. There are many kinds of criteria used, for further reading check out wikipedia.

For now I'd like to discuss the LRU cache though. LRU stands for Least Recently Used, which means that you throw away the items you didn't use for a long time. Time in this case is measured by actions. I thought of this type of cache when I worked on the recursive spring. Since each step in the 'recursivation' used two samples of the previous step, caching was an obvious choice, and if I had to size limit my cache, LRU whould be the type of cache to use, as you could be certain that the older samples would not have to be used until the next drawing.

The challenge for the weekend is to write an LRU cache in Python. The cache has to be general - support hash-able keys and any cache size required. It has to be efficient - in the size of the cache and the time it takes for a lookup and an update. Once the standard requirements have been met, the big competition should be on elegance. I wrote a benchmark implementation, which is efficient, but not fast. Once I see some solutions, I'll talk a bit about mine, which is an interesting case-study.

Optimizing the recursive spring

Thursday, December 27th, 2007

Remember the recursive spring? Well, when I worked on it, it was quite slow. I mean very slow. Especially at the higher levels.

The original reason was the recursion. Consider this - to 'springify' a track, you had to sample it twice, so that you could get the heading. Here is an illustration:

(lower track is 'inner' track)

Now it is clear - the number of recursive calls made on all levels grow exponentially. There is some hope. The recursive calls use points calculated previously. It seems that with good caching we'll be able to cancel out the exponential growth, and make it linear.

Actually, good caching did much more than that - I cached the last track as well. Which meant that the second time around drawing the track didn't cost anything - just some dictionary lookups. This yielded a pretty elegant result.

To those curious, here is the code. Before I started working on it I hacked together a small 3d explorer with PyOpenGL, so that I'll be able to play with the 3d models. This module is useful by itself as well, and someday soon I'll dedicate a post to it. (Only after making it better. Right now I don't consider it high-quality code.)

To run it, use tracks.py, and play with it a little. If you look at the code, there are many parameters to play with, yielding interesting results.

Elegant Line Clipping Algorithm

Monday, November 26th, 2007

I actually wrote about that a long time ago, in the wiki that once was algorithm.co.il. Since the wiki is long gone, and I figured this is a very elegant algorithm, I decided I should write about it.

First, some background - while writing Ascii-Plotter, I had to plot lines. To plot lines correctly I had to clip them. Now since this was a fun project, I decided I wanted to figure out how to do that myself, even though the problem is really solved. So I sat down with pen and paper, and did a little thinking, and I came up this nice algorithm.

The idea is simply this: consider a line, that is not parallel to any of the axes. This line intersects all the four lines that make up the limits of our screen. Here is a diagram:

An intersection of a line with a bounding box

Now notice that there are six points of interest: the four intersection points, and the two endpoints of the line. For each point, we find its parameter in the parametric representation of the line, where a and b are the endpoints:

x(t) = a+t(b-a)

To find out the actual endpoints of the clipped line, we sort the six points according to their parameter, and take the two innermost points (with zero based indexing, those would be 2 and 3). What if the line is outside the box you say? We check that the result points' parameters are between 0 and 1 (which correspond to the original endpoints).

Note that for a line parallel to one of the axes, a similar method applies, but without two of the intersection points. However, it is probably easier to just calculate the clipping directly in that case.

Here is some sample code, without all the surrounding checks (for parallelism with the axes, trivial clipping etc...).
The full version is available in the code of Ascii-Plotter.

points_t = [0.0,1.0]

points_t.append( float(rect_bottom_left[0]-line_pt_1[0])/(line_pt_2[0]-line_pt_1[0]) )
points_t.append( float(rect_top_right[0]-line_pt_1[0])/(line_pt_2[0]-line_pt_1[0]) )
points_t.append( float(rect_bottom_left[1]-line_pt_1[1])/(line_pt_2[1]-line_pt_1[1]) )
points_t.append( float(rect_top_right[1]-line_pt_1[1])/(line_pt_2[1]-line_pt_1[1]) )

points_t.sort()
if points_t[2] <0 or points_t[2]>= 1 or points_t[3] <0 or points_t[2]>= 1:
    return None
result = [(pt_1 + t*(pt_2-pt_1)) for t in (points_t[2],points_t[3]) for (pt_1, pt_2) in zip(line_pt_1, line_pt_2)]
return (result[0],result[1]), (result[2], result[3])

Pagerank recommends movies

Saturday, September 22nd, 2007

So how was your Yom Kippur? After Yom Kippur ended, I sat down to write something that was nagging me for quite some time. I wanted to see what pagerank had to say about movies. I've always liked to picture things as graphs, and looking at movies and actors as nodes in a graph is very intuitive. I also think that the idea has some merit. I like most of the movies that Edward Norton appears in, so it makes sense to give him a high score, and see which other movies Edward Norton "recommends".

So I fired up my IDE, and started to work.

Well, at first I was a little bit disappointed, because IMDbPY doesn't have code to read the IMDB charts. Well, I was going to extend it a little bit, but then I remembered that I already downloaded all the data I needed some time ago (more than half a year I believe), and that was good enough. So I opened up some old pickle file I had lying around, and Voila! after some playing around with the format, I had a weighted graph at my disposal. I weighted the graph according to the IMDB listing - if an actor appeared higher on the cast, his (or hers) weight to and from that movie will be higher.

I implemented a quick and dirty pagerank, and then added some code to score my recommendations higher. I did that by adding each iteration my recommendation's weight to the movie's score.

Without recommendations, the top spots were:

  • 24
  • The Sopranos
  • The Wizard of Oz
  • JFK
  • Cidade de Deus
  • Citizen Kane
  • King Kong
  • Voyna i mir
  • Dolce vita, La
  • The Patriot

After adding "Edward Norton" and "Tom Hanks" as recommendations, I got the following results:

  • American History X
  • Fight Club
  • 24
  • Red Dragon
  • The Sopranos
  • Catch Me If You Can
  • The Wizard of Oz
  • JFK
  • Forrest Gump
  • Cidade de Deus

Well, I was surprised a little to see 24 and the Sopranos rank so high, and also a little bit disappointed. The recommendations didn't really add enough information, and more work will have to be done for this to work properly. It has the right 'smell' however. It looks like it has potential to be useful - I can just see the program, having a small table with a 'seen/unseen' checkbox, and a place to write your favorites, and the program will tell which movies you should watch next. This reminds me a little bit of the recommendations on soulseek and Amarok. It would be really fun to write a total recommendation engine, with the best of all worlds. Someone probably already implemented it.

In any case, I'll work on it some more. This is interesting.

If anyone wants the code, tell me, and I'll make it 'online worthy'...

How PyTuner works

Wednesday, August 1st, 2007

PyTuner is really quite simple. Here is the outline of the algorithm behind it:

  1. Record audio data with pymedia.
  2. Compute the FFT of the audio data - now we have the strength of each frequency in the sample.
  3. Find peaks in the result of the FFT. This means looking for the dominant frequencies.
  4. Find the lowest frequency that has a peak besides the zero frequency.
  5. Find the nearest 'desired note' (the notes we should tune to) to that frequency.
  6. Mark the difference between the actual frequency and the desired note frequency.

There is more to the algorithm. First, to find peaks, I used a simple algorithm - I computed the average strength of the frequencies in the sample, and looked at all the frequencies above two standard deviations above that average. This yields an array, where each frequency is marked by 0, or 1. (Instead of 1 you can mark it with its original strength). For each consecutive sequence of frequencies with 1 - I computed the average of the run, and that was the peak location. So for example, if the data was:
0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0,
The peak would be in the frequency matching index 5.
(If you used the original strength instead of 0 and 1, you can apply a weighted average to get a more accurate result)

To find the actual note of a given frequency, I used the fact that notes (half-tones) are separated by a ratio of 2^(1/12) . (the twelfth root of 2). So, if our base note is A, at 110 Hz, to compute the note we would use the following code snippet:

note_names = "A A# B C C# D D# E F F# G G#".split()
note_index = math.log(note_freq/base_freq, 2**(1.0/12))
note_name = note_names[note_index%12]

While writing PyTuner, at first I implemented the FFT on my own, just to know how to do it once. I found it helped me to better understand the nature of the algorithm, and its output. I find the practice of implementing various algorithms on your own, for educational purposes, quite healthy. I always like to know how things work 'behind the scenes'.

Riddle of the Week – LStrings

Friday, June 15th, 2007

So I wrote about lstrings. I intend to write about them again in a short while - I already finished the basic script a few days ago, but I'm waiting, until I will be satisfied with it.

In the meantime, here is a curious riddle (That I came up with):

Version 1 (not final)

Assume you are given some iteration of an lstring - S. How do you discover the original lstring used to create S?

Discussion

Well, those who are quick to answer will first try to discover the length of the original string - it could be computed directly from S. At this point you should come to the conclusion that without being given the exact number of iterations - the riddle is not that interesting... Iterating on an lstring is an associative operation. If we denote the substitution of a string A into a string B as multiplication, when both are iterations on an original string T , we will get:

A*B =T^n * T^k = T^(n+k) = T^k* T^n

This will result in the obvious answer to the original riddle - "Why, the lstring S is the original string, with zero iterations!". This answer is correct - and obviously useless :)

So, let us rephrase the riddle:

Version 2 (final)

Assume you are given some iteration of an lstring - S. How do you discover the minimum possible lstring that could be used to create S?

Discussion - But not a solution

This is the proper riddle - have at it! I will be glad to read your thoughts...

note: The acceptable solution should be either an algorithm, or a constructive mathematical proof. I don't like reading only statements of existence :)

Reachability on arbitrary maps

Sunday, May 6th, 2007

The other day I had an idea. What if you took a map of some country, and from each coordinate computed the time it took to any other coordinate. The basic assumptions are that you can drive on road by car from parking lot to parking lot, and from parking lot to another point you can get only by foot. You can also add trains, and airplanes, each having less routes, but are actually faster. So for example, traveling from city A's center to city B's center is quite fast, but traveling from city A's center to some point near the road from A to B will take some time.

This is a bit similar to computations done by game AI's to determine how to get from point A to point B on a game board.

Now let us assume assume we want to draw the resulting graph on some kind of a 3D map, where the distance of each point from other points is determined by the time it takes to reach from that point to the others. Actually, it might not always be possible to plot this graph on a 3D map as a friend pointed to me, as 3 neighboring points already lock our point in space. But let us say that you can plot this 3D map. A's center and B's center will be near each other, a bit like a paper folding - which is incidentally the most common depiction of 'science fiction wormholes' explanations to laymen. It was pretty nice to try and visualize this kind of map. It is also interesting to think what is the least number of dimensions required to plot a given map.

Although I couldn't come up with ideas about how this kind of visualization might be useful, it was still fun to to think of.