Categories

## Optimizing the recursive spring

Remember the recursive spring? Well, when I worked on it, it was quite slow. I mean very slow. Especially at the higher levels.

The main culprit was the recursion. Consider this – to ‘springify’ a track, you had to sample it twice, so that you could get the heading. Here is an illustration:

(lower track is ‘inner’ track)

Now it is clear – the number of recursive calls made across all levels grows exponentially. There is some hope, though: the recursive calls reuse points calculated previously. It seems that with good caching we should be able to cancel out the exponential growth and make it linear.

Actually, good caching did much more than that – I cached the last track as well, which meant that drawing the track a second time didn’t cost anything: just some dictionary lookups. This yielded a pretty elegant result.
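The caching idea can be sketched with a toy stand-in for the track code (all names here are hypothetical – the real spring has more parameters). Each level samples the level below it at two nearby parameters to get the heading, so the naive version doubles the work at every level:

```
import math

calls = [0]

def point_on_circle(t):
    # Stand-in for the innermost track; counts how often it is sampled.
    calls[0] += 1
    return (math.cos(t), math.sin(t))

def make_spring_level(inner, delta=1e-3, radius=0.1):
    # Each level samples the level below it twice (at t and t + delta)
    # to estimate the heading - this is what makes the naive version
    # exponential in the number of levels.
    def level(t):
        x0, y0 = inner(t)
        x1, y1 = inner(t + delta)
        heading = math.atan2(y1 - y0, x1 - x0)
        # Wind around the inner track, perpendicular to the heading.
        return (x0 + radius * math.cos(heading + math.pi / 2),
                y0 + radius * math.sin(heading + math.pi / 2))
    return level

def cached(sample):
    # Memoize samples by parameter, so each level computes each point once.
    memo = {}
    def cached_sample(t):
        if t not in memo:
            memo[t] = sample(t)
        return memo[t]
    return cached_sample
```

One sample of a 10-level uncached spring calls `point_on_circle` 2^10 = 1024 times; wrapping every level with `cached` brings that down to 11 – exponential to linear, as promised.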

To those curious, here is the code. Before I started working on it I hacked together a small 3D explorer with PyOpenGL, so that I could play with the 3D models. This module is useful by itself as well, and someday soon I’ll dedicate a post to it. (Only after making it better – right now I don’t consider it high-quality code.)

To run it, use tracks.py, and play with it a little. If you look at the code, you’ll find many parameters to play with, yielding interesting results.


## Elegant Line Clipping Algorithm

I actually wrote about this a long time ago, in the wiki that once was algorithm.co.il. Since the wiki is long gone, and this is a very elegant algorithm, I figured I should write about it again.

First, some background – while writing Ascii-Plotter, I had to plot lines. To plot lines correctly I had to clip them. Now since this was a fun project, I decided I wanted to figure out how to do that myself, even though the problem is already well solved. So I sat down with pen and paper, did a little thinking, and came up with this nice algorithm.

The idea is simply this: consider a line that is not parallel to either axis. This line intersects all four of the lines that make up the borders of our screen. Here is a diagram:

Now notice that there are six points of interest: the four intersection points, and the two endpoints of the line. For each point, we find its parameter in the parametric representation of the line, where a and b are the endpoints:

x(t) = a+t(b-a)

To find the actual endpoints of the clipped line, we sort the six points according to their parameters, and take the two innermost points (with zero-based indexing, those would be 2 and 3). What if the line is outside the box, you say? We check that the resulting points’ parameters are between 0 and 1 (the values that correspond to the original endpoints).

Note that for a line parallel to one of the axes, a similar method applies, but without two of the intersection points. However, it is probably easier to just calculate the clipping directly in that case.

Here is some sample code, without all the surrounding checks (for parallelism with the axes, trivial clipping etc…).
The full version is available in the code of Ascii-Plotter.

```
def clip_line(line_pt_1, line_pt_2, rect_bottom_left, rect_top_right):
    # Parameters of the two endpoints, plus the four border intersections.
    points_t = [0.0, 1.0]
    points_t.append(float(rect_bottom_left[0] - line_pt_1[0]) / (line_pt_2[0] - line_pt_1[0]))
    points_t.append(float(rect_top_right[0] - line_pt_1[0]) / (line_pt_2[0] - line_pt_1[0]))
    points_t.append(float(rect_bottom_left[1] - line_pt_1[1]) / (line_pt_2[1] - line_pt_1[1]))
    points_t.append(float(rect_top_right[1] - line_pt_1[1]) / (line_pt_2[1] - line_pt_1[1]))

    points_t.sort()
    # If the two innermost parameters fall outside [0, 1], the segment misses the box.
    if not (0 <= points_t[2] <= 1 and 0 <= points_t[3] <= 1):
        return None
    result = [pt_1 + t * (pt_2 - pt_1)
              for t in (points_t[2], points_t[3])
              for pt_1, pt_2 in zip(line_pt_1, line_pt_2)]
    return (result[0], result[1]), (result[2], result[3])
```

## Pagerank recommends movies

So how was your Yom Kippur? After Yom Kippur ended, I sat down to write something that was nagging me for quite some time. I wanted to see what pagerank had to say about movies. I’ve always liked to picture things as graphs, and looking at movies and actors as nodes in a graph is very intuitive. I also think that the idea has some merit. I like most of the movies that Edward Norton appears in, so it makes sense to give him a high score, and see which other movies Edward Norton “recommends”.

So I fired up my IDE, and started to work.

Well, at first I was a little disappointed, because IMDbPY doesn’t have code to read the IMDB charts. I was going to extend it a little, but then I remembered that I had already downloaded all the data I needed some time ago (more than half a year, I believe), and that was good enough. So I opened up some old pickle file I had lying around, and voila! After some playing around with the format, I had a weighted graph at my disposal. I weighted the graph according to the IMDB cast listing – the higher an actor appeared in the cast, the higher his (or her) weight to and from that movie.

I implemented a quick and dirty pagerank, and then added some code to score my recommendations higher. I did that by adding each recommendation’s weight to its score at every iteration.
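A minimal sketch of what such a quick-and-dirty pagerank with recommendation boosts might look like (the structure and parameter values here are my own illustration, not the actual script):

```
def pagerank(graph, boosts=None, damping=0.85, iterations=50):
    """graph: {node: {neighbor: weight}}; boosts: {node: extra score per iteration}."""
    boosts = boosts or {}
    n = len(graph)
    ranks = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        new_ranks = {node: (1.0 - damping) / n for node in graph}
        for node, neighbors in graph.items():
            if not neighbors:
                continue
            total_weight = sum(neighbors.values())
            for neighbor, weight in neighbors.items():
                new_ranks[neighbor] += damping * ranks[node] * weight / total_weight
        # The recommendation tweak: add each recommendation's boost every iteration.
        for node, extra in boosts.items():
            new_ranks[node] += extra
        # Renormalize so scores stay comparable between runs.
        norm = sum(new_ranks.values())
        ranks = {node: score / norm for node, score in new_ranks.items()}
    return ranks
```

Feed it a bipartite actor/movie graph with cast-order weights, boost a favorite actor, and the movies he links to float up the ranking.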

Without recommendations, the top spots were:

• 24
• The Sopranos
• The Wizard of Oz
• JFK
• Cidade de Deus
• Citizen Kane
• King Kong
• Voyna i mir
• Dolce vita, La
• The Patriot

After adding “Edward Norton” and “Tom Hanks” as recommendations, I got the following results:

• American History X
• Fight Club
• 24
• Red Dragon
• The Sopranos
• Catch Me If You Can
• The Wizard of Oz
• JFK
• Forrest Gump
• Cidade de Deus

Well, I was a little surprised to see 24 and The Sopranos rank so high, and also a little disappointed. The recommendations didn’t really add enough information, and more work will have to be done for this to work properly. It has the right ‘smell’, however – it looks like it has the potential to be useful. I can just see the program: a small table with a ‘seen/unseen’ checkbox, a place to write your favorites, and the program tells you which movies you should watch next. This reminds me a little of the recommendations on Soulseek and Amarok. It would be really fun to write a total recommendation engine, with the best of all worlds. Someone has probably already implemented it.

In any case, I’ll work on it some more. This is interesting.

If anyone wants the code, tell me, and I’ll make it ‘online worthy’…


## How PyTuner works

PyTuner is really quite simple. Here is the outline of the algorithm behind it:

1. Record audio data with pymedia.
2. Compute the FFT of the audio data – now we have the strength of each frequency in the sample.
3. Find peaks in the result of the FFT. This means looking for the dominant frequencies.
4. Find the lowest frequency that has a peak besides the zero frequency.
5. Find the nearest ‘desired note’ (the notes we should tune to) to that frequency.
6. Mark the difference between the actual frequency and the desired note frequency.
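The outline above can be sketched end-to-end, under some substitutions – numpy’s FFT instead of a hand-rolled one, a synthetic signal instead of pymedia’s recorded audio, and the strongest bin as a crude stand-in for the full peak search of steps 3–5:

```
import numpy as np

def dominant_frequency(samples, rate):
    """Return the strongest nonzero frequency (in Hz) in the signal."""
    strengths = np.abs(np.fft.rfft(samples))      # step 2: strength of each frequency
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    strengths[0] = 0.0                            # step 4: ignore the zero frequency
    return freqs[np.argmax(strengths)]            # steps 3-5, crudely: strongest bin
```

For a 110 Hz sine sampled at 4096 Hz for one second, this returns 110.0 – which step 5 would then match against the nearest desired note.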

There is more to the algorithm. First, to find peaks I used a simple method: I computed the average strength of the frequencies in the sample, and looked at all the frequencies more than two standard deviations above that average. This yields an array where each frequency is marked with 0 or 1. (Instead of 1 you can mark it with its original strength.) For each consecutive run of frequencies marked with 1, I computed the average index of the run, and that was the peak location. So for example, if the data was:

0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0

the peak would be at the frequency matching index 5. (If you use the original strengths instead of 0 and 1, you can apply a weighted average to get a more accurate result.)
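Here is a sketch of that peak-finding step (function and variable names are mine, not PyTuner’s actual code):

```
import math

def find_peaks(strengths):
    """Mark bins above mean + 2 std, then average the index of each marked run."""
    n = len(strengths)
    mean = sum(strengths) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in strengths) / n)
    threshold = mean + 2 * std
    marked = [1 if s > threshold else 0 for s in strengths]

    peaks, run = [], []
    for i, m in enumerate(marked):
        if m:
            run.append(i)
        elif run:
            peaks.append(sum(run) / float(len(run)))  # average index of the run
            run = []
    if run:  # a run that reaches the end of the array
        peaks.append(sum(run) / float(len(run)))
    return peaks
```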

To find the actual note for a given frequency, I used the fact that adjacent notes (half-tones) are separated by a ratio of 2^(1/12) (the twelfth root of 2). So, if our base note is A at 110 Hz, we can compute the note with the following code snippet:

```
import math

# note_freq is the frequency measured in the previous steps
base_freq = 110.0  # A
note_names = "A A# B C C# D D# E F F# G G#".split()
# number of half-tones above the base note, rounded to the nearest note
note_index = int(round(math.log(note_freq / base_freq, 2 ** (1.0 / 12))))
note_name = note_names[note_index % 12]
```

While writing PyTuner, at first I implemented the FFT on my own, just to know how to do it once. I found it helped me to better understand the nature of the algorithm, and its output. I find the practice of implementing various algorithms on your own, for educational purposes, quite healthy. I always like to know how things work ‘behind the scenes’.


## Riddle of the Week – LStrings

So I wrote about lstrings. I intend to write about them again in a short while – I already finished the basic script a few days ago, but I’m waiting until I’m satisfied with it.

In the meantime, here is a curious riddle (that I came up with):

Version 1 (not final)

Assume you are given some iteration of an lstring – S. How do you discover the original lstring used to create S?

Discussion

Well, those who are quick to answer will first try to discover the length of the original string – it can be computed directly from S. At this point you should come to the conclusion that without being given the exact number of iterations, the riddle is not that interesting… Iterating on an lstring is an associative operation: if we denote the substitution of a string A into a string B as multiplication, then when both are iterations of an original string T, we get:

A*B = T^n * T^k = T^(n+k) = T^k * T^n

This will result in the obvious answer to the original riddle – “Why, the lstring S is the original string, with zero iterations!”. This answer is correct – and obviously useless :)

So, let us rephrase the riddle:

Version 2 (final)

Assume you are given some iteration of an lstring – S. How do you discover the minimum possible lstring that could be used to create S?

Discussion – But not a solution

This is the proper riddle – have at it! I will be glad to read your thoughts…

note: The acceptable solution should be either an algorithm, or a constructive mathematical proof. I don’t like reading only statements of existence :)


## Reachability on arbitrary maps

The other day I had an idea. What if you took a map of some country and, from each coordinate, computed the time it takes to reach any other coordinate? The basic assumptions are that you can drive by car on roads from parking lot to parking lot, and from a parking lot to any other point you can get only by foot. You can also add trains and airplanes, each having fewer routes but higher speeds. So, for example, traveling from city A’s center to city B’s center is quite fast, but traveling from city A’s center to some point near the road from A to B will take some time.
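The travel-time computation itself boils down to a shortest-path search. A minimal sketch (toy graph; the place names and times are made up), treating the map as a weighted graph where fast ‘road’ edges connect parking lots and slow ‘foot’ edges connect everything else:

```
import heapq

def travel_times(graph, start):
    """Dijkstra's algorithm: graph is {node: [(neighbor, minutes), ...]}."""
    times = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        t, node = heapq.heappop(queue)
        if t > times.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, minutes in graph[node]:
            new_t = t + minutes
            if new_t < times.get(neighbor, float("inf")):
                times[neighbor] = new_t
                heapq.heappush(queue, (new_t, neighbor))
    return times
```

On a toy map where a 30-minute road connects the two cities’ parking lots but a roadside point is reachable only by a long walk, B’s center ends up ‘closer’ to A’s center than the roadside point – exactly the effect described above.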

This is a bit similar to computations done by game AI’s to determine how to get from point A to point B on a game board.

Now let us assume we want to draw the resulting graph on some kind of 3D map, where the distance of each point from other points is determined by the time it takes to get from that point to the others. Actually, it might not always be possible to plot this graph in 3D, as a friend pointed out to me – three neighboring points already lock our point in space. But let us say that you can plot this 3D map. A’s center and B’s center will be near each other, a bit like a paper folding – which is incidentally the most common depiction of ‘science fiction wormholes’ in explanations for laymen. It was pretty nice to try to visualize this kind of map. It is also interesting to think about the least number of dimensions required to plot a given map.

Although I couldn’t come up with ideas for how this kind of visualization might be useful, it was still fun to think of.


## Proofreading and what's wrong with the heapq module

Consider the following problem:

You have a large book you would like to proofread, with many chapters (100+) and a few proofreaders (4) at your disposal. How would you distribute the chapters among them, given that each proofreader must get whole chapters?

Well, the obvious solution is to divide the number of chapters by the number of proofreaders, and give each one an appropriate number of (randomly picked?) chapters. But what if the chapters are of varying length? Well, you could still distribute them randomly, but that just doesn’t feel right, now does it?

I was asked by a friend to help him write a script to solve this, and quite quickly I came up with the solution:

Sort the chapters according to length (number of words), in descending order. Keep the proofreaders in a sorted order. While there are still chapters left – get the next longest chapter, and give it to the proofreader with the least total length. Rearrange the proofreaders.

This is a greedy algorithm, and it is also regarded as the straightforward way to solve this problem.

Well, this seems simple enough. In the case where there are more than a few proofreaders, the obvious way to keep them sorted is a minimum heap. Using a minimum heap in Python should be easy – just import the heapq module, use heapify and the various other functions just as you would use sort, and you are home free.

Not quite…

Turns out that using the heapq module isn’t as easy as it seems, or at least not as quick and simple as I would like it to be. While you can sort a list using your own compare function, or by providing a key function, the heapq module orders items using Python’s standard operators. This means that to use heapq to order objects according to some property, you have to write an object wrapper that has __lt__ and similar methods defined. All this coding could be saved if heapq had a key argument, or any other way to control the comparisons.
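The usual workaround is to push tuples whose first element is the sort key, so that the standard tuple comparison does the right thing. A sketch of the greedy chapter distribution done this way (hypothetical data and names):

```
import heapq

def distribute(chapter_lengths, n_readers):
    """Assign chapters (given by length) to readers, longest chapter first."""
    # Chapters sorted by length, descending; keep original indices.
    chapters = sorted(enumerate(chapter_lengths), key=lambda c: c[1], reverse=True)
    # Heap entries are (total length, reader id, assigned chapters); the
    # tuple's first element stands in for heapq's missing key argument.
    heap = [(0, reader, []) for reader in range(n_readers)]
    for index, length in chapters:
        total, reader, assigned = heapq.heappop(heap)  # least-loaded reader
        assigned.append(index)
        heapq.heappush(heap, (total + length, reader, assigned))
    return sorted(heap, key=lambda entry: entry[1])
```

Distributing chapters of lengths 10, 20, 30, 40 between two readers yields two loads of 50 each.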

And what about the proofreading you ask? Well, it worked. It divided the chapters quite evenly, although we did end up using sort() repeatedly, as the number of proofreaders was small and we did not want to overly complicate matters.

This again emphasizes something I’ve come to discover – being able to write something quickly is usually more important than being able to run it quickly later. If you write something in an environment that allows for quick coding, then if speed is later required, you’ll be able to optimize it quickly, or even change the design or the basic algorithm.


## Interesting links – 3

How to Write a Spelling Corrector

This was a very interesting article, with some nice Python code and a good mathematical explanation. I enjoyed reading it. I got to it through Peterbe.com, which appeared on Daily-Python.

Robert Lang’s Origami Page

Fascinating page. A few months ago I was getting back into doing some Origami (it is a good relaxation method, and also helps your wrists after typing too much :). I was trying to find ways to create origami creatures with more than two limbs, a head, and a tail, and I was fascinated by pictures of origami foldings of intricate creatures with many limbs, such as spiders and dinosaurs. I was also trying to formulate for myself some of the mathematical laws of origami. So I started to look into the subject, found out about crease patterns, and, among many other sites, got to Robert Lang’s site. The subjects he writes about are very diverse, and include using ‘origami knowledge’ to fold airbags, a program to create complicated crease patterns, an origami simulator (something I wanted to write myself :), and so on. Really, a fascinating read.


## Interesting Links – 2

Anti-Patterns

I first saw this term on The Daily WTF, in “Inner Platform Effect”. I quote: “They describe a frequently repeated problem in designing a commonly-occurring solution”. Recently I came upon them again while going over Wikipedia, and found that Anti-Patterns can actually be used as a teaching tool. Some patterns that were an interesting read for me are “Database as an IPC” and “Lava Flow”, mostly because I came close to them myself. It is interesting to note that programming is about naming things – when you successfully name a concept, it probably means you understand it. Going over this list might give you a frightening feeling of déjà vu. So read and take heed.

Catching Lions

Well, this is an old one… Not many people know it, but the first mention of the subject was in the article by H. Petard, “A Contribution to the Mathematical Theory of Big Game Hunting”, which appeared in 1938. Since then many more articles have been written on the subject. It is always amusing to note that the standard algorithm for searching a sorted array is oftentimes called “lion in the desert”, only because of this article. I actually wanted to look it up because I came across this link, and near it were other old jokes, among them this old subject. This is usually regarded as the worst kind of humour :)