Archive for the ‘Algorithms’ Category

Fast Peak Autocorrelation

Wednesday, October 14th, 2009

So, I was at geekcon. It was a blast.
The lunar lander
There were many interesting projects, and I didn't get to play with them all. I did get to work a bit on the Lunar Lander from last year, and this year it was finished successfully. My part was the PC game which interfaced with the microcontroller controlling the lander. As you probably guessed, it was written in Python.

This year, as promised, I worked on the Automatic Improviser. I worked on it with Ira Cherkes.

While the final version worked, it didn't work well enough, and there is still much work to be done. Still, we had excellent progress.
By the way, I know this subject has been tackled before, and I still wanted to try it myself, without reading too much literature about it.

One of the components of the system is a beat recognizer. My idea to discover the beat is simple: find the envelope (similar to removing AM modulation), and then find the low "frequency" of the envelope.
Instead of doing a Fast Fourier Transform for beat recognition, we were advised that autocorellation will do the trick better and faster. However, when trying to autocorellate using scipy.signal.correlate we discovered that autocorellation was too slow for real time beat recognition, and certainly wasteful.

To solve this issue, we decided to first do peak detection on the envelope, and then autocorellate the peaks. Since there shouldn't be too many peaks, this has the potential of being really quick. However, there was no standard function to do autocorellation of peaks, so we implemented it ourselves. We were pretty rushed, so we worked fast. Here's the code:

def autocorrelate_peaks(peaks):
    peaks_dict = dict(peaks)
    indexes = set(peaks_dict.keys())
    deltas = set()
    for i in indexes:
        for j in indexes:
            if j>i:
                continue
            deltas.add(i-j)

    result = {}
    for d in deltas:
        moved = set(i+d for i in indexes)
        to_mult = moved & indexes
        assert to_mult <= indexes
        s = sum(peaks_dict[i-d]*peaks_dict[i] for i in to_mult)
        result[d] = s
    return result

This function takes as input a list of tuples, each of the form (peak_index, peak_value), and returns a mapping between non-zero corellation offsets and their values.
Here's is a sample run:

In [7]: autocorrelate_peaks([(2, 2), (5,1), (7,3)])
Out[7]: {0: 14, 2: 3, 3: 2, 5: 6}

After implementing this function, our recognition loop was back to real-time, and we didn't have to bother with optimizing again.

Computing Large Determinants in Python

Wednesday, March 25th, 2009

Story:
For my seminar work, I had to calculate the determinant of a large integer matrix.
In this case, large meant n>=500. You might say that this isn't very large and I would agree. However, it is large enough to overflow a float and reach +inf. Reaching +inf is bad: once you do, you lose all the accuracy of the computation. Also, since determinant calculations use addition as well as subtraction, if +inf is reached in the middle of the calculation the value might never drop back, even if the determinant is a "small" number.

(Note: Even if the calculation hadn't reached +inf, but just some very large floating point number, the same phenomenon could have occurred.)

As I said, my matrix was composed of integers. Since a determinant is a sum of multiples of matrix elements, you would expect such a determinant to be an integer as well. It would be nice to take advantage of that fact.

How should one compute such a determinant? Well, luckily Python has support for arbitrarily large integers. Let's compute the determinant with them!
Unfortunately, Python's numpy computes determinants using LU decomposition, which uses division - hence, floating point values.
Well, never mind, I know how to calculate a determinant, right?
Ten minutes later the naive determinant calculation by minors was implemented, with one "minor" setback - it takes O(n!) time.
Googling for integer determinants yielded some articles about division free algorithms, which weren't really easy to implement. There was one suggestion about using Rational numbers and Gauss Elimination, with pivots chosen to tame the growth of the nominator and denominator.

So, I hitched up a rational number class, using Python's arbitrarily large integers as nominator and denominator. Then, I got some code to do Gauss Elimination, although my pivoting wasn't the most fancy. This seemed to work, and I got my desired large determinants.

Epilogue:
Not too long ago, I ran across mpmath, a Python library for arbitrarily large floating point numbers. It is possible I could have used it instead of my own rational numbers. Next time I run into a relevant problem I'll keep this library in mind.
Also, an even shorter while ago, I became aware that since 2.6 Python boasts a fractions module. This will sure come in handy in the future.

Code:
Available here. If you intend to use it in Python 2.6 and above, I suggest replacing my Rational class with Python's Fraction.
(Note that I adapted this module to my needs. Therefore it has a timer that prints time estimations for computations, and it uses Psyco.)

Fractal Memory Usage and a Big Number

Thursday, June 12th, 2008

In a previous post I said I'd talk about 4**(4**(4**4)).
First, about the number. I first saw it mentioned in a math lesson back in high-school, when the teacher wanted to demonstrate estimation abilities.
He wrote it on the board, like so:

4444

and then asked students to estimate how big that number is. Turns out this number is much bigger than most students thought. (For comparison, the number of atoms in the observable universe is 1080.)

Given that this number is so big, it should come as no surprise that computing it will consume all available memory.
What is interesting is the way the memory is consumed:

(It died with a MemoryError just before the next "jump".)

When I first saw this I was fascinated. To those interested it is generated by the long pow function in Python, implemented in longobject.c. According to the comment there, the algorithm is a left to right 5-ary exponentiation, taken from "Handbook of Applied Cryptography".
In short, the simpler algorithm "left-to-right binary exponentiation" (used for smaller exponents) repeatedly squares the accumulator, and according to the binary digits of the exponents, multiplies it by the base. 5-ary exponentiation basically does the same thing, but in each iteration it 'eats' five digits of the exponent instead of just one.

It might be very interesting to study algorithms and code according to their memory usage. Once at work I used a similar memory usage graph to optimize the memory usage of some algorithm that was eating too much memory. I don't remember the details as it happened a few years ago, but I do remember that I recognized the "type of the memory eating", which helped me to narrow the problem down to a specific area of code, where indeed the problem was located. (I consider this a case of programming-fu :)

Notes:
1) The reason I'm talking about Python is because it has arbitrarily sized longs built-in.
2) This graph was generated using Windows' task manager. On my Ubuntu machine it looks quite different, almost a straight line. Any guesses why?

First Code Transformation: Removing Flag Computations

Tuesday, April 8th, 2008

Short introduction to code transformations

If you intend to write a decompiler, you'll find yourself writing code transformations.
In our context, code transformations are operations that take as input an expression tree, and return an equivalent but different expression tree.

An equivalent expression tree is one that does the same thing as the original. More formally, it's an expression tree that when run, will reach the same final state as the original expression tree.

Here's a simple example, converting a while loop to a do-while loop:

Before:

while cond:
    stmt

After:

if cond:
    do:
        stmt
    while cond

Code transformations are a very tricky business. You may find yourself changing the meaning of your expression trees without noticing. That's one of the reasons we wrote the VM: we wanted to make sure that our code transformations will be correct.

Generally, you'll want to transform your code to a somewhat 'better' version. 'Better' in what sense?
Well, the code may be:

  • More readable. A good purpose, as we are writing a decompiler for human beings. Unfortunately, this goal is not well defined. 'More readable' will have to be defined by heuristics.
  • Shorter/smaller. An easier goal to define, smaller code has the following advantages:
    • It might be faster to execute.
    • It might faster to process by other algorithms - after all, it is shorter.
    • It might be more readable.
  • Simpler, in the sense that it uses less types of operations. For example, take transforming a while to a do-while (shown above). If we wanted to analyze loops, it will allow us to write code to handle just do-while loops, instead of handling both cases.

There are many more possible goals for code transformations. The bottom line is that code transformations may do any of the following:
* Make the life of the human reader easier.
* Make the life of the decompiler programmer easier.
* Make the life of the decompiler easier.

An important aspect of code transformations is when to do them. Here is a good rule of thumb:

If it makes it harder for the programmer or the decompiler - it should be done as late as possible.
Otherwise - it should be done as early as possible.

Removing Flag Computations

This is the first code transformation we intend to write. Consider the following case:

sub eax, ebx
sub ebx, ecx

The equivalent expression tree for these two instructions looks something like this:

eax = eax - ebx
zf = eax == 0
ebx = ebx - ecx
zf = ebx == 0

Here I listed only the changes to zf and the actual registers. All the other flags (OF,CF,...) are changed as well. In a sequence of such instructions, all the modifications to the flags in the middle are actually dead code, and may be removed. This transformation will make the code shorter, faster to execute on our VM and faster to analyze by later stages.

This is a special case of dead code removal. This raises a valid question: why not let some general dead code removal transformation handle the flags?

There are a few reasons:

  • A general dead code removal might be more complex to write. Just writing exactly the same algorithm for general purpose registers will be more complicated, because of overlapping registers.
  • This transformation may remove up to 80% of many of the disassembled instructions. Flag calculations take up a lot nodes in the expression trees!
  • This transformation may be run as soon as the disassembly stage itself, while a general dead code removal transformation will not. This is so because of some assumptions we are allowed to make on the way flags behave. For example, an instruction template ("%0=%0-%1; ZF=%0==0" for sub) might have its parameters changed, but the flags affected stay the same.

The implementation of this transformation is not yet complete. When it is, I'll talk a bit about the mechanics behind it. It is much more complicated than it seems!

Issues in Writing a VM – Part 2

Friday, March 28th, 2008

The Incredible Machine

Writing a VM capable of executing expression trees is different from writing a VM for executing assembly instructions. Here I'll cover several issues stemming from this difference.

The first group of issues involve generality. Supporting a specific instruction set is a straightforward job, even if a hard one. Vial has to support multiple platforms and that's a little tricky. These are just a few of the problems:

  1. Different registers for each instruction set. This one is easy enough to solve. Just have a node in the expression tree with some value to indicate the register.
  2. Register overlaps. A change to one register implies a change to its parents and its children. Consider RAX->EAX->AX->AH,AL. Changing EAX will affect all of the registers in the hierarchy.
    To handle this, we wrote a CPU class to keep all the info about the platform, including register overlaps.
  3. Delayed branches. Some platforms have branch delay slots. This means that after any branch instruction, right before the branch is taken, instructions in the delayed slots are executed anyway. For instance, SPARC has three delay slots, while MIPS has just one. This isn't an easy issue to solve, and for now we didn't tackle it. We've got a few ideas though.

To make sure that our implementation is generic enough, we decided to write a skeleton disassembler implementation for MIPS as well.

The second group of issues involve the nature of expression trees versus instructions:

  1. Stepping over statements or instructions? Each expression tree for an instruction usually holds more than one statement. For example, dec eax changes eax as well as the zero flag. Since some instructions like rep stosd may contain a long loop, being able to step over statements instead of expressions is preferable.

    The problem is, executing expression trees is done with a DFS-like walk. If implemented with recursion it makes pausing the walk for each step a bit complicated. However, recursion is the clearest way to write the execution code, and I'd rather not give it up.

    My solution was to use Python generators. Each time a statement was executed, the function would yield, thus returning control to the calling function, while keeping its state intact.

  2. Instruction Pointer changes. The lower-level disassembler returns expression trees for each instruction. However, it does not return the jump to the next instruction as well. While this is the required behavior, it means that the VM should change the instruction pointer after executing each instruction.

    This is easier said than done: What should we do after jumps? Several possible solutions emerged.

    The first was to append an IP change to each instruction's expression. Those changes will have to be appended only where needed.
    A second solution was to check if the IP was changed, and if it was not, change it. This solution however will not support instructions that jump to themselves.
    The last and our preferred solution was to check if the IP was touched. This solution is simple and very straightforward.

There are many more issues I didn't write about, for example, self modifying code. I'll leave those issues for future articles.

Two Mathematical Bugs

Saturday, March 8th, 2008

A few days ago, I discovered I had at least one, maybe two mathematical bugs. The first bug is in the line clipping algorithm. Here's a quick reminder of what's going on there:

There are six points of interest: the four intersection points, and the two endpoints of the line. We sort the six points according to their parameter, and take the two innermost points as the endpoints of the clipped line. To check if the line is outside the box we make sure that the result points' parameters are between 0 and 1 (which correspond to the original endpoints).

Turns out this is not always true - here's a counter example:

Counter Example for the Line Clipping Algorithm

In the counter example you can see a line, whose endpoints lie between the intersections. The algorithm described above will say this line lies completely inside the box. Clearly this is not true. How do we fix this? Simple actually. We check for each intersection point whether or not it is on the edges of the clipping rectangle. If at most one point is on - our line is outside. I hope this is the end of it. It will require more rigorous testing.

The second bug was in my solution to the random selection challenge. I wasn't too sure of my solution, but I really liked it, as its code is rather elegant. So when meeting someone I used to work with, I told her about it. After thinking about my solution a bit, she said it is probably wrong, and gave me the reason: While the probability of a point x passing the first stage is indeed (proportional to) p(x), the probability of it passing the second stage is not 1 (or constant). I still need to think about this one, so no easy fix yet. I guess this reopens the challenge, so if anyone feels like writing a good proof or a counter example for my solution, I'd be happy to see it. I'll be even happier to see a completely different solution.

Rhyme and Reason with Python

Wednesday, February 6th, 2008

After reading xkcd's blog, I went to LimerickDB.com. It looks and works just like bash.org, but for limericks. Quite amused with some of the limericks available, I wanted to try and compose my own.
The first one came out fine, but I got stuck without a rhyme for the second one:

a man named guido once thought
he'd invent a language well wrought
indentation as tabs

Uhm, no luck there... "tabs" does not rhyme well. Then an idea came... I could create a rhyme index!
How to do it?
First, I took my already written word index. Last time I used it was to solve a substitution cipher for my crypto homework.
Then, I looked for the phonemes for each word, using espeak - a text to speech program:

lorg@kild:~$ espeak -q -x hello
 h@l'oU

After doing so, I collected words that had similar phonemes endings, and there it was, a rhyme index:

In [84]: words.get_word_rhymes("tabs", word_index, rhyme_index)[:10]
Out[84]:
['crabs', 'stabs', 'macnabs',
 'dabs', 'cabs', 'scabs',
 'slabs', 'arabs', 'snobs',
 'thebes']

Note that better rhymes are listed first - according to the length of the match.
To create this list I didn't use stress information, although with some more work it may be done.

Back to our limerick, we can see that "tabs" does not have really good rhymes. Better luck with indentation maybe:

In [86]: words.get_word_rhymes("indentation", word_index, rhyme_index)[:20]
Out[86]:
['kishion', 'alleviation', 'expostulation',
 'machination', 'syrophenician', 'proposition',
 'infatuation', 'computation', 'mansion',
 'meditation', 'aggression', 'anticipation',
 'clarification', 'logician', 'gratulation',
 'discretion', 'quotation', 'presentation',
 'dissolution', 'deprecation']

And we get (not the best limerick I admit):

a man named guido once thought
he'd invent a language well wrought
with space indentation
and good presentation
we all got the language we sought

My original ending line was "it left all the perl hackers distraught"... but it ain't true, so I gave it up. The 'distraught' was found with my script as well.
If you want to look for your own rhymes, then you can use my hacked-up script, although it isn't written well enough for my taste. (For example, don't expect to see documentation :)
I'll upload a better version once I refactor it.
To save you the trouble of generating the word index and the rhyme index from scratch, I used cPickle to store them. You can use them for lookups regardless of the script.