Archive for the ‘Projects’ Category

Javascript Element Creator

Wednesday, March 17th, 2010

Some time ago I was working on optimizing the client side code of my website, plnnr.com, an online trip planner.
This website does automatic trip planning, and the problem was that recalculating trips was slow. After profiling, I found out that most of the time wasn't actually taken up by the algorithm, but by the UI. Rendering the trip to html was the costly part. The process was like so:

Client-side Javascript code generates new trip prefs -> application calculates new trip -> Client-side Javascript gets the new trip, and creates new html.

It's important to note that the app is "ajax based", so the actual trip html was generated by the Javascript code, and not the server. At the time I was using Mochikit to generate the new html. Mochikit has a pretty nifty API for generating html, but it's quite a bit on the slow side. Basically, this API is a wrapper around createElement.

Well, first I did a little test, and found out that generating html with cloneNode and innerHTML is much faster than createElement. Still, there was a problem - I needed to generate many similar elements - similar but not identical. Consider entries on a trip itinerary - they all look the same, yet each one has a different name, a different time string, and a different onclick event.

What I needed was a Javascript based html template library. My requirements:
1. Speed. Html had to be generated quickly.
2. Expressiveness. It had to be able to create pretty arbitrary html with a given context. For example, an anchor element (<a> tag) with a given href property, and a given text content.
3. References to inner elements: Many elements inside the generated html need various events attached to them, or various code processing. This should be easy to achieve.
4. The library has to allow the template html to be written as html, and not only as javascript strings.

So, I sat down with Benny, and we wrote the Javascript Element Creator, which we are now releasing under the BSD license. I originally wrote it to work with Mochikit and the Sizzle library, and Benny changed his version to worked with jquery.

After adding the code to my project, I got two things: first, everything worked much, much faster. Second, it was much easier changing the generated html when it was generated according to a template, and not directly in code.

Instructions

1. Write your html somewhere visible to the javascript code. Add the "template" class to the upper node, and the id will be the name of the template. For example:

<body>
    <div id="some_div" class="template">
    </div>
</body>

2. Similarly to other template engines, add double brackets to signify where text should be inserted:

<body>
    <div id="some_div" class="template">
        <a href="[[link_url]]">[[link_text]]</a>
    </div>
</body>

3. Create a creator object. It will "collect" your template, and will make it available to your code.

var creator = new ElementCreator();

4. Generate your DOM object, and add it to the document;

var obj = creator.generate("some_div",
                           {link_url: '/url/',
                            link_text: 'hello world'});
appendChildNodes(foo, obj);

The code

We decided to publish for now only the jquery version. I might publish the mochikit version as well at a later date. Since Benny wrote the jquery version, he also wrote the tests for that version.

All in all, the final code is pretty short, and could probably be even shorter. Still, it's good enough, and gave me a very real performance boost.

Here is the code, have fun with it.

Call for Volunteers: Open Knesset – oknesset.org

Thursday, March 11th, 2010

Over the last few weeks, I've been lightly involved in work on open knesset.
Mostly I've been helping two of the main developers, Benny and Ofri, and joining the discussions on the discussion group.

(For the non-Israelis: the Knesset is Israel's congress, where laws are passed.)

The website's mission is to improve Israeli citizens' involvement in our democracy, and the first step in doing so is giving people more information. Ever wanted to know who keeps his promises? Who voted how? Who never votes? Which Member of Knesset is never present in discussions and votes?

Open Knesset is the place to put this information. If you know a bit of Python & Django, you can join development either on the content harvesting front, or on the front-end front (pun not intended :).

"What about algorithms?" you may ask, or "what does this project has to do with algorithm.co.il?". Well, there's also plenty of room for innovation and interesting features. For example, finding the correlation between Members of Knesset that always vote together. Knowing which vote is for which law. Understanding if the vote is for or against that law. Plotting the party graph, and working with it. These are all things that still need to be done. See these graphical examples to see what I'm talking about.

The website is still in its infancy, but it already has lots of content and lots of features. Nevertheless, it still needs more work.
If you're looking for an open-source project where your work will have great impact, this might just be it.

Fast Peak Autocorrelation

Wednesday, October 14th, 2009

So, I was at geekcon. It was a blast.
The lunar lander
There were many interesting projects, and I didn't get to play with them all. I did get to work a bit on the Lunar Lander from last year, and this year it was finished successfully. My part was the PC game which interfaced with the microcontroller controlling the lander. As you probably guessed, it was written in Python.

This year, as promised, I worked on the Automatic Improviser. I worked on it with Ira Cherkes.

While the final version worked, it didn't work well enough, and there is still much work to be done. Still, we had excellent progress.
By the way, I know this subject has been tackled before, and I still wanted to try it myself, without reading too much literature about it.

One of the components of the system is a beat recognizer. My idea to discover the beat is simple: find the envelope (similar to removing AM modulation), and then find the low "frequency" of the envelope.
Instead of doing a Fast Fourier Transform for beat recognition, we were advised that autocorellation will do the trick better and faster. However, when trying to autocorellate using scipy.signal.correlate we discovered that autocorellation was too slow for real time beat recognition, and certainly wasteful.

To solve this issue, we decided to first do peak detection on the envelope, and then autocorellate the peaks. Since there shouldn't be too many peaks, this has the potential of being really quick. However, there was no standard function to do autocorellation of peaks, so we implemented it ourselves. We were pretty rushed, so we worked fast. Here's the code:

def autocorrelate_peaks(peaks):
    peaks_dict = dict(peaks)
    indexes = set(peaks_dict.keys())
    deltas = set()
    for i in indexes:
        for j in indexes:
            if j>i:
                continue
            deltas.add(i-j)

    result = {}
    for d in deltas:
        moved = set(i+d for i in indexes)
        to_mult = moved & indexes
        assert to_mult <= indexes
        s = sum(peaks_dict[i-d]*peaks_dict[i] for i in to_mult)
        result[d] = s
    return result

This function takes as input a list of tuples, each of the form (peak_index, peak_value), and returns a mapping between non-zero corellation offsets and their values.
Here's is a sample run:

In [7]: autocorrelate_peaks([(2, 2), (5,1), (7,3)])
Out[7]: {0: 14, 2: 3, 3: 2, 5: 6}

After implementing this function, our recognition loop was back to real-time, and we didn't have to bother with optimizing again.

Preparing PyImprov for GeekCon on Friday

Wednesday, October 7th, 2009

A long long time ago, I wrote Pytuner. It was one of the first projects I published on this website.
For a long time it just sat there, doing nothing, while the library it's based on - PyMedia, wasn't being maintained anymore, and PyTuner could only work on Python 2.4.

Enter GeekCon - GeekCon is a get together of people looking to work on their wildest fantasy projects - things that they don't get do because of their regular work. Last yet I worked on a real life 3d lunar lander, and this year I thought I'd take the opportunity to work on PyImprov - my wild fantasy project.
The idea is simple - I'll play some simple chords, and the script will improvise some blues solo. To that purpose, I wrote (also a long time ago) a chord recognizer. Now I'm missing a beat recognizer, and a simple improviser, which I plan to complete during the GeekCon weekend.

In preparation for the event, I've opened up a subversion repository for PyImprov on Assembla, and patched the scripts to work with PyAudio instead of PyMedia.

So, onwards to GeekCon, see you on the other side, with a guitar and a Python script in hand!

PyKoan – The Logic Game

Sunday, June 1st, 2008

As you can probably tell, I'm back from my undeclared hiatus. I've got lots of stuff to talk about, and I'll be starting with PyKoan, one small project I've been working on lately in my spare time.

A few weeks ago I stumbled upon an article in wikipedia, regarding a logic game. This game fascinated me, especially because of the "Godel Escher Bach" connection. Quoting from Wikipedia:

Zendo is a game of inductive logic designed by Kory Heath in which one player (the "Master") creates a rule for structures ("koans") to follow, and the other players (the "Students") try to discover it by building and studying various koans which follow or break the rule. The first student to correctly state the rule wins.

As it happens I'm also taking a mathematical logic course this semester. Having read about the game, I wanted to write a similar computer game. Hence - PyKoan.

PyKoan is a game where the goal is to discover some logical rule, for example, "For each x holds x%2 == 0". This rule is applied to a koan - a list of integers. An example koan that "has Buddha nature" (follows the rule) is [0,2,8]. One which doesn't is [1].

To implement this idea, I wrote an implementation of an expression tree very similar to the one used in vial, but with a different implemented language. I also did some experimentation with the design. Since I've been talking a lot about expression trees without giving a full explanation, in a future post I'll write about the implementation used in PyKoan.

So far I didn't code a lot of the game, just the expression tree framework, and a simple rule builder. When using Python's interactive prompt, one can get a taste of how the game might feel:

In [19]: r = rulegen.create_rule(rulegen.easy_grammer, rulegen.easy_grammer_start)

In [20]: r.eval([])
Out[20]: False

In [21]: r.eval([0])
Out[21]: False

In [22]: r.eval([1])
Out[22]: False

In [23]: r.eval([2])
Out[23]: True

In [24]: r.eval([2,2])
Out[24]: True

In [25]: r.eval([2,2,3])
Out[25]: True

In [26]: r.eval([3])
Out[26]: False

In [27]: r.eval([4])
Out[27]: False

In [28]: print r
x exists such that x == 2

Here is how I would generate such an expression manually:

In [3]: from exptree import exps

In [4]: exps.Exists('x',exps.Eq('x',2))
Out[4]: exps.Exists(exps.Var(x), exps.Eq(exps.Var(x), exps.Imm(2)))

In [5]: print _
x exists such that x == 2

The game has many interesting possibilities for research, for example, computer players. Other possibilities include "just" guessing koans (not the rule itself), creating interesting and playable rules, and so on. There's a lot to do.

This time, instead of just publishing the (unfinished) code, I decided to do something different. I've opened a space in assembla, with public read access. I'm opening this project for participation: if you want to join then leave a comment, or send me an email.

(Since Assembla seems to be going through some connectivity issues right now, here's a link to a copy).

Issues in Writing a VM – Part 2

Friday, March 28th, 2008

The Incredible Machine

Writing a VM capable of executing expression trees is different from writing a VM for executing assembly instructions. Here I'll cover several issues stemming from this difference.

The first group of issues involve generality. Supporting a specific instruction set is a straightforward job, even if a hard one. Vial has to support multiple platforms and that's a little tricky. These are just a few of the problems:

  1. Different registers for each instruction set. This one is easy enough to solve. Just have a node in the expression tree with some value to indicate the register.
  2. Register overlaps. A change to one register implies a change to its parents and its children. Consider RAX->EAX->AX->AH,AL. Changing EAX will affect all of the registers in the hierarchy.
    To handle this, we wrote a CPU class to keep all the info about the platform, including register overlaps.
  3. Delayed branches. Some platforms have branch delay slots. This means that after any branch instruction, right before the branch is taken, instructions in the delayed slots are executed anyway. For instance, SPARC has three delay slots, while MIPS has just one. This isn't an easy issue to solve, and for now we didn't tackle it. We've got a few ideas though.

To make sure that our implementation is generic enough, we decided to write a skeleton disassembler implementation for MIPS as well.

The second group of issues involve the nature of expression trees versus instructions:

  1. Stepping over statements or instructions? Each expression tree for an instruction usually holds more than one statement. For example, dec eax changes eax as well as the zero flag. Since some instructions like rep stosd may contain a long loop, being able to step over statements instead of expressions is preferable.

    The problem is, executing expression trees is done with a DFS-like walk. If implemented with recursion it makes pausing the walk for each step a bit complicated. However, recursion is the clearest way to write the execution code, and I'd rather not give it up.

    My solution was to use Python generators. Each time a statement was executed, the function would yield, thus returning control to the calling function, while keeping its state intact.

  2. Instruction Pointer changes. The lower-level disassembler returns expression trees for each instruction. However, it does not return the jump to the next instruction as well. While this is the required behavior, it means that the VM should change the instruction pointer after executing each instruction.

    This is easier said than done: What should we do after jumps? Several possible solutions emerged.

    The first was to append an IP change to each instruction's expression. Those changes will have to be appended only where needed.
    A second solution was to check if the IP was changed, and if it was not, change it. This solution however will not support instructions that jump to themselves.
    The last and our preferred solution was to check if the IP was touched. This solution is simple and very straightforward.

There are many more issues I didn't write about, for example, self modifying code. I'll leave those issues for future articles.

Issues in writing a VM – Part 1

Tuesday, March 18th, 2008

Arkon and I decided to write a VM for vial. First though, a short explanation on what is vial:
vial is a project aimed at writing a general disassembler that outputs expression trees instead of text. On top of vial, we intend to write various code-analysis tools. The expression trees in the output should be an accurate description of the all of the code's actions.
(note: the x86 disassembler behind vial is Arkon's diStorm.)

So why do we need a VM? Apart from it being 'nice and all', it is critical for testing.

Some time ago, I described writing a VM to test a compiler I wrote as university homework. It is a similar issue here.
The disassembler is written according to the x86 specification. If we just check its output against this specification, we are not doing much to verify the code's correctness. This is evident when you try to implement such a testing module - you end up writing another disassembler, and testing it against the original one. There has to be a different test method, one that does not directly rely on the specification.

Enter the VM. If you write a program, you can disassemble it, and then try to execute the disassembly. If it yields the same output as the original program - your test passed.
This is a good testing method, because it can be easily automated, reach good code coverage, and it tests against known values.
Consider the following illustration:

Testing Process

We are testing here a complete process on the left hand, against a known valid value, the original program's output, on the right hand. All of the boxes on the left hand are tested along the way. Of course, one test may miss. For example, both the VM and the disassembler may generate wrong output for register overflows. We can try to cover as many such cases as possible by writing good tests for this testing framework. In this case, good tests are either c programs, or binary programs. This is essentially what I was doing when I manually fuzzed my own compiler.

Once the VM is finished, we can start writing various optimizations for the disassembler's generated output. We can test these optimizations by checking the VM's output on the optimized code against the output on the original code. This makes the VM a critical milestone on the road ahead.