Pyweb-il Presentation on Optimization Slides

Last Monday I gave a presentation in pywebil on optimization, that’s loosely based on my blog post on the same subject. Here are the slides for that presentation.

Posted in Optimization, Programming, Python | Tagged , , , | Leave a comment

Javascript Element Creator

Some time ago I was working on optimizing the client side code of my website, plnnr.com, an online trip planner.
This website does automatic trip planning, and the problem was that recalculating trips was slow. After profiling, I found out that most of the time wasn’t actually taken up by the algorithm, but by the UI. Rendering the trip to html was the costly part. The process was like so:

Client-side Javascript code generates new trip prefs -> application calculates new trip -> Client-side Javascript gets the new trip, and creates new html.

It’s important to note that the app is “ajax based”, so the actual trip html was generated by the Javascript code, and not the server. At the time I was using Mochikit to generate the new html. Mochikit has a pretty nifty API for generating html, but it’s quite a bit on the slow side. Basically, this API is a wrapper around createElement.

Well, first I did a little test, and found out that generating html with cloneNode and innerHTML is much faster than createElement. Still, there was a problem – I needed to generate many similar elements – similar but not identical. Consider entries on a trip itinerary – they all look the same, yet each one has a different name, a different time string, and a different onclick event.

What I needed was a Javascript based html template library. My requirements:
1. Speed. Html had to be generated quickly.
2. Expressiveness. It had to be able to create pretty arbitrary html with a given context. For example, an anchor element (<a> tag) with a given href property, and a given text content.
3. References to inner elements: Many elements inside the generated html need various events attached to them, or various code processing. This should be easy to achieve.
4. The library has to allow the template html to be written as html, and not only as javascript strings.

So, I sat down with Benny, and we wrote the Javascript Element Creator, which we are now releasing under the BSD license. I originally wrote it to work with Mochikit and the Sizzle library, and Benny changed his version to worked with jquery.

After adding the code to my project, I got two things: first, everything worked much, much faster. Second, it was much easier changing the generated html when it was generated according to a template, and not directly in code.

Instructions

1. Write your html somewhere visible to the javascript code. Add the “template” class to the upper node, and the id will be the name of the template. For example:

<body>
    <div id="some_div" class="template">
    </div>
</body>

2. Similarly to other template engines, add double brackets to signify where text should be inserted:

<body>
    <div id="some_div" class="template">
        <a href="[[link_url]]">[[link_text]]</a>
    </div>
</body>

3. Create a creator object. It will “collect” your template, and will make it available to your code.

var creator = new ElementCreator();

4. Generate your DOM object, and add it to the document;

var obj = creator.generate("some_div",
                           {link_url: '/url/',
                            link_text: 'hello world'});
appendChildNodes(foo, obj);

The code

We decided to publish for now only the jquery version. I might publish the mochikit version as well at a later date. Since Benny wrote the jquery version, he also wrote the tests for that version.

All in all, the final code is pretty short, and could probably be even shorter. Still, it’s good enough, and gave me a very real performance boost.

Here is the code, have fun with it.

Posted in Javascript, Optimization, Programming, Projects, startup, web-design | Tagged , , , | 2 Comments

Call for Volunteers: Open Knesset – oknesset.org

Over the last few weeks, I’ve been lightly involved in work on open knesset.
Mostly I’ve been helping two of the main developers, Benny and Ofri, and joining the discussions on the discussion group.

(For the non-Israelis: the Knesset is Israel’s congress, where laws are passed.)

The website’s mission is to improve Israeli citizens’ involvement in our democracy, and the first step in doing so is giving people more information. Ever wanted to know who keeps his promises? Who voted how? Who never votes? Which Member of Knesset is never present in discussions and votes?

Open Knesset is the place to put this information. If you know a bit of Python & Django, you can join development either on the content harvesting front, or on the front-end front (pun not intended :).

“What about algorithms?” you may ask, or “what does this project has to do with algorithm.co.il?”. Well, there’s also plenty of room for innovation and interesting features. For example, finding the correlation between Members of Knesset that always vote together. Knowing which vote is for which law. Understanding if the vote is for or against that law. Plotting the party graph, and working with it. These are all things that still need to be done. See these graphical examples to see what I’m talking about.

The website is still in its infancy, but it already has lots of content and lots of features. Nevertheless, it still needs more work.
If you’re looking for an open-source project where your work will have great impact, this might just be it.

Posted in Projects | Tagged , , , | 2 Comments

Ethics in Programming

Some time ago I was bothered by the issue of ethics in programming.
I heard the question best raised during a “game unconference” I attended. There was a panel about monetary systems for games, and people talked about the issues faced when adding money to an online game.
At one point someone from the audience said about ingame monetary systems (such as in WoW) “it’s like gambling and drugs!”, to which one panelist jokingly replied “so we have a proven business model”, and another said “except it’s legal”.

This was all in good spirit, but it got me thinking:

What are the programming jobs I will not take?

Continue reading “Ethics in Programming” »

Posted in Programming, Programming Philosophy | Tagged , | 7 Comments

Simple SQLObject DB Migration how-to

I’ve been using sqlobject for plnnr.com for quite some time now. So far my experience with it has been positive. Although I’ll probably change ORM when I move to django, for now it stays. While it stays, I need to be able to upgrade my schema to add features.
SQLObject already has a tool for the job, sqlobject-admin. There are instructions on how to use it, but I found them unsatisfactory.
(By the way, both django’s ORM and sqlalchemy also have tools for that, django-south and sqlalchemy-migrate respectively.)

So here is how I use sqlobject-admin to do migrations. Note that if you’re using turbogears 1.0, you would probably be using tg-admin. In that case, bear in mind that tg-admin just simplifies the job for you by adding various standard parameters, but apart from that, the idea stays the same.
Notes:
* I wrote these instructions on a windows machine. On linux machines it should be almost the same, but might require tweaking.
* I used a specific db URI in the examples. You can change it to whatever you want.
* I once had to tweak the main sqlobject-admin file to add the current dir to sys.path. YMMV.

1. Example project:
Let’s setup a project that uses sqlobject. We’ll create a single file, ‘main.py’ with the following content:

import sqlobject
 
sqlobject.sqlhub.processConnection = sqlobject.connectionForURI('sqlite:/D|/work/sotest/sotest.sqlite')
 
class MyThing(sqlobject.SQLObject):
    bla = sqlobject.StringCol()

This is about as simple as I could get it with sqlobject.

2. Starting to use sqlobject-admin
Sqlobject-admin has quite a bit of bureaucracy to go through before you get everything to work right. For a simple project, I cheat (i.e. fake an egg :), and do the following:
a. Create a directory in your project called sqlobject-history
b. If your project name is sotest, create a directory inside your project called sotest.egg-info
c. Inside that dir create a file called sqlobject.txt
d. Inside that file write:

db_module=main
history_dir=$base/sqlobject-history

(note that the main here is the name of the module we created earlier).

3. Start using sqlobject-admin
This will be the workflow with sqlobject-admin:
1. Have the creation sql for the current code version.
2. Update your code
3. Generate the creation sql for the new code version, *without updating the db*
4. Create an upgrade script using the diff between the versions
5. Use the upgrade script.

More specifically:
1. First time – do:

sqlobject-admin record –egg=sotest -c sqlite:/D|/work/sotest/sotest.sqlite

2. To see that everything works, do:

sqlobject-admin list –egg=sotest -c sqlite:/D|/work/sotest/sotest.sqlite

and:

sqlobject-admin status –egg=sotest -c sqlite:/D|/work/sotest/sotest.sqlite

3. Update your database definition (in the Python file). For example, change the contents of main.py to:

import sqlobject
 
sqlobject.sqlhub.processConnection = sqlobject.connectionForURI('sqlite:/D|/work/sotest/sotest.sqlite')
 
class MyThing(sqlobject.SQLObject):
    bla = sqlobject.StringCol()
    bla2 = sqlobject.StringCol()

4. Here is the critical part. Do

sqlobject-admin record –egg=sotest -c sqlite:/D|/work/sotest/sotest.sqlite –no-db-record

In the sqlobject-history directory there should be now two subdirectories, for each version. Let’s call the old version X and the new version Y. In the old version directory create a file:
upgrade_sqlite_Y.sql (where Y is the new version’s name).
In this file, write down the sql to add the bla2 column to the MyThing table. You can use the creation sql commands in the respective versions’ directories to write it.

(note: if we used –edit we would get an editor opened, and if the edited file has any content when you close it, it will be saved as the upgrade script. I don’t like using this method. Note that if you’re on windows you’ll have to fix sqlobject-admin to open your editor, as the command it uses works only on linux machines.)

5. run

sqlobject-admin upgrade –egg=sotest -c sqlite:/D|/work/sotest/sotest.sqlite

6. Make sure everything is OK with sqlobject-admin status.

3. After using the upgrade script
You can use the same upgrade script for other instances of your project. Just make sure that you have the versions numbers correct, and the first version recorded in the database.

I hope this will be useful for someone using sqlobject, I know I needed this kind of how-to. If you have any questions, feel free to ask them in comments below.

Posted in Programming, Python | Tagged , , , | Leave a comment

The mathematics behind the solution for Challenge No. 5

If you take a look at the various solutions people proposed for the last challenge of generating a specific permutation, you’ll see that they are very similar. Most of them are based on some form of div-mod usage. The reason this is so, is because all of these solutions are using the Factorial Base.

What does that mean?
Note that we usually encounter div-mods when we want to find the representation of a number in a certain base. That should already pique your interest. Now consider that a base’s digits need not have the same weight. For example, consider how we count the number of seconds since the start of the week:

seconds of the last minute, A (at most 60-1)
minutes of the last hour, B (at most 60-1)
hours of the last day, C (at most (24-1)
days of the last week, D (at most 7-1)

So given A, B, C, D, we would say that the number of seconds is:
A + 60*B + 24*C + 7*D. This certainly looks like a base transformation. To go back, we would use divmod.

The factorial base is just the same, with the numbers n, n-1, … 1. Note that in the factorial base, you can only represent a finite number of numbers – n!. This should not be surprising – this is what we set out to do in the first place!
The thing that I found really amazing about this is that all the people to whom I posed this challenge came up with almost the same “way” of solving it.

Other interesting curiosities regarding bases can be found in Knuth’s book, “The Art of Computer Programming”, volume 2, Section 4.1.

Posted in Challenges, computer science, Programming, Python | Tagged , , , , | Leave a comment

Small Programming Challenge no. 5 – Generating a Permutation

I thought of this one quite a long time ago, and I believe that the idea behind it is pretty nice mathematically. I got the idea for it from Knuth’s “The Art of Computer Programming”.

The challenge is simple:
write a function that receives as arguments two numbers, n, and num such that 0 <= num < n!. This function needs to return an array (list) representing a permutation of the numbers 0..n-1. For each possible num, the function needs to return a different permutation, such that over all values of num, all possible permutations are generated. The order of permutations is up to you.

The function you write should do this in at most O(n) time & space (Various O(nlogn) are also acceptable).
Write your solutions in the comments, in [ LANG ] [/ LANG ] blocks (without the spaces) where LANG is preferably Python :). I will post my solution in a few days. As usual, the most efficient & elegant solution wins.

Go!

Posted in Challenges, computer science, Programming | Tagged , | 15 Comments

And now for something completely different

My friend Yuval whom you might already know from the comments here, apparently composed music for the Python Zen. It made me laugh today, and as it’s been a long day, I thought it’s worth sharing here. Especially as it is Python related.

Posted in Humour, Python | Tagged , , , | Leave a comment

Leaky method references

After reading my last post regarding __del__, you should know that __del__ + reference cycle = leak.
Let’s say that you do need to use __del__, so you decide to avoid reference cycles. You write your code in such a way as to use the minimum necessary cycles, and for the ones that remain you use the weakref module.

You might still have cycles where you don’t expect it – in references to methods.
Consider the following piece of code. Can you spot the reference cycle?

class A(object):
    def f(self):
        return
 
a = A()
a.g = a.f

This code has the following reference cycle: a -> a.g -> a.f -> a.
How come?
When you call a.f like so: “a.f()” two things happen:
1. A.f is bounded to a
2. The bounded A.f is called with the first parameter getting the bounded value.

You may consider that “a.f” is syntactic sugar for the partial function application, A.f gets a as a first argument but doesn’t get called yet.

When you use “a.g = a.f” what actually happens is a holding a reference to a bounded method, which holds a reference to a.

An idiom that uses these cycles is implementing state machines. Consider the following example code:

class MyMachine(object):
    def __init__(self):
        self.next_func = self.state_a
    def run(self, input):
        for x in input:
            self.next_func(x)
    def state_a(self, value):
        print 'a: ', value
        self.next_func = self.state_b
    def state_b(self, value):
        print 'b: ', value
        self.next_func = self.state_a

Of course, my code was a bit more complicated than that, but the basic idea remains. (My code usually created some kind of function table in __init__ used to lookup the next function, and lookups happened outside “state functions”). I’ve seen many state machine recipes include method references – and rightly so. It’s a clear and easy way to code a state machine. (For example, this state machine recipe).
Be careful though – once you add __del__ to these simple recipes you might end up with a memory leak.

Short note: I was going to publish this post a few days ago, but kylev beat me to it. This just goes to show that other people encountered this kind of cycle.

Posted in Python | Tagged , , , , | Leave a comment

Python Gotchas No. 2: Garbage Collection Oddities

Python is a garbage collected language. The garbage collector will collect orphaned objects. These are objects that have no references.
If an object has a __del__ method, it will be called when that object is collected. Note however, that there are no guarantees as to when this will take place. This means that while you should release all owned resources in the __del__ method, you should not depend on it to release these resources when the object “goes out of scope” as in C++.
Specifically note that del x does not call x’s __del__ method, just removes this specific reference to x.
To be explicit about resource management, call the appropriate function yourself in the flow of your program, or better yet, use try-finally or the with statement. (For more information about those, see my presentation about advanced python subjects).

There is another interesting caveat regarding Python’s gc. Consider the following code:

class A(object):
    pass
 
a = A()
b = A()
a.x = b
b.x = a
del a
del b

Did a and b lose their references? Well, they didn’t. They’re still pointing at each other, creating a reference cycle. Happily, Python’s garbage collector can still handle those. However, it won’t be able to handle reference cycles if at least one object in the cycle has a __del__ method.
To understand why, consider the above example, only this time assume A has a __del__ method that calls self.x.release().

Now, which __del__ method should be called first? if it is called, is the other one still valid? Python refuses the temptation to guess, and leaves the cycle be, creating a memory (or resource) leak.
The solution?
1. Avoid data structures with reference cycles. For instance, there are very few instances where you’d need a doubly linked list :)
2. If you do need reference cycles, consider using the weakref module, which allows you to create references that “don’t count” in the eyes of the gc.

Further reading material:
1. The __del__ method.
2. The gc (garbage collector) module.

Posted in Programming, Python | Tagged , , , | 1 Comment