Posts Tagged ‘Python’

One-liner Guitar Tuner in Python

Thursday, April 22nd, 2010

Guitar

On windows, assuming imports are free:

import winsound
winsound.Beep(220*((2**(1/12.0))**7), 2000)

But that's just because I like to tune to E. If you prefer a more "natural looking" note, you can use A:

winsound.Beep(110, 1000)

But why choose at all when you can go for all of them?

[winsound.Beep(220*((2**(1/12.0))**i), 500) for i in [7, 2, -2, -7, -12, -17]]

Image by Keela84

Visualizing Data Using the Hilbert Curve

Saturday, April 17th, 2010

Some time ago, a coworker asked me to help him visualize some data. He had a very long series (many millions) of data points, and he thought that plotting a pixel for each one would visualize it well, so he asked for my help.
I installed Python & PIL on his machine, and not too long after, he had the image plotted. The script looked something like:

data_points = get_data_points()
n =  int((len(data_points)**0.5) + 0.5)

image = Image('1', (n, n))
for idx, pt in enumerate(data_points):
    image.putpixel(pt, (idx/n, idx%n))
image.save('bla.png', 'png')

Easy enough to do. Well, easy enough if you have enough memory to handle very large data sets. Luckily enough, we had just enough memory for this script & data series, and we were happy. The image was generated, and everything worked fine.

Still, we wanted to improve on that. One problem with this visualization is that two horizontally adjacent pixels don’t have anything to do with each other. Remembering xkcd’s "Map of the Internet", I decided to use the Hilbert Curve. I started with wikipedia's version of the code for the Python turtle and changed it to generate a string of instructions of where to put pixels. On the way I improved the time complexity by changing it to have only two recursion calls instead of four. (It can probably be taken down to one by the way, I leave that as a challenge to the reader :)

Unfortunately, at this point we didn’t have enough memory to hold all of those instructions, so I changed it into a generator. Now it was too slow. I cached the lower levels of the recursion, and now it worked in reasonable time (about 3-5 minutes), with reasonable memory requirements (no OutOfMemory exceptions). Of course, I'm skipping a bit of exceptions and debugging along the way. Still, it was relatively straightforward.

Writing the generator wasn't enough - there were still pixels to draw! It took a few more minutes to write a simple "turtle", that walks the generated hilbert curve.
Now, we were ready:

hilbert = Hilbert(int(math.log(len(data_points), 4) + 0.5))
for pt in data_points:
    x,y = hilbert.next()
    image.putpixel(pt, (x,y))

A few minutes later, the image was generated.

Fuzz-Testing With Nose

Tuesday, March 30th, 2010

A few days ago, I found a in my website, plnnr.com. The bug was in a new feature I added to the algorithm. The first thing I did was write a small unit-test to reproduce the bug. With that unit-test in hand, I then worked to fix the bug, and got this unit-test to pass.

As I previously persumed this feature to be (relatively :) bug free, I decided that more testing was in order. This time however, a single test-case would not be enough - I needed to make sure that the trip-generation algorithm works in many cases. Enter fuzzing.

Plnnr.com generates trips according to trip preferences. Why not generate the trip preferences with a fuzzer, and then check if the planning algorithm chokes on them? While fuzzing is usually used to generate invalid input with the goal of causing the program to crash, in this case I'm generating valid input with the goal of causing the planning algorithm to fail.

Usually fuzzing is done with one of two techniques - exhaustive fuzzing, that goes systematically (possibly selectively) over the input space and random fuzzing, which picks inputs at random - or "somewhat" randomly. In my case, the input space consists of "world data" - locations of attractions, restaurants, etc, and trip preferences - intensity, required attractions, and so on. Since the input space is so large and "unstructured", I found it much easier to go with random fuzzing.

In each test-case, I will generate a "random world", and random trip preferences for that world.
Here is some sample code that shows how this might look:

trip_prefs.num_days = random.randint(0, 5)
trip_prefs.intensity = random(0, 5)
if randbit():
    trip_prefs.schedule_lunch = True

Where randbit is defined like so:

def randbit(prob = 0.5):
    return random.random() < prob

This is all very well, but tests need to be reproducible. If a fuzzer-generated test case fails and I can't recreate it to analyze the error and later verify that it is fixed, it isn't of much use. To solve this issue, the input generation function receives some value, and sets the random seed with this parameter. Now, generating test cases is just a matter of generating a sequence of random values. Here is my code to do that:

class FuzzTestBase(object):
    __test__ = False
    def run_single_fuzz(self, random_seed):
        pass
    def fuzz_test(self):
        random.seed()
        random_seeds = [str(random.random()) for i in range(NUM_FUZZ_TESTS)]
        for seed in random_seeds:
            yield self.run_single_fuzz, seed

FuzzTestBase is a base-class for actual test classes. Each test class just needs to define its own version of run_single_fuzz, and in it call random.seed(random_seed) and log random_seed.

This code uses nose's ability to test generators: it assumes that a test generator yields test functions and their parameters.

A few interesting issues:
* I generate the random seeds beforehand, so that calling random.seed() in the actual test case doesn't affect the seed sequence.
* Originally I used just random.random() as a seed instead of str(random.random()). The problem with that is that this way it's not reproducible. random.random() returns a floating point value x, for which usually x != eval(str(x)):

In [10]: x = random.random()
In [11]: x == eval(str(x))
Out[11]: False

Even though x == eval(repr(x)) for that case, there's still room for error. Unlike floating point numbers, it's harder to go wrong with string equality. So str(random.random()) is just a cheap way to generate random strings.

I'd recommend that if your testing mostly consists of selected test cases based on what you think is possible user behavior, you might want to add some fuzzed inputs. I originally started the fuzz-testing described in this blog-post to better test for a specific bug. After adding the fuzz-testing, I found another bug I didn't know was there. This just goes to show how useful fuzzing is as a testing tool. The fact that it's so easy to implement is just a bonus.

Pyweb-il Presentation on Optimization Slides

Friday, March 26th, 2010

Last Monday I gave a presentation in pywebil on optimization, that's loosely based on my blog post on the same subject. Here are the slides for that presentation.

Call for Volunteers: Open Knesset – oknesset.org

Thursday, March 11th, 2010

Over the last few weeks, I've been lightly involved in work on open knesset.
Mostly I've been helping two of the main developers, Benny and Ofri, and joining the discussions on the discussion group.

(For the non-Israelis: the Knesset is Israel's congress, where laws are passed.)

The website's mission is to improve Israeli citizens' involvement in our democracy, and the first step in doing so is giving people more information. Ever wanted to know who keeps his promises? Who voted how? Who never votes? Which Member of Knesset is never present in discussions and votes?

Open Knesset is the place to put this information. If you know a bit of Python & Django, you can join development either on the content harvesting front, or on the front-end front (pun not intended :).

"What about algorithms?" you may ask, or "what does this project has to do with algorithm.co.il?". Well, there's also plenty of room for innovation and interesting features. For example, finding the correlation between Members of Knesset that always vote together. Knowing which vote is for which law. Understanding if the vote is for or against that law. Plotting the party graph, and working with it. These are all things that still need to be done. See these graphical examples to see what I'm talking about.

The website is still in its infancy, but it already has lots of content and lots of features. Nevertheless, it still needs more work.
If you're looking for an open-source project where your work will have great impact, this might just be it.

And now for something completely different

Tuesday, November 10th, 2009

My friend Yuval whom you might already know from the comments here, apparently composed music for the Python Zen. It made me laugh today, and as it's been a long day, I thought it's worth sharing here. Especially as it is Python related.

Leaky method references

Wednesday, November 4th, 2009

After reading my last post regarding __del__, you should know that __del__ + reference cycle = leak.
Let's say that you do need to use __del__, so you decide to avoid reference cycles. You write your code in such a way as to use the minimum necessary cycles, and for the ones that remain you use the weakref module.

You might still have cycles where you don't expect it - in references to methods.
Consider the following piece of code. Can you spot the reference cycle?

class A(object):
    def f(self):
        return

a = A()
a.g = a.f

This code has the following reference cycle: a -> a.g -> a.f -> a.
How come?
When you call a.f like so: "a.f()" two things happen:
1. A.f is bounded to a
2. The bounded A.f is called with the first parameter getting the bounded value.

You may consider that "a.f" is syntactic sugar for the partial function application, A.f gets a as a first argument but doesn't get called yet.

When you use "a.g = a.f" what actually happens is a holding a reference to a bounded method, which holds a reference to a.

An idiom that uses these cycles is implementing state machines. Consider the following example code:

class MyMachine(object):
    def __init__(self):
        self.next_func = self.state_a
    def run(self, input):
        for x in input:
            self.next_func(x)
    def state_a(self, value):
        print 'a: ', value
        self.next_func = self.state_b
    def state_b(self, value):
        print 'b: ', value
        self.next_func = self.state_a

Of course, my code was a bit more complicated than that, but the basic idea remains. (My code usually created some kind of function table in __init__ used to lookup the next function, and lookups happened outside "state functions"). I've seen many state machine recipes include method references - and rightly so. It's a clear and easy way to code a state machine. (For example, this state machine recipe).
Be careful though - once you add __del__ to these simple recipes you might end up with a memory leak.

Short note: I was going to publish this post a few days ago, but kylev beat me to it. This just goes to show that other people encountered this kind of cycle.