
## Fibonacci meals

Lately I’ve been reminded of the existence of Fibonacci meals. For a long time I despised them, but now that I’m preparing my own food most of the time, I’m actually aiming for them, to save time cooking.

A Fibonacci meal is an element of a sequence defined to be:

```
a(0) = some food
a(1) = some food
a(n) = a(n-1) + (a(n-2) as sauce)
```

Very simple, and very effective. You can find these in various places: camps, school cafeterias, and so on.

Generally this sequence does not converge, however there is a special case for cakes and desserts:

```
c(0) = some cake or dessert
c(1) = some cake or dessert
c(n) = c(n-1) + (c(n-2) as icing)
```

This sequence does converge: to the chocolate ball (Chokladboll, per Wikipedia).
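For fun, the recurrence above can be sketched in a few lines of Python (the function name and sample dishes are mine, obviously not part of any real recipe):

```python
def fibonacci_meals(a0, a1, n):
    """Return the first n meals of the sequence: each meal is the
    previous meal with the meal before that poured on as sauce."""
    meals = [a0, a1]
    while len(meals) < n:
        meals.append("%s + (%s as sauce)" % (meals[-1], meals[-2]))
    return meals[:n]

# fibonacci_meals("pasta", "meatballs", 4)[-1]
# -> 'meatballs + (pasta as sauce) + (meatballs as sauce)'
```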


## I fold

Just some general stuff I folded while working: a pig, a sphinx, and a six-legged thing.

The question still remains: which is easier, the sphinx or the pig? Both are really simple folds; I reckon the sphinx is easier. The six-legged thing, while complicated, doesn’t resemble anything I know, and I couldn’t figure out how to make it look like something. This just goes to show, yet again, that less is more.


## Elegant Line Clipping Algorithm

I actually wrote about this a long time ago, in the wiki that once was algorithm.co.il. Since the wiki is long gone, and since I figure this is a very elegant algorithm, I decided I should write about it again.

First, some background: while writing Ascii-Plotter, I had to plot lines, and to plot lines correctly I had to clip them. Since this was a fun project, I decided I wanted to figure out how to do that myself, even though the problem is long since solved. So I sat down with pen and paper, did a little thinking, and came up with this nice algorithm.

The idea is simply this: consider a line that is not parallel to either axis. Such a line intersects all four of the lines that make up the limits of our screen. Here is a diagram:

Now notice that there are six points of interest: the four intersection points, and the two endpoints of the line. For each point, we find its parameter t in the parametric representation of the line, where a and b are the endpoints:

x(t) = a + t(b - a)

To find the actual endpoints of the clipped line, we sort the six points according to their parameters and take the two innermost ones (with zero-based indexing, those are at positions 2 and 3). What if the line is entirely outside the box, you ask? We check that the resulting points’ parameters lie between 0 and 1 (the parameters of the original endpoints).

Note that for a line parallel to one of the axes, a similar method applies, just without two of the intersection points. In that case, however, it is probably easier to calculate the clipping directly.

Here is some sample code, without all the surrounding checks (for parallelism with the axes, trivial clipping, etc.).
The full version is available in the code of Ascii-Plotter.

```
points_t = [0.0, 1.0]

# Parameters of the intersections with the four boundary lines.
points_t.append( float(rect_bottom_left[0]-line_pt_1[0])/(line_pt_2[0]-line_pt_1[0]) )
points_t.append( float(rect_top_right[0]-line_pt_1[0])/(line_pt_2[0]-line_pt_1[0]) )
points_t.append( float(rect_bottom_left[1]-line_pt_1[1])/(line_pt_2[1]-line_pt_1[1]) )
points_t.append( float(rect_top_right[1]-line_pt_1[1])/(line_pt_2[1]-line_pt_1[1]) )

points_t.sort()
# The two innermost parameters must lie inside the original segment.
if points_t[2] < 0 or points_t[2] >= 1 or points_t[3] < 0 or points_t[3] >= 1:
    return None
result = [pt_1 + t*(pt_2 - pt_1)
          for t in (points_t[2], points_t[3])
          for (pt_1, pt_2) in zip(line_pt_1, line_pt_2)]
return (result[0], result[1]), (result[2], result[3])
```
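For completeness, here is a sketch of what a full function might look like, with the axis-parallel cases handled directly and a guard for lines that pass near a corner but miss the box entirely. The names are illustrative; the real version lives in Ascii-Plotter.

```python
def clip_line(line_pt_1, line_pt_2, rect_bottom_left, rect_top_right):
    """Clip the segment line_pt_1 -> line_pt_2 to an axis-aligned
    rectangle; return the clipped endpoints, or None if nothing is left."""
    (x1, y1), (x2, y2) = line_pt_1, line_pt_2

    # Degenerate and axis-parallel lines: clip directly by clamping.
    if x1 == x2 and y1 == y2:
        return None
    if x1 == x2:  # vertical
        if not rect_bottom_left[0] <= x1 <= rect_top_right[0]:
            return None
        ya = max(min(y1, y2), rect_bottom_left[1])
        yb = min(max(y1, y2), rect_top_right[1])
        return ((x1, ya), (x1, yb)) if ya <= yb else None
    if y1 == y2:  # horizontal
        if not rect_bottom_left[1] <= y1 <= rect_top_right[1]:
            return None
        xa = max(min(x1, x2), rect_bottom_left[0])
        xb = min(max(x1, x2), rect_top_right[0])
        return ((xa, y1), (xb, y1)) if xa <= xb else None

    # General case: six parameters (two endpoints, four boundary
    # intersections), sorted; the clip lies between the two innermost.
    points_t = [0.0, 1.0,
                (rect_bottom_left[0] - x1) / (x2 - x1),
                (rect_top_right[0] - x1) / (x2 - x1),
                (rect_bottom_left[1] - y1) / (y2 - y1),
                (rect_top_right[1] - y1) / (y2 - y1)]
    points_t.sort()
    t_lo, t_hi = points_t[2], points_t[3]
    if t_lo < 0 or t_hi > 1:
        return None

    # The innermost pair can still be bogus when the line crosses the
    # boundary lines outside the box (e.g. just past a corner), so
    # check that the midpoint of the candidate clip is inside.
    t_mid = (t_lo + t_hi) / 2
    mx, my = x1 + t_mid * (x2 - x1), y1 + t_mid * (y2 - y1)
    eps = 1e-9
    if not (rect_bottom_left[0] - eps <= mx <= rect_top_right[0] + eps and
            rect_bottom_left[1] - eps <= my <= rect_top_right[1] + eps):
        return None
    return ((x1 + t_lo * (x2 - x1), y1 + t_lo * (y2 - y1)),
            (x1 + t_hi * (x2 - x1), y1 + t_hi * (y2 - y1)))
```

The midpoint check is one of the “surrounding checks” alluded to above: without it, a diagonal line that misses the box near a corner can still yield two innermost parameters inside [0, 1].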

## Python module usage statistics – Cont.

Well, very embarrassingly for me, it turns out I had a bug in my original post and code. Per Doug’s suggestion, I tried running the script I wrote on the standard library, and got results I didn’t quite believe. So I checked them, opened socket.py, and there it was: “import _socket”. However, the module “_socket” was not listed in my results. After some digging around (and feeling my face getting red hot…) I found the bug. It was the smallest thing: I forgot to pass re.MULTILINE as an argument to re.search, so the start of my regexp, “^[ \t]*import”, only matched at the beginning of the file. #@&*!!! There are just no words to express how I felt at that moment.
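To illustrate the bug (a minimal reconstruction, not the original script):

```python
import re

source = "import os\ndef f():\n    pass\nimport _socket\n"

# Without re.MULTILINE, '^' matches only at the very start of the
# string, so only the first import line is found:
print(re.findall(r"^[ \t]*import (\w+)", source))                # ['os']

# With re.MULTILINE, '^' matches at the start of every line:
print(re.findall(r"^[ \t]*import (\w+)", source, re.MULTILINE))  # ['os', '_socket']
```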

So, I fixed the bug, tested my code a bit, fixed some more (very minor) bugs, and here are the top 25 of the (hopefully correct this time!) results:

• sys,1426
• os,1250
• unittest,566
• time,446
• re,383
• string,321
• types,298
• setuptools,264
• pkg_resources,217
• cStringIO,184
• zope.interface,177
• datetime,173
• shutil,167
• os.path,162
• gtk,143
• StringIO,143
• random,136
• tempfile,132
• copy,131
• distutils.core,127
• doctest,126
• md5,125
• setuptools.command.e,116
• logging,116

Apart from the larger numbers, the ordering pretty much stayed the same.

(This seems to at least confirm my claim that the results were representative)
Here are the results for the standard library (/usr/lib/python2.5/):

• sys,1309
• os,1065
• ctypes,588
• re,496
• string,493
• types,435
• time,374
• numpy,297
• warnings,254
• os.path,204
• cStringIO,196
• common,185
• math,159
• traceback,158
• gettext,152
• codecs,147
• StringIO,147
• copy,133
• __future__,128
• tempfile,126
• random,119
• unittest,108
• numpy.testing,105
• errno,100

These results seem different; ctypes is the biggest change. Note that these results might be skewed, as some stdlib modules are implemented in C, and I didn’t test a ‘clean’ Python installation (but rather one with all my non-default modules installed).

Here is a link to the new results, the new stdlib results, and the new fixed graph, which I created using the first 325 modules. The fixed R squared value is now approximately 0.99. Here is the code.

I would like to apologize to anyone who was misled by the original results, and to state again that while it is quite possible there are still more bugs, I still claim the results to be representative.


## Python module usage statistics

IMPORTANT UPDATE: The code I used to create these statistics had some bugs. The fixed statistics are available here.

After reading Doug Hellmann’s post about the python stdlib modules he needs the documentation for, I commented there that I need the documentation for logging because I don’t use it very frequently. Later, I thought a bit more about it, and wanted to check which modules are actually used often, and which rarely.

My first step was to run “grep import *.py” on my python scripts directory. Later I progressed to writing a simple script which basically does the same thing, but also rates each imported module according to frequency (it looks for “import <module>” and “from <module> import bla”). The results weren’t interesting enough. Before going any further, I sat down and wrote my expected list of frequent stdlib modules:

• sys
• os
• re
• math
• urllib2
• time
• random
• struct
• socket
• itertools
• operator
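The kind of counting script described above might be sketched like this (a reconstruction, not my actual code; it only handles simple one-module import lines):

```python
import os
import re
from collections import Counter

# Matches 'import foo.bar' or 'from foo.bar import baz' at line start.
IMPORT_RE = re.compile(
    r"^[ \t]*(?:import[ \t]+([\w.]+)|from[ \t]+([\w.]+)[ \t]+import)",
    re.MULTILINE)

def count_imports(root):
    """Count how often each module is imported across .py files under root."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            with open(os.path.join(dirpath, name)) as f:
                text = f.read()
            for m in IMPORT_RE.finditer(text):
                counts[m.group(1) or m.group(2)] += 1
    return counts
```

Note the caveats: this misses “import a, b”, counts imports inside string literals, and does nothing to filter out inner-project modules.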

I ran my statistics script, but non-stdlib modules were also listed. I fixed some bugs, improved the script a little, and (partially) disallowed inner-project imports using a little heuristic. These are the improved results:

• sys,113
• os,107
• setuptools,57
• common,50
• unittest,50
• __future__,25
• distutils.core,24
• zope.interface,22
• re,19
• pygame,18
• time,13
• datetime,11
• string,10
• zope,10
• pyglet.gl,9
• types,8
• random,7
• pkg_resources,7
• gtk,6
• struct,6
• ez_setup,6
• zope.component,5
• math,5
• logging,5
• sqlalchemy,5

(Please note that I do not argue that these results are completely true, as my code might still contain problems, and it probably filters too much. However, I do claim that these results are representative).

Well, some surprises… I didn’t expect setuptools and types to be there. Especially types… who needs it anymore (and why)? Seeing unittest so high in the list gives me a pretty good feeling about open source. __future__ being there is amusing… it seems we just can’t wait for the next version. And indeed, logging is pretty low, but not as low as I thought it would be.

Another curiosity: the module frequencies seem (unsurprisingly) to obey Zipf’s law. Here is a small graph showing the results:

(plotted with Google Docs). The R squared value is 0.92. For the original results (where inner dependencies were allowed), the law seemed to hold even more strongly.
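Checking for Zipf-like behavior amounts to a linear fit of log-frequency against log-rank; a minimal dependency-free sketch of the R squared computation (a reconstruction, not the spreadsheet I actually used):

```python
import math

def zipf_r_squared(counts):
    """R^2 of a linear fit of log(frequency) against log(rank).

    A perfect Zipf distribution (frequency proportional to 1/rank)
    gives an R^2 of 1.0.
    """
    counts = sorted(counts, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)
```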

For those interested in the full results, here is a csv file containing them.

One final note: while these results hold for Python projects, they don’t represent usage in interactive mode. Since I use interactive mode a lot, and during interactive work I use os and re quite a lot, I bet the results would change if we could somehow count interactive-mode usage.


## Stuff I folded while compiling…

Usually, I have tons of spare paper around. It comes with the job (student, programmer). I also like to fold. So when I’m thinking/resting/doing anything, I usually pick up a piece of paper and start folding. This time, while doing compilation homework, I came up with two nice folds. Problem is, I don’t remember how I folded them. If I want to find out, I’ll have to reverse engineer my own folds. That actually happens to me quite a lot. Maybe I’ll just open them up and try to solve them as crease patterns… It’s about time I learned how to do that.

Of course, nothing I fold looks as good as anything by ‘the masters’. Still, it’s fun to fold.

Here are the results: a rhino, and something that looks remotely like the Road Runner:


## Simple word lookup for crosswords and such…

I am swamped with university homework, but I did manage to come up with a small, somewhat useful utility.

You know when you need a word that ends with a double letter and then an ‘n’? You’d want to write it like this: “xxN$”. Well, I wrote a simple tool that translates a ‘word lookup syntax’ into a regular expression, and then looks the word up.

This raises two questions:

What exactly is the syntax?
Small letters stand for any letter (a repeated small letter stands for the same letter each time, and distinct small letters for distinct letters), and big letters stand for themselves. So, for example, “bEEn” would match “been”, “seen” and “keep”, but not “boon”. Punctuation is left unchanged, so you can use $, ^, (?!xy) and so on. I didn’t work too much on it, so [] doesn’t translate well (the letters inside it will come out garbled).
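A sketch of how the translation might work, reconstructed from the regexps printed in the snippets further down (so not necessarily my actual implementation): each new small letter becomes a named wildcard group, guarded by a lookahead that excludes all previously seen letters, and a repeated small letter becomes a backreference.

```python
import re

def to_regex(pattern):
    """Translate the word-lookup syntax into a regular expression."""
    seen = []   # distinct small letters, in order of appearance
    caps = []   # capital letters seen so far
    out = []
    for ch in pattern:
        if ch.islower():
            if ch in seen:
                # Repeated small letter: must be the same character.
                out.append("(?P=%s)" % ch)
            else:
                # New small letter: any character, but distinct from
                # every small and capital letter seen so far.
                alts = (["((?P=%s))" % c for c in seen] +
                        ["(%s)" % c for c in caps])
                if alts:
                    out.append("(?!%s)" % "|".join(alts))
                out.append("(?P<%s>.)" % ch)
                seen.append(ch)
        elif ch.isupper():
            if ch not in caps:
                caps.append(ch)
            out.append(ch)
        else:
            out.append(ch)  # punctuation passes through unchanged
    return "".join(out)

print(to_regex("^bEEn$"))  # ^(?P<b>.)EE(?!((?P=b))|(E))(?P<n>.)$
```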

What is the data for the lookup?
Well, I used the excellent Natural Language Toolkit (obviously in Python). It’s very easy to start using, with a nice tutorial, and it’s good to know such a tool is available. The source data is actually an extract from Project Gutenberg. The result words are sorted according to frequency in the source language, and the 20 most frequent matches are returned.
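The gist of the lookup, minus NLTK: the real script builds its frequency distribution from the Gutenberg corpus via NLTK, but the idea can be sketched dependency-free (create_word_index and find_words are the names used in the snippets below; this version just takes any word list):

```python
import re
from collections import Counter

def create_word_index(corpus_words):
    """Map each alphabetic word (uppercased) to its corpus frequency."""
    return Counter(w.upper() for w in corpus_words if w.isalpha())

def find_words(pattern, fd, limit=20):
    """Return up to `limit` words matching the regexp, most frequent first."""
    rx = re.compile(pattern)
    hits = [w for w in fd if rx.search(w)]
    hits.sort(key=fd.__getitem__, reverse=True)
    return hits[:limit]
```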

Without further ado, here are some code snippets. (The compiled regexp is printed; note how ugly, yet simple, it is :)
(Note: I did cut out some of the output…)

```
In [1]: fd = words.create_word_index()

In [2]: words.find_words("^bEEn$",fd)
^(?P<b>.)EE(?!((?P=b))|(E))(?P<n>.)$
Out[2]: ['BEEN', 'SEEN', 'KEEP', 'FEET', 'FEEL', 'SEEK', 'SEED', 'MEET', 'DEEP', 'NEED', 'SEEM', ...]

In [3]: words.find_words("^jvfpf$",fd)
^(?P<j>.)(?!((?P=j)))(?P<v>.)(?!((?P=j))|((?P=v)))(?P<f>.)(?!((?P=j))|((?P=f))|((?P=v)))(?P<p>.)(?P=f)$
Out[3]: ['THERE', 'THESE', 'WHERE', 'JESUS', 'MOSES', 'HOSTS', 'LINEN', 'PIECE', 'SCENE', 'WHOSO', 'POSTS', 'EPHAH', ...]
```

Note that if you find some of the result words a bit too uncommon for your taste (e.g. Ephah), I would suggest removing the bible from the source data… :)

The (very ugly, uncommented, draft) code is available here. I apologize for its unseemliness, but I figured it’s better to make it available in its current form than never to put it online for lack of time and will. As you can see, the code isn’t terribly efficient; it is, however, ‘good enough’ for interactive work.
