Categories
Humour Math Personal

Fibonacci meals

Lately I’ve been reminded of the existence of Fibonacci meals. For a long time, I despised those, but now that I’m preparing my own food most of the time, I’m aiming for them, to save time cooking.

A Fibonacci meal is an element of a sequence defined to be:

a(0) = some food
a(1) = some food
a(n) = a(n-1) + (a(n-2) as sauce)

Very simple, and very effective. You can find those in various places, such as camps, school cafeteria, etc…

Generally this sequence does not converge, however there is a special case for cakes and desserts:

c(0) = some cake or dessert
c(1) = some cake or dessert
c(n) = c(n-1) + (c(n-2) as icing)

This sequence does converge, to the chocolate ball. (in Wikipedia, Chokladboll).

Categories
Origami

I fold

Just some general stuff I folded while working. A pig, a sphinx, and a six legged thing.

The question still remains, what’s easier, the sphinx or the pig. Both are really simple folds. I reckon the sphinx is easier. However, the six legged thing, while complicated, doesn’t resemble anything I know, and I couldn’t figure out how to make it look like something. This just goes to show, yet again, that less is more.

Categories
Algorithms Geometry Graphics Math Programming Python

Elegant Line Clipping Algorithm

I actually wrote about that a long time ago, in the wiki that once was algorithm.co.il. Since the wiki is long gone, and I figured this is a very elegant algorithm, I decided I should write about it.

First, some background – while writing Ascii-Plotter, I had to plot lines. To plot lines correctly I had to clip them. Now since this was a fun project, I decided I wanted to figure out how to do that myself, even though the problem is really solved. So I sat down with pen and paper, and did a little thinking, and I came up this nice algorithm.

The idea is simply this: consider a line, that is not parallel to any of the axes. This line intersects all the four lines that make up the limits of our screen. Here is a diagram:

An intersection of a line with a bounding box

Now notice that there are six points of interest: the four intersection points, and the two endpoints of the line. For each point, we find its parameter in the parametric representation of the line, where a and b are the endpoints:

x(t) = a+t(b-a)

To find out the actual endpoints of the clipped line, we sort the six points according to their parameter, and take the two innermost points (with zero based indexing, those would be 2 and 3). What if the line is outside the box you say? We check that the result points’ parameters are between 0 and 1 (which correspond to the original endpoints).

Note that for a line parallel to one of the axes, a similar method applies, but without two of the intersection points. However, it is probably easier to just calculate the clipping directly in that case.

Here is some sample code, without all the surrounding checks (for parallelism with the axes, trivial clipping etc…).
The full version is available in the code of Ascii-Plotter.

points_t = [0.0,1.0]
 
points_t.append( float(rect_bottom_left[0]-line_pt_1[0])/(line_pt_2[0]-line_pt_1[0]) )
points_t.append( float(rect_top_right[0]-line_pt_1[0])/(line_pt_2[0]-line_pt_1[0]) )
points_t.append( float(rect_bottom_left[1]-line_pt_1[1])/(line_pt_2[1]-line_pt_1[1]) )
points_t.append( float(rect_top_right[1]-line_pt_1[1])/(line_pt_2[1]-line_pt_1[1]) )
 
points_t.sort()
if points_t[2] < 0 or points_t[2] >= 1 or points_t[3] < 0 or points_t[2]>= 1:
	return None
result = [(pt_1 + t*(pt_2-pt_1)) for t in (points_t[2],points_t[3]) for (pt_1, pt_2) in zip(line_pt_1, line_pt_2)]
return (result[0],result[1]), (result[2], result[3])
Categories
Math Programming Python

Python module usage statistics – Cont.

Well, very embarrassingly for me, turns out I had a bug in my original post and code. As per Doug’s suggestion, I tried running the script I wrote on the standard library, and got results I didn’t quite believe. So I checked them, opened socket.py, and there was an “import _socket”. However module “_socket” was not listed in my results. After some digging around, (and feeling my face getting red hot…) I found the bug. It was the smallest thing: I forgot to add re.MULTILINE as an argument to re.search, so the start of my regexp, “^[ \t]*import” didn’t match except on the beginning of the file. #@&*!!! Just no words to express how I felt at that moment.

So, I fixed the bug, tested my code a bit, fixed some more (very minor) bugs, and here are the top 25 of the (hopefully correct this time!) results:

  • sys,1426
  • os,1250
  • unittest,566
  • time,446
  • re,383
  • string,321
  • types,298
  • setuptools,264
  • pkg_resources,217
  • cStringIO,184
  • zope.interface,177
  • datetime,173
  • shutil,167
  • os.path,162
  • gtk,143
  • StringIO,143
  • random,136
  • tempfile,132
  • copy,131
  • threading,128
  • distutils.core,127
  • doctest,126
  • md5,125
  • setuptools.command.e,116
  • logging,116

Except the larger numbers, the arrangement pretty much stayed the same.

(This seems to at least confirm my claim that the results were representative)
Here are the results for the standard library (/usr/lib/python2.5/):

  • sys,1309
  • os,1065
  • ctypes,588
  • re,496
  • string,493
  • types,435
  • time,374
  • numpy,297
  • warnings,254
  • os.path,204
  • cStringIO,196
  • common,185
  • math,159
  • traceback,158
  • gettext,152
  • codecs,147
  • StringIO,147
  • copy,133
  • __future__,128
  • tempfile,126
  • random,119
  • threading,117
  • unittest,108
  • numpy.testing,105
  • errno,100

These results seem different. ctypes seems to be the biggest change. Note that these results might be slanted, as some stdlib modules are implemented in C, and I didn’t test a ‘clean’ Python installation (but rather one with all my non-default modules installed).

Here is a link to the new results, the new stdlib results, and the new fixed graph. I created it using the first 325 module. The fixed R squared value is now approx. 0.99. Here is the code.

I would like to apologize to anyone who was misled by the original results, and again state that it is quite possible that there are still more bugs there, but I still claim the results to be represntative.

Categories
Math Programming Python

Python module usage statistics

IMPORTANT UPDATE: The code I used to create these statistics had some bugs. The fixed statistics are available here.

After reading Doug Hellman’s post about python stdlib modules he needs the documentation to, I commented there that I need the documentation for logging because I don’t use it too frequently. Later, I thought a little bit more about it, and I wanted to check which modules are actually used often, and which rarely.

My first step was to “grep import *.py” on my python scripts directory. Later I progressed to writing a simple script which basically does the same thing, but also rates each imported module according to frequency (it looks for “import <modules>” and “from <module> import bla”). The results weren’t interesting enough. Before going any further, I sat down, and wrote my expected list of frequent stdlib modules:

  • sys
  • os
  • re
  • math
  • thread
  • urllib2
  • time
  • random
  • struct
  • socket
  • itertools
  • operator

My next step was to try Google Codesearch. Some time ago I wrote a little script to harvest results from Google Codesearch, but enough time has passed, and the script doesn’t work anymore. Parsing their output is messy enough, and their results don’t seem that helpful anyway. (They don’t return all the results in a project). So I thought about using koders.com, when I had a flash. I can use Cheeseshop! easy_install must have a way to just download the source of a project! (For those who don’t know: you can do easy_install SQLObject, and easy_install will look for the module in the PyPI, download it and then install it for you). Indeed, easy_install had this option (-eb), and I was in luck. Quickly, I got a list of all the modules in PyPI, shuffled it, and picked 300 randomly. (Actually, downloading them wasn’t that quick, and about 20 didn’t download right :).

I ran my statistics script, but non-stdlib modules were also listed. I fixed some bugs, improved the script a little, and (partially) disallowed inner project imports using a little heuristic. These are the improved results:

  • sys,113
  • os,107
  • setuptools,57
  • common,50
  • unittest,50
  • __future__,25
  • distutils.core,24
  • zope.interface,22
  • re,19
  • pygame,18
  • time,13
  • datetime,11
  • string,10
  • zope,10
  • pyglet.gl,9
  • types,8
  • random,7
  • pkg_resources,7
  • gtk,6
  • struct,6
  • ez_setup,6
  • zope.component,5
  • math,5
  • logging,5
  • sqlalchemy,5

(Please note that I do not argue that these results are completely true, as my code might still contain problems, and it probably filters too much. However, I do claim that these results are representative).

Well, some surprises… I didn’t expect setuptools and types to be there. Especially types… who needs it anymore (and anyway)? Seeing unittest so high in the list gives me a pretty good feeling about opensource. __future__ being there is amusing… Seems that we just can’t wait for the next version. And indeed, logging is pretty low, but not as low as I thought it will be.

Another curiosity: the module frequencies seem (unsurprisingly) to obey Zipf’s law. Here is a small graph showing the results: module imports on a log-log graph

(plotted with Google Docs). The R squared value is 0.92. For the original results (where inner dependencies were allowed), it seemed that the law was even stronger.

For those interested in the full results, here is a csv file containing them.

One final note: while these results are true for projects in Python, they don’t represent usage in interactive mode. Since I use interactive mode a lot, and during interactive work I use os and re quite a lot, I bet the results would be changed if we somehow could count interactive mode usage.


Categories
Origami

Stuff I folded while compiling…

Usually, I have tons of spare paper around. It comes with the job (student, programmer). I also like to fold. So when I’m thinking\resting\doing anything, I usually pick up a piece of paper, and start folding. This time, while doing compilation homework, I came up with two nice folds. Problem is, I don’t remember the way to fold them. If I want to find out, I’ll have to reverse engineer my own fold. That actually happens quite a lot to me. Maybe I’ll just open them and try to solve them as a crease pattern… It’s about time I learned how to do that.

Of course, anything I fold doesn’t really look as good as anything by ‘the masters’. Still it’s fun to fold.

Here are the results, a rhino, and something that looks remotely like the road-runner:

An origami rhinoAn origami road runner

Categories
Programming Python Utility Functions

Simple word lookup for crosswords and such…

I am swamped with university homework. I did manage to come up with some small somewhat useful utility.

Do you know when you need a word that ends with a double letter and then an ‘n’? you would want to write it like this: “xxN$”. Well, I wrote a simple tool that translates a ‘word lookup syntax’ into regular expressions, and then does a lookup for the word.

This raises a two questions:

What exactly is the syntax?
Small letters stand for any letter, and big letter stand for their specific letters. So for example, “bEEn” would match “been”, “seen” and “keep”, but not “boon”. Punctuation is left unchanged, so you can use $,^, (?!xy) and so on. I didn’t work too too much on it, so [] doesn’t translate well (the letters inside it will come out garbled).

What is the data for the lookup?
Well, I used the excellent natural language toolkit (obviously in Python). Very easy to start using it, with a nice tutorial. It’s nice knowing that such a tool is available. The source data is actually an extract from the Gutenberg project. The result words are sorted according to frequency in the source language, and the 20 most frequent words are returned.

Without further ado, here are some code snippets. (The compiled regexp is printed. Note how ugly (yet simple) it is :)
(Note: I did cut out some of the output…)

In [1]: fd = words.create_word_index()
 
In [2]: words.find_words("^bEEn$",fd)
^(?P<b>.)EE(?!((?P=b))|(E))(?P<n>.)$
Out[2]:
['BEEN',
 'SEEN',
 'KEEP',
 'FEET',
 'FEEL',
 'SEEK',
 'SEED',
 'MEET',
 'DEEP',
 'NEED',
 'SEEM',
 ...]
 
In [3]: words.find_words("^jvfpf$",fd)
^(?P<j>.)(?!((?P=j)))(?P<v>.)(?!((?P=j))|((?P=v)))(?P<f>.)(?!((?P=j))|((?P=f))|((?P=v)))(?P<p>.)(?P=f)$
Out[3]:
['THERE',
 'THESE',
 'WHERE',
 'JESUS',
 'MOSES',
 'HOSTS',
 'LINEN',
 'PIECE',
 'SCENE',
 'WHOSO',
 'POSTS',
 'EPHAH',
 ...]

Note that if you find some of the result words a bit too uncommon for your taste (e.g. Ephah), I would suggest removing the bible from the source data… :)

The (very ugly, uncommented, draft code) is available here. I apologize for its unseemliness, but I figured that it’s better to make it available in its current form, rather than never put it online for lack of time and will. It could also be seen that the code isn’t terribly efficient. It is however, ‘good enough’ for interactive work.

Categories
Uncategorized

Facebook and Privacy

So I’ve got a profile on Facebook. A friend invited me there.  So what is there to say about Facebook that hasn’t been said already?

Well, I’ll start with what people have probably already said. It is very much a ‘platform’ as well as a social network. The social network is just the basis for all those little application that exist there. For those who don’t know – Facebook is filled with applications, most of them useless fun, some of them really interesting. Each application uses the details available in your profile and in your friends profile, and adds some small UI to your profile page. One interesting application graphs the connections between your friends. Another application ‘matches’ your friends according to some heuristic. All harmless fun.

Well, one thing I did notice on Facebook is that almost all users have a real picture of themselves. That’s quite new for me. Granted, the ability to put your profile picture has been available for quite some time now on many networks (including the ICQ\Skype variety) , but I find it interesting that on Facebook most profile pictures are real. This also made me think about how people connect on the internet. Let’s take a second to review the progress shall we?

IRC\Usenet\Email -> ICQ and other IM clients -> other social networks, Facebook, blogs and comments

What is clearly noticeable for me, is that the amount of detail people supply in each medium increased. This is notable in itself. Also note the possibility of ads given a service such as Facebook. Up until now, Google ads could have used your search history, as well as your current search, and your Gmail account to target ads. Now, with the data supplied directly by the users, Facebook based ads have a potential to be much more targeted. The beauty of it all is that no underhanded techniques were used. The information is given away freely. I wonder what the next step will be.

And what about me? well, I didn’t put a real profile picture. As far as I know, today there is only one picture of me that is available via a Google search on my name. I’d like it to stay that way for now.