Programming Philosophy

Beautiful Code

A few days ago, @edensh mentioned beautiful code on Facebook, and many people gave examples in assembly, while I was thinking of Python.

That got me thinking: what is beautiful code for me?

So here are my criteria for beautiful code:

  1. Readable (also visually pretty)
  2. Concise
  3. Does something nontrivial (usually in an unexpectedly short manner)
  4. Good (solves the problem, efficiently)

If we consider code to be an implementation of a solution to a problem, then 3 & 4 usually apply to the solution, while 1 & 2 apply to the code itself. This brings me to why I like Python:

  1. Code is more readable. Specifically, I can still easily understand code I wrote years ago. Also, the Zen of Python encourages you to write readable code. For example, “explicit is better than implicit” directly applies to readability.
  2. Python is visually appealing, although I guess that’s a matter of opinion :)
  3. Python almost always allows me to express my solutions easily & succinctly, whereas with other languages (C, C++, Java) I have to fight to “get my point across”.
  4. Python almost always has the right data structures to implement my solutions efficiently.

With that in mind, it’s clear to me now how assembly code can be beautiful.

Note that I didn’t mention C#, Ruby or Haskell. I don’t have much experience with these languages, but from what I’ve seen so far, it seems to me that these languages may help you write beautiful code. Of these, Haskell is probably going to be the first language I’ll learn – I think it will be the most educational experience, although I’m pretty sure others will argue with me regarding Haskell’s readability :)

Now, my question to you is: what do you think makes code beautiful?

Programming Programming Philosophy

Ethics in Programming

Some time ago I was bothered by the issue of ethics in programming.
I heard the question best raised during a “game unconference” I attended. There was a panel about monetary systems for games, and people talked about the issues faced when adding money to an online game.
At one point someone from the audience said about in-game monetary systems (such as in WoW): “it’s like gambling and drugs!”, to which one panelist jokingly replied “so we have a proven business model”, and another said “except it’s legal”.

This was all in good spirit, but it got me thinking:

What are the programming jobs I will not take?

Programming Philosophy Security web-design

Breaking Rapidshare's Annoying Captcha the Easy Way

Like many others, I got stuck in front of Rapidshare’s captcha. After more than five attempts at reading different letters with kittens and other critters hidden behind them, I was thinking of giving up, especially because each time I failed I had to wait half a minute again. However, in one instance I went *back* via my browser, and tried solving the same captcha again. Turns out this works, and I got the file.

I know I could probably have solved it in a smarter fashion, but it wasn’t worth the effort.

My lesson:

When someone writes crappy software, their software is probably crappy in more than one way.

This is not the first time I’ve seen this happen.

Challenges computer science Programming Philosophy Python

A classic programming challenge, in Python

It has become a tradition for computer scientists to create various self referential ‘strange loops’. Traditions such as writing a compiler in the language it compiles are actually quite useful – and also very interesting. This tradition also branched to another one (also mentioned in the linked article) of writing programs that output their own source (without disk access and other dirty tricks).

The challenge is obviously to write such a program in Python, in as few lines as possible. Here is my solution, which is two lines long. I urge you to try it for yourself before looking; it is a very educational challenge. I’ll be very much interested in seeing a one-liner for this problem, or a proof that such a one-liner does not exist.
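Spoiler warning for anyone who still wants to try it first. The sketch below shows one widely known two-line form of such a program; it is not necessarily the solution linked above. To make it checkable, the two lines are stored in a string and run with `exec`; the names `QUINE` and `output_of` are made up for this illustration.

```python
import contextlib
import io

# As a file, the program would be these two lines:
#   s = 's = %r\nprint(s %% s)'
#   print(s % s)
# Here it is kept in a string so its output can be compared with its source.
QUINE = "s = 's = %r\\nprint(s %% s)'\nprint(s % s)\n"


def output_of(source):
    """Run `source` and return everything it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


if __name__ == "__main__":
    print(QUINE, end="")
    print("prints itself:", output_of(QUINE) == QUINE)
```

The trick is that `%r` inserts the string's own `repr`, quotes and escapes included, so the string works both as data and as the template for the code around it.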

If you are interested in the bigger challenge, of writing an interpreter for Python in Python itself, go check out PyPy first.

For those interested in other ‘strange loops’, find a copy of ‘Gödel, Escher, Bach’. If you happen to live in Israel, and can come to Haifa, I might even lend you my copy (once I get it back :)

Programming Programming Philosophy Teaching Programming

Zen Programming – 2

I thought about this a long time ago with Erez:

To fully grasp structured control flow – you must first learn to program without it.

Saw it happen, with myself and with others. Only after writing some conditionals and loops by hand in assembly do you really learn what they mean, and how to make the code do what you want it to do.

computer science Design Programming Programming Philosophy Security

Browser visibility-security and invisibility-insecurity

Formal languages have a knack of giving some output, and then later doing something completely different. For example, take the “Halting Problem”, but this is probably too theoretical to be of any relevance… so read on for something a bit more practical. We are going to go down the rabbit hole, to the ‘in-between’ space…

My interest was first piqued when I encountered the following annoyance – some websites would use transparent layers to prevent you from:

  1. Marking and copying text.
  2. Right-clicking on anything, including:
    1. images, to save them,
    2. just the website, to view its source –
  3. and so on and so forth…

Now I bet most intelligent readers would know how to get past these minor hurdles, but merely having to take the extra steps is usually deterrent enough to stop the next lazy guy from doing anything. So I was thinking, why not write a browser, or just a Firefox plugin, that will allow us to view just the top level of any website?

This should be easy enough to do, but if it bothered enough sites (which it probably won’t), and they fought back, there would be a pretty standard escalation war. However, since the issue is not that major, I suspect it wouldn’t matter much.

Now comes the more interesting part. Unlike preventing someone from copying text, html (plus any ‘sub-languages’ it may use) may be used to display one thing, and to be read as a different thing altogether. The most common example is with spam – displaying image spam instead of text. When that was countered by spam filters, animated gif files were used. Now you have it – your escalation war, par excellence. This property of html was also used by honeypots to filter comment-spam, as described in a securiteam blog post by Aviram, which also describes the beginning of another escalation war. There are many more examples of this property of html.

All of these examples come from html’s basic ability to specify what to display while seeming to display something completely different. There are actually two parsers at work here – one is the ‘filter’, whose goal is to filter out some ‘bad’ html, and the other is a bit more complicated: it is the person reading the browser’s output (it may be considered to be the ‘browser + person’ parser). These two parsers operate on completely different levels of html. Now, I would like to point out that having two parsers reading the same language is a common insecurity pattern. HTML has a huge space between what is expressible and what is visible. In that space, danger lies.
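To make the gap between the two readers concrete, here is a toy sketch using Python's standard `html.parser`. It is only an illustration under loud assumptions: the class `TextExtractor`, the helper `extract`, and the sample markup are all made up, and "hidden" here means only an inline `display:none` style, while a real browser hides content in many more ways (stylesheets, scripts, positioning, colors).

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TextExtractor(HTMLParser):
    """Collects text, optionally skipping elements hidden by inline display:none."""

    def __init__(self, skip_hidden):
        super().__init__()
        self.skip_hidden = skip_hidden
        self.hidden_depth = 0  # > 0 while inside a hidden element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        if self.hidden_depth or "display:none" in style:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not (self.skip_hidden and self.hidden_depth):
            self.chunks.append(text)


def extract(html, skip_hidden):
    """Return the text chunks a reader of `html` would see."""
    parser = TextExtractor(skip_hidden)
    parser.feed(html)
    parser.close()
    return parser.chunks


if __name__ == "__main__":
    page = ('<p>Totally legitimate newsletter</p>'
            '<span style="display: none">harmless words for the filter</span>')
    print("naive filter sees:", extract(page, skip_hidden=False))
    print("human sees:       ", extract(page, skip_hidden=True))
```

The naive reader and the "visible" reader disagree about what the same markup says, and that disagreement is exactly the space the text describes.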

As another, simpler example, consider phishing sites. These are common enough nowadays. How does your browser decide if the site you are looking at is actually a phishing site? Among other things, by reading the code behind the site. However, this code can point to something completely different than what is being displayed. In this ‘invisible’ space – any misleading code can live. In that way, the spammer may pretend to be a legitimate site for the filter, but be your run-of-the-mill phishing site for the human viewer. This misleading code in the ‘invisible space’ may be used for good – like a honeypot against some comment-spammer, or it may be used for different purposes – by the spammer himself.

Now comes the interesting part: the “what to do” part. For now let me just describe it theoretically, and later work on its practicality. I suggest using a ‘visibility browser’. This browser will use some popular browser (Internet Explorer, Firefox, Safari, Opera, etc.) as its lower level. This lower level browser will render the website to some buffer, instead of the screen. Now, our ‘visibility browser’ will OCR all of the visible rendered data, and restructure it as valid HTML. This ‘purified’ html may now be used to filter any ‘bad’ sites – whichever criterion you would like to use for ‘bad’.

I know, I know, this is not practical, it is computationally intensive etc etc… However, it does present a method to close down that nagging ‘space’, this place between readability and visibility, where bad code lies. I also know that the ‘visible browser’ itself may be targeted, and probably quite easily. Those attacks will have to rely on implementation faults of the software, or some other flaw, as yet un-thought-of. We all know there will always be bugs. But it seems to me that the ‘visibility browser’ does close, or at least cover for a time, one nagging design flaw.

C Programming Programming Philosophy Python Teaching Programming

"Fnord" or "The evil empire cheerfuly striked back at the merry-colored pretty princess"

One of my favorite exercises for young programmers is the following (widely known program):

Write a program that will read lists of words from the following files:

  • verbs.txt – contains verbs; the others, respectively, are –
  • nouns.txt,
  • adverbs.txt,
  • and adjectives.txt.

After reading the words, the program will construct sentences of the form:

The <adj> <noun> <adv> <verb> the <adj> <adj> <noun>.

You could of course allow for different sentence structures, or more types of words (prepositions). This site made me remember that. This exercise is really handy, and it also allows for great variations of difficulty (at least in C). You can tell the programmer that he or she must support a variable number of words, or just a maximum, or change how the files are defined… the options are endless. The interesting thing about this exercise is that in C, it is much more effort to write than in Python (with all the built-in modules). This makes the exercise even more effective: after letting the new programmer painfully implement it in C, I let the guy (or girl) implement it in Python. Really fun to watch. Especially after he or she discovers the meaning of ‘one-liner’ :)
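For a feel of how short the Python side can get, here is a minimal sketch of the exercise. It assumes the four files sit in one directory and hold whitespace-separated words; the names `FILES`, `PATTERN`, `load_words` and `make_sentence` are invented for this illustration.

```python
import random
from pathlib import Path

# Word kinds mapped to the file names given in the exercise.
FILES = {"adj": "adjectives.txt", "noun": "nouns.txt",
         "adv": "adverbs.txt", "verb": "verbs.txt"}

# The sentence form from the exercise; entries that are not word kinds are literals.
PATTERN = ["The", "adj", "noun", "adv", "verb", "the", "adj", "adj", "noun"]


def load_words(directory="."):
    """Read each word file (assumed: whitespace-separated words) into a list."""
    return {kind: Path(directory, name).read_text().split()
            for kind, name in FILES.items()}


def make_sentence(words, rng=random):
    """Fill every slot of PATTERN with a randomly chosen word of that kind."""
    parts = [rng.choice(words[slot]) if slot in words else slot
             for slot in PATTERN]
    return " ".join(parts) + "."


if __name__ == "__main__":
    sample = {"adj": ["evil", "pretty"], "noun": ["empire", "princess"],
              "adv": ["cheerfully"], "verb": ["greeted"]}
    print(make_sentence(sample))
```

With the real files in place, `make_sentence(load_words())` does the whole job, which is roughly the moment the C veteran starts grinning.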

Algorithms Programming Philosophy Python

Proofreading and what's wrong with the heapq module

Consider the following problem:

You have a large book you would like to proofread, with many chapters (100+) and a few men (4) at your disposal. How would you distribute the chapters among the men, considering that each proofreader must get whole chapters?

Well, the obvious solution is to just divide the number of chapters by the number of men, and give each proofreader an appropriate number of (randomly picked?) chapters. But what if the chapters are of varying length? Well, you could just distribute them randomly, but that just doesn’t feel right, now does it?

I was asked by a friend to help him write a script to solve this, and quite quickly I came up with the solution:

Sort the chapters according to length (number of words), in descending order. Keep the proofreaders in a sorted order. While there are still chapters left – get the next longest chapter, and give it to the proofreader with the least total length. Rearrange the proofreaders.

This algorithm is a greedy algorithm and is also regarded as the straightforward way to solve this problem.
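The steps above can be sketched in a few lines of Python. This is a minimal illustration and not the original script: the function name `distribute`, the `[total_words, chapters]` layout for a proofreader, and the sample chapter lengths are all assumptions.

```python
def distribute(chapter_lengths, n_readers):
    """Greedy split: longest chapter first, always to the least-loaded reader.

    chapter_lengths maps a chapter name to its word count.
    Returns one [total_words, chapter_names] pair per proofreader.
    """
    readers = [[0, []] for _ in range(n_readers)]
    longest_first = sorted(chapter_lengths.items(),
                           key=lambda item: item[1], reverse=True)
    for name, length in longest_first:
        # "Rearrange the proofreaders": the least-loaded one moves to the front.
        readers.sort(key=lambda reader: reader[0])
        readers[0][0] += length
        readers[0][1].append(name)
    return readers


if __name__ == "__main__":
    chapters = {"ch1": 5000, "ch2": 3000, "ch3": 3000, "ch4": 2000, "ch5": 1000}
    for total, names in distribute(chapters, 2):
        print(total, names)
```

Re-sorting the proofreaders on every step is wasteful in theory, but with a handful of proofreaders it hardly matters.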

Well, this seems simple enough. In the case where there are more than a few proofreaders, the obvious solution is to use a minimum heap. Using a minimum heap in Python should be easy – just import the heapq module, use heapify and various other functions, just as you would use sort, and you are home free.

Not quite…

Turns out that using the heapq module isn’t as easy as it seems, or at least not as quick and simple as I would like it to be. While you can sort a list using your own compare function, or by providing a key function, the heapq module orders items using Python’s standard comparison operators. This means that to use heapq properly to order objects according to some property, you have to write an object wrapper that defines __lt__ and similar methods. All this coding could be saved if heapq had a key argument, or any other method to control the comparisons.
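One common way around writing a wrapper class, and not what we ended up doing, is to push plain tuples whose first element is the sort key, since tuples already compare element by element. Below is a hedged sketch of the proofreading split done that way; `distribute_with_heap` and the `(total_words, reader_index, chapters)` entry layout are names and choices invented for this example.

```python
import heapq


def distribute_with_heap(chapter_lengths, n_readers):
    """Greedy split using a min-heap keyed on each reader's total word count."""
    # Entries are (total_words, reader_index, chapters). The unique index
    # breaks ties, so the comparison never reaches the chapter lists.
    heap = [(0, index, []) for index in range(n_readers)]
    heapq.heapify(heap)
    longest_first = sorted(chapter_lengths.items(),
                           key=lambda item: item[1], reverse=True)
    for name, length in longest_first:
        total, index, chapters = heapq.heappop(heap)  # least-loaded reader
        chapters.append(name)
        heapq.heappush(heap, (total + length, index, chapters))
    return sorted(heap, key=lambda entry: entry[1])  # order by reader index


if __name__ == "__main__":
    chapters = {"ch1": 5000, "ch2": 3000, "ch3": 3000, "ch4": 2000, "ch5": 1000}
    for total, index, names in distribute_with_heap(chapters, 2):
        print(index, total, names)
```

It works, but the key has to be baked into every entry by hand, which is the same bookkeeping a key argument would do for you.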

And what about the proofreading you ask? Well, it worked. It divided the chapters quite evenly, although we did end up using sort() repeatedly, as the number of proofreaders was small and we did not want to overly complicate matters.

This again emphasizes something I’ve come to discover – being able to write something quickly is usually more important than later being able to run it quickly. If you write something in an environment that allows for quick coding – later, if speed is required, you’ll be able to optimize it quickly, or even change the design or the basic algorithm.

Programming Philosophy

Zen Programming – 1

To achieve correctness of code, you must first realize that your code is incorrect.

Too many times I’ve reviewed code that was written with only the obvious cases in mind. A small step towards writing better code is to think to yourself during (and even before!) coding: “what if?”.
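As a tiny, made-up illustration of asking “what if?”: the obvious way to average a list is `sum(numbers) / len(numbers)`, which works until someone passes an empty list or a generator. The function below is only a sketch of the habit, with a hypothetical name and error message.

```python
def average(numbers):
    """Mean of some numbers, written after asking a few "what if?" questions."""
    numbers = list(numbers)  # what if it is a generator and has no len()?
    if not numbers:          # what if it is empty?
        raise ValueError("average() needs at least one number")
    return sum(numbers) / len(numbers)


if __name__ == "__main__":
    print(average([1, 2, 3]))
    print(average(n * n for n in range(4)))
```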