Archive for the ‘Design’ Category

Browser visibility-security and invisibility-insecurity

Friday, August 3rd, 2007

Formal languages have a knack of giving some output, and then later doing something completely different. For example, take the “Halting Problem“, but this is probably too theoretical to be of any relevance… so read on for something a bit more practical. We are going to go down the rabbit hole, to the ‘in-between’ space…

My interest was first piqued when I encountered the following annoyance – some websites would use transparent layers to prevent you from:

  1. Marking and copying text.
  2. Left-clicking on anything, including:
    1. images, to save them,
    2. just the website, to view its source -
  3. and so on and so forth…

Now I bet most intelligent readers would know how to pass these minor hurdles – but mostly just taking the steps is usually deterrent enough to prevent the next lazy guy from doing anything. So I was thinking, why not write a browser – or just a Firefox plugin, that will allow us to view just the top-level of any website?

This should be easy enough to do, but if it bothered enough sites (which it probably won’t), and they fought back, there would be a pretty standard escalation war. However, since the issue is not that major, I suspect it wouldn’t matter much.

Now comes the more interesting part. Unlike preventing someone from copying text, html (plus any ’sub-languages’ it may use) may be used to display one thing, and to be read like a different thing altogether. The most common example is with spam – displaying image spam instead of text. When that was countered by spam filters, animated gif files were used. Now you have it – your escalation war, par excellence. This property of html was also used by honeypots to filter comment-spam, as described in securiteam. In this the securiteam blog post by Aviram, the beginning of another escalation war is described. There are many more examples of this property of html.

All of these examples come from html’s basic ability to specify what do display, and being able to seem to display completely different things. There are actually two parsers at work here – one is the ‘filter’ – its goal is to filter out some ‘bad’ html, and the other is a bit more complicated – it is the person reading the browser’s output (it may be considered to be the ‘browser + person’ parser) . These two parsers operate on completely different levels of html. Now, I would like to point out that having to parsers reading the same language is a common insecurity pattern. HTML has a huge space between what is expressible, and what is visible. In that space – danger lies.

As another, simpler example, consider phishing sites. These are common enough nowadays. How does your browser decide if the site you are looking at is actually a phishing site? Among other things – reading the code behind the site. However, this code can point to something completely different then what is being displayed. In this ‘invisible’ space – any misleading code can live. In that way, the spammer may pretend to be a legitimate site for the filter, but your run-of-the-mill phishing site for the human viewer. This misleading code in the ‘invisible space’ may be used to good – like a honeypot against some comment-spammer, or it may be used for different purposes – by the spammer himself.

Now comes the interesting part. The “what to do part”. For now let me just describe it theoretically, and later work on its practicality. I suggest using a ‘visibility browser’. This browser will use some popular browser (Internet Explorer, Firefox, Safari, Opera, etc.. ) as its lower level. This lower level browser will render the website to some buffer, instead of the screen. Now, our ‘visibility browser’ will OCR all of the visible rendered data, and restructure it as valid HTML. This ‘purified’ html may now be used to filter any ‘bad’ sites – whichever criterion you would like to use for ‘bad’.

I know, I know, this is not practical, it is computationally intensive etc etc… However, it does present a method to close down that nagging ’space’, this place between readability and visibility, where bad code lies. I also know that the ‘visible browser’ itself may be targeted, and probably quite easily. Those attacks will have to rely on implementation faults of the software, or some other flaw, as yet un-thought-of. We all know there will always be bugs. But it seems to me that the ‘visibility browser’ does close, or at least cover for a time, one nagging design flaw.

Exception handling policy – use module exception hierarchies

Wednesday, July 18th, 2007

While programming some bigger projects, and not some home-brew script, I used to wonder what to do with exceptions coming from lower level and library modules. The ‘home script’ approach to exceptions is “let it rise” – usually because it indicates an error anyway, and the script needs to shut-down. If there is any cleanup to be done, it will happen in the __del__ functions and finally clause as required.

After getting some experience with handling problems and writing applications that continue to work despite exceptions, I’ve come to the conclusion that the best practice approach is that each of the project’s module must raise its own type of exception. That way, in higher level modules that catch the exception you can set a concrete policy about ‘what to do’.

An example: I have some low level module named “worker”. Let’s say this module’s job is to copy files around efficiently. The higher level module, “manager” decides on ‘policy’ about what to do with files (copy them, delete them, etc…). Now, the “worker” module raised an exception. If this exception is an IndexError, the higher level module can’t really decide – was this because of a programming error, or an OS error? Even if “worker” did define an interface that allowed for throwing IndexError’s in some cases, if there is a real programming error in the module, it will be missed. The correct way to go about it is to create an exception hierarchy for each module, which will be the ‘communication channel’ for errors. This way, any unexpected programming error will percolate up as such, and not get missed – and any exception handling mechanism on the way can decide what do to about it (log, die, ignore, etc…).

Every exception handling rule has exceptions: when writing ‘low-level’ library modules yourself, such as a dict-like class, it makes sense to use IndexError. Don’t get confused with other library modules though – If you are writing some communication library, it should follow the same rules described above for a module.