Categories
Compilation Math Programming Python

Interesting links – 4

http://ivory.idyll.org/articles/advanced-swc/

“Intermediate and Advanced Software Carpentry in Python” is an excellent reading by Titus Brown. If you feel you’re good with Python but want to improve it, or if you are an experienced programmer that wants to get better, this is a good place to go. I liked it.

http://c2.com/cgi/wiki?AlternateHardAndSoftLayers

During the time I was working on my Compilation course, I was thinking about the challenge of writing yacc in yacc. Well, I went searching for “yacc in yacc” and stumbled across this page. It is a part of a very strange ‘pattern wiki’. It has fascinating discussions of the subjects in it, but I don’t like the old-style wiki navigation. Still worth a taste.

http://www.scottaaronson.com/writings/bignumbers.html

Do you know the kind of duel where two people need to name the biggest number? If you thought something along the lines of ‘hey, I’ll write something with a lot of nines’, well, you are in for a surprise. This article has a nice solution for the problem. Excellent read, especially if you are into computation theory.  I really liked that one.

Categories
Compilation computer science Programming Projects Python

Manually fuzzing my own compiler

As I mentioned before, I had to write a compiler for simplified CPL. An obvious requirement was that the compiler generate correct code. A less obvious requirement, but important none-the-less, was that after a syntax error, the compiler will keep parsing the source program.

Now, the default behavior of a parser generated by Bison for a synatx error is to return from the parsing function, yyparse. You may of-course call yyparse again, but this would be mostly meaningless – you lost all your context. A simple example would be a language that has to have ‘program’ as the first word. Once you are past that token, you will not be able to parse the source-program again, because your first production (which might look like this):

program: TOK_PROGRAM declarations statements TOK_END

won’t parse.

This is solved in Bison by adding error productions. For example:

expression: '(' error ')'

This production means that an error encountered within parenthesis may be considered a valid expression for the purposes of parsing. A proper semantic action for that error (the code that runs when the production is parsed) will add an error message to some error list, and maybe do some general book-keeping.

So where does the fuzzing come in?
Well, my compiler was mostly working, but it still had no error recovery. That means that any syntax error would cause it to exit, with just that one error. Consider your favorite compiler, (such as gcc), exiting on the first missing semicolon. This is just not acceptable. So I added my basic error recovery and was pretty satisfied.
Then, I had to test the newly written error handling. So I wrote a CPL program in which my goal was to try and kill my own compiler. Thats a fun way to test your code. This also happens to be a ‘fuzzing mindset’. I actually managed to find some holes I didn’t think about, and closed them. Of course, these were not security holes, just ‘compilation holes’.
Here is an excerpt from one of the programs I wrote:
[c]
x=x++++++++++++++100;
x=x=3;
}
for (x=1; x<100; x=x+1; { x = x; } else () { for (x=1; x<100; x=x+1;) { x = -1; } } [/c] It goes on like this for a bit, so I'll save you the trouble. Later I did some deep testing for the code generation. I wrote a test-case for every possible operator in the language (there aren't that many) and for each type (real and int. Did I mention it was simplified cpl?). Since the language doesn’t support strings, each of these test cases printed 1 for success and 0 for failure. I ran the compiled output with my VM script and then had a simple Python script collect all the results and return failure if any of them failed. I later also tested control flow structures using the same method.
I had all of these tests available in my makefile, and after each change all I had to do was ‘make test’. Which I did, often.

Looking back after finishing the assignment, it occurred to me that I could have also written an actual ‘fuzzer’ for source programs. However, in the context of such a university assignment this seems like overkill. Had I been writing a real compiler, there’s a very good chance I’d have written a ‘source fuzzer’.

All in all, after I got over the nastiness which is Bison, it was pretty fun. It has been a long time since I wrote C or C++, and it’s good to clean some of the rust away from time to time.

Categories
Assembly Compilation computer science Personal Programming Python

Writing a Quad Interpreter

My compilation homework is writing a compiler from simplified CPL to Quad, a made up assembly language. The compiler was going well, and when it started to emit code, I naturally wanted to test it.

The problem is, I work on Ubuntu Linux, and the only available Quad interpreter was for Windows. Trying to run it with Wine didn’t work, and I gave up on it pretty quickly. I also didn’t want to start working on Windows. So I sat down for half an hour and wrote a Quad interpreter of my own, in Python.

Afterwards, I thought of publishing it, and sat down to write a step-by-step mode as well. I didn’t implement breakpoints, although it is quite easy.

The compiler is working now, although I do have to do the finish. Since it is a homework assignment, I will not be publishing it, but I will publish the Quad Interpreter. I hope it benefits you.

A note of caution though: I didn’t test it thoroughly . The most I did was running it with some toy programs and the output of my compiler to see if it works. It does, and that’s enough for me.