Compilation computer science Programming Projects Python

Manually fuzzing my own compiler

As I mentioned before, I had to write a compiler for simplified CPL. An obvious requirement was that the compiler generate correct code. A less obvious requirement, but important none-the-less, was that after a syntax error, the compiler will keep parsing the source program.

Now, the default behavior of a parser generated by Bison for a synatx error is to return from the parsing function, yyparse. You may of-course call yyparse again, but this would be mostly meaningless – you lost all your context. A simple example would be a language that has to have ‘program’ as the first word. Once you are past that token, you will not be able to parse the source-program again, because your first production (which might look like this):

program: TOK_PROGRAM declarations statements TOK_END

won’t parse.

This is solved in Bison by adding error productions. For example:

expression: '(' error ')'

This production means that an error encountered within parenthesis may be considered a valid expression for the purposes of parsing. A proper semantic action for that error (the code that runs when the production is parsed) will add an error message to some error list, and maybe do some general book-keeping.

So where does the fuzzing come in?
Well, my compiler was mostly working, but it still had no error recovery. That means that any syntax error would cause it to exit, with just that one error. Consider your favorite compiler, (such as gcc), exiting on the first missing semicolon. This is just not acceptable. So I added my basic error recovery and was pretty satisfied.
Then, I had to test the newly written error handling. So I wrote a CPL program in which my goal was to try and kill my own compiler. Thats a fun way to test your code. This also happens to be a ‘fuzzing mindset’. I actually managed to find some holes I didn’t think about, and closed them. Of course, these were not security holes, just ‘compilation holes’.
Here is an excerpt from one of the programs I wrote:
for (x=1; x<100; x=x+1; { x = x; } else () { for (x=1; x<100; x=x+1;) { x = -1; } } [/c] It goes on like this for a bit, so I'll save you the trouble. Later I did some deep testing for the code generation. I wrote a test-case for every possible operator in the language (there aren't that many) and for each type (real and int. Did I mention it was simplified cpl?). Since the language doesn’t support strings, each of these test cases printed 1 for success and 0 for failure. I ran the compiled output with my VM script and then had a simple Python script collect all the results and return failure if any of them failed. I later also tested control flow structures using the same method.
I had all of these tests available in my makefile, and after each change all I had to do was ‘make test’. Which I did, often.

Looking back after finishing the assignment, it occurred to me that I could have also written an actual ‘fuzzer’ for source programs. However, in the context of such a university assignment this seems like overkill. Had I been writing a real compiler, there’s a very good chance I’d have written a ‘source fuzzer’.

All in all, after I got over the nastiness which is Bison, it was pretty fun. It has been a long time since I wrote C or C++, and it’s good to clean some of the rust away from time to time.

Fractals Graphics Programming Python

A second look at the dragon fractal

At first when I drew the (twin)dragon fractal, I had a small bug. I used the base 1+i instead of 1-i. This also generated a very similar looking beast. Thinking about that for a while, made me curious. Just like the Mandelbrot set, maybe other interesting fractals could be generated using exactly the same method, with a different complex ‘seed’.

So I patched up my code, and made it output a series of images for complex numbers on a path. I thought the reasonable choice would be a spiral, so using t*(cost+isint) as a base I wrote a spiral that would go around the origin several times and finally land on 1-i.
It might seem obvious to you to try and make this interactive instead – why not move the mouse around and watch the different fractals that emerge? Well, I wanted a video.

I also wanted the fractal to convey a little more information. So each point in the set was given a color according to its generation. I decided after some trial and error that white was better for the older generation, and red for younger generations.
After some PIL and ffmpeg work, it was ready.

While working on the video, I witnessed some very interesting fractals with different seeds.
I was very curious as to why certain shapes emerged. For example, a square pattern was very common:
Squares fractal
This turned out to be not that surprising. Its generator number is of the shape 0+ia. So it made sense. I still didn’t figure out how come hexagon shapes were so common:
Hexagons1 Hexagons2 Hexagons1

Maybe it had something to do with pi/6, I’m not too sure about that. If it did, I would expect to see many more regular polygons.
Here is another curious one:

Another interesting phenomenon was what I started to call ‘the dragon’s heart’. You see, in the final dragon, the starting generations were spread about pretty evenly. However, with other bases, even ones generating something which is pretty similar to the dragon, the starting generations are clustered together – sometimes at the side, sometimes at the middle.

I’ve got a feeling (not proven or anything) that to be space filling, a fractal’s starting generations should be spread about. What do you think?

Assembly Compilation computer science Personal Programming Python

Writing a Quad Interpreter

My compilation homework is writing a compiler from simplified CPL to Quad, a made up assembly language. The compiler was going well, and when it started to emit code, I naturally wanted to test it.

The problem is, I work on Ubuntu Linux, and the only available Quad interpreter was for Windows. Trying to run it with Wine didn’t work, and I gave up on it pretty quickly. I also didn’t want to start working on Windows. So I sat down for half an hour and wrote a Quad interpreter of my own, in Python.

Afterwards, I thought of publishing it, and sat down to write a step-by-step mode as well. I didn’t implement breakpoints, although it is quite easy.

The compiler is working now, although I do have to do the finish. Since it is a homework assignment, I will not be publishing it, but I will publish the Quad Interpreter. I hope it benefits you.

A note of caution though: I didn’t test it thoroughly . The most I did was running it with some toy programs and the output of my compiler to see if it works. It does, and that’s enough for me.


Folding an origami swan

I believe the swan is one of the most common origami folds. I think that almost every origami book has instructions for a swan.

Well, I was playing around with the standard crane (the wing flapping bird), and came upon this thing:

an origami swan

The idea behind it is pretty simple. Get to the crane, unfold the back part, and reverse fold up, and align with the wings. I can upload pictorial instructions if anyone is interested.


New theme

I usually try to avoid meta posts, and I don’t like Ars Poetica. (Maybe I do like it in programs though, but only to a limited extent.)

However, as you can probably see, I’ve updated the theme of the blog. It took me a long while to choose something I liked, and then work on it until it looked good on the blog. I’ll be happy to hear thoughts and opinions about it.

UPDATE: I’ve had some problems with this theme, so it might be on and off. For those interested in how it should look, here’s a link to a screenshot.

Fractals Math Programming Python

Fractals in 10 minutes no. 3 – The Dragon

When first I looked through the pages of the book “Hacker’s Delight”, I found myself looking at the chapter about bases. There I learned a very curious fact – with the digits of 0,1 and the base of -2, you can represent any integer. Right afterwards I learned something even more interesting – with the digits of 0,1 and the base of 1-i, you can represent and number of the form a+bi where a and b are integers. Having nothing to do with this curious fact, I let the subject go.
Some time later, I was reading through Knuth’s “Art of Computer Programming”, and found that with the base of (1-i)^-1, and digits of 0,1 you can generate the dragon fractal!

The dragon fractal

Generating the fractal is quite simple actually:

def create_dragon_set(n):
    """calculate the dragon set, according to Knuth"""
    s = set([0.0+0.0j])
    for i in range(n):
        new_power = (1.0-1.0j)**(-i)
        s |= set(x+new_power for x in s)
    return s

(By the way, can you do it better?)

The annoying part is converting the complex numbers to drawable integer points. After doing so, I used PIL to draw the jpeg.
Here’s a link to the code.

Cryptography Math Programming Python

Using zpint in Cryptography Homework

Originally, I wrote zpint to help me with my algebric structures homework. For those not familiar with it, it allows you to do computations modulo p the same way you do for ints. Not surprisingly, I found myself using it in my cryptography homework as well. At first I used it for trying to breaking the Hill Cipher. Then for some computations such as using the Chinese Remainder Theorem to solve equations, or to compute RSA values. Last it was for checking if algorithms I wrote were correct.

Usually I prefer to write a script than do all the computations by hand, even if writing the script is more work. Mostly because it is work I enjoy more. This time I had a script ready. I improved it a little bit during the semester, and it is still available. It is certainly not the fastest way to compute, but it is a fast way of doing computations.


New origami bug fold

While playing a little bit with paper, I started playing with the water bomb base. I then moved into the frog-base (I think) whice has some potential for bugs: 4 long limbs, elongated body, and four shorter flaps.

Here are the results:

These two critters look moderately bug-like. Note that the left one has these two fins at the side – those are two of the flaps. One other is the head.

I had another piece of paper which is now fubar (folded up beyond all recognition) of the same fold, which resembled a human being. I’ll probably be playing with this fold some more, while looking for minimum cuts (in graphs) and writing a compiler (for cpl).

Challenges Design Programming Python

LRU cache solution: a case for linked lists in Python

The reason I put up the LRU cache challenge up, was that I couldn’t think of a good solution to the problem without using linked lists. This has been pointed to by Adam and Erez as well. Adam commented on this, and Erez’ solution to the problem was algorithmically identical to mine.

So how to solve the challenge? Here are the two possible solutions I thought about:

  • Use a dict for lookup, and each element’s age is indicated by its position in a linked list. This is the solution Erez and I implemented.
  • Keep a ‘last time of use’ indicator for each element. This could be just a regular int, incremented by 1 for each lookup. Keep the elements in a min heap, and when there are too many elements, pop them using the minimum heap.

Generally, I consider the first solution more elegant. It doesn’t rely on an integer to work, so it could work ‘indefinitely’. Of course, the second solution can be also made to work indefinitely, with some upkeep from time to time. (The added time cost of the upkeep may be amortized over other actions.)

If you can think of some other, more elegant solution, I’ll be happy to hear about it.

So, given that a linked list solution is more elegant, we come to the crux of the problem: what to do in Python? The Python standard library does not contain a linked list implementation as far as I know. As a result, Python programmers are encouraged to use the list type, which is an array. This is just as well: for most intents and purposes, the list type is good enough.

I tried to think a little about other cases where a linked list was more appropriate, and I didn’t come up with any more such cases. If you come up with any such case, I’ll be happy to hear about it.

After looking for a public implementation, and not finding one that seemed good enough, I decided to go ahead and write my own.

Out of curiousity, I also did a small comparison of runtime speeds between my implementation of a linked list, and the list data type. I tried a test where a linked list has an obvious advantage (complexity wise)- removing elements from the middle of the list. The Python list won up to somewhere in the thousands of elements. (Of course, list is implemented in C, and mine is in pure Python).

What is my conclusion from all of this? The same as the conventional wisdom: use the list data type almost always. If you find yourself in need of a linked list, think long and hard (well, not too long) about your problem and solution. There’s a good chance that either you can use the built-in list with an equivalent solution, or that a regular list will still be faster, for most cases. Of course, if you see no other way – do what you think is best.

Your thoughts?