Algorithms Assembly computer science Programming Projects Python

Issues in Writing a VM – Part 2

The Incredible Machine

Writing a VM capable of executing expression trees is different from writing a VM for executing assembly instructions. Here I’ll cover several issues stemming from this difference.

The first group of issues involve generality. Supporting a specific instruction set is a straightforward job, even if a hard one. Vial has to support multiple platforms and that’s a little tricky. These are just a few of the problems:

  1. Different registers for each instruction set. This one is easy enough to solve. Just have a node in the expression tree with some value to indicate the register.
  2. Register overlaps. A change to one register implies a change to its parents and its children. Consider RAX->EAX->AX->AH,AL. Changing EAX will affect all of the registers in the hierarchy.
    To handle this, we wrote a CPU class to keep all the info about the platform, including register overlaps.
  3. Delayed branches. Some platforms have branch delay slots. This means that after any branch instruction, right before the branch is taken, instructions in the delayed slots are executed anyway. For instance, SPARC has three delay slots, while MIPS has just one. This isn’t an easy issue to solve, and for now we didn’t tackle it. We’ve got a few ideas though.

To make sure that our implementation is generic enough, we decided to write a skeleton disassembler implementation for MIPS as well.

The second group of issues involve the nature of expression trees versus instructions:

  1. Stepping over statements or instructions? Each expression tree for an instruction usually holds more than one statement. For example, dec eax changes eax as well as the zero flag. Since some instructions like rep stosd may contain a long loop, being able to step over statements instead of expressions is preferable.

    The problem is, executing expression trees is done with a DFS-like walk. If implemented with recursion it makes pausing the walk for each step a bit complicated. However, recursion is the clearest way to write the execution code, and I’d rather not give it up.

    My solution was to use Python generators. Each time a statement was executed, the function would yield, thus returning control to the calling function, while keeping its state intact.

  2. Instruction Pointer changes. The lower-level disassembler returns expression trees for each instruction. However, it does not return the jump to the next instruction as well. While this is the required behavior, it means that the VM should change the instruction pointer after executing each instruction.

    This is easier said than done: What should we do after jumps? Several possible solutions emerged.

    The first was to append an IP change to each instruction’s expression. Those changes will have to be appended only where needed.
    A second solution was to check if the IP was changed, and if it was not, change it. This solution however will not support instructions that jump to themselves.
    The last and our preferred solution was to check if the IP was touched. This solution is simple and very straightforward.

There are many more issues I didn’t write about, for example, self modifying code. I’ll leave those issues for future articles.


Back From the Dead

As you can see, the website is back online, and I hope it stays that way.
Cheers go to Randy and James, for fixing the server.

I’ve been doing various things lately, including work on Vial and various university tests. I’ll be sure to write some more now that the website is back online, to make up for the lost time.

I also had a job interview in Tel Aviv a few days ago. Here’s a little critter I folded on the train (click for larger images):

Origami FlierOrigami Flier

This one is made from a square piece of paper, and it has 6 legs.

Assembly computer science Programming Projects Testing

Issues in writing a VM – Part 1

Arkon and I decided to write a VM for vial. First though, a short explanation on what is vial:
vial is a project aimed at writing a general disassembler that outputs expression trees instead of text. On top of vial, we intend to write various code-analysis tools. The expression trees in the output should be an accurate description of the all of the code’s actions.
(note: the x86 disassembler behind vial is Arkon’s diStorm.)

So why do we need a VM? Apart from it being ‘nice and all’, it is critical for testing.

Some time ago, I described writing a VM to test a compiler I wrote as university homework. It is a similar issue here.
The disassembler is written according to the x86 specification. If we just check its output against this specification, we are not doing much to verify the code’s correctness. This is evident when you try to implement such a testing module – you end up writing another disassembler, and testing it against the original one. There has to be a different test method, one that does not directly rely on the specification.

Enter the VM. If you write a program, you can disassemble it, and then try to execute the disassembly. If it yields the same output as the original program – your test passed.
This is a good testing method, because it can be easily automated, reach good code coverage, and it tests against known values.
Consider the following illustration:

Testing Process

We are testing here a complete process on the left hand, against a known valid value, the original program’s output, on the right hand. All of the boxes on the left hand are tested along the way. Of course, one test may miss. For example, both the VM and the disassembler may generate wrong output for register overflows. We can try to cover as many such cases as possible by writing good tests for this testing framework. In this case, good tests are either c programs, or binary programs. This is essentially what I was doing when I manually fuzzed my own compiler.

Once the VM is finished, we can start writing various optimizations for the disassembler’s generated output. We can test these optimizations by checking the VM’s output on the optimized code against the output on the original code. This makes the VM a critical milestone on the road ahead.


Origami Rhino Instructions

Finally, here are the instructions for the rhino. I found out it is not a clean fold – the ending does not fold neatly. It is also not an easy fold, and it takes some work to get done.
That said, have fun with it, the end result does look OK.

Origami Rhino

Read on for the full instructions.

Programming Python

The Case Against Floating Point ==

While working on the clipping bug, I had to do some floating point comparisons. To handle this virtual mine-field, I used simple precision comparisons.

Not surprisingly, I wanted to know what’s ‘the right way’ to do this in Python. When I discussed this subject on #python, I was told to take a look at the decimal module, of which I wasn’t aware at the time.

Most experienced programmers know that you shouldn’t compare floating point numbers with ==. If you want to check floating point equality, you usually decide on a precision, and check either the absolute error, or the relative error. Hence, floating point == isn’t used, maybe except for rare circumstances. I would even venture to say that when possible, static checkers should emit warnings on floating point ==.

On the other hand there are the beginner programmers. Those usually use == by mistake, and will be surprised by the strange results they sometimes get back.

My suggestion is to do either of the following:
1. Change floating point == to behave like a valid floating point comparison. That means using precision and some error measure.
2. Change floating point == to raise an exception, with an error string suggesting using precision comparison, or the decimal module.

Since this change is not backwards compatible, I suggest it be added only to Python 3.
Personally, I prefer no. 2. It is clearer, and less confusing.

Arguments against this suggestion are:

1. This change breaks existing programs:
I believe it most likely triggers hidden bugs. Since the suggestion is to change it only in Python 3, Those programs will most likely be broken by other changes as well, and will need to be changed anyway.

2. This change breaks compatibility with C-like languages:
I agree. However, the already agreed on change of the / operator is even a stronger break. Most arguments for changing the / operator apply here as well.

3. Programmers will still need the regular ==:
Maybe, and even then, only for very rare cases. For these, a special function\method might be used, which could be named floating_exact_eq.

What are your thoughts on the subject?

Programming Python Utility Functions

Various Small Python Helpers

I’ve been working lately with arkon on various projects related to diStorm. One of these projects involved writing a solid Python API around an existing C API. This C API uses a lot of flags as arguments and return values of functions.
Since I want my code to be easy to use in an interactive shell, I’d like to return something that’s not just a number. The obvious solution is to use a Python Enum. There are many implementations going around, most of them equivalent. I’ve got the one I use. What I recently added though, is the SymbolInt:

def SymbolInt(value, name):
    class _SymbolInt(int):
        def __str__(self):
            return name
        def __repr__(self):
            return 'SymbolInt(%d, "%s")' % (value, name)
        def __eq__(self, other):
            if isinstance(other, str):
                other = other.lower()
            return int(self)==other or name.lower() == other
        def __ne__(self, other):
            return not self == other
    return _SymbolInt(value)

Short, and very much to the point, this makes exploratory API’s much more readable and usable. It is useful as a return value from enum functions as well.

Along with SymbolInt I wrote hexint:

class hexint(long):
    def __str__(self):
        return hex(self)[2:-1]
    __repr__ = __str__
    def __repr__(self):
        return hex(self)[:-1]

This one makes program addresses readable.

Lastly, here’s a little function that I found missing from itertools:

def head(iterable, num_items):
    for obj,i in zip(iterable,xrange(num_items)):
        yield obj

UPDATE: Don’t use this head() function, use itertools.islice instead. Thanks go to Erez for pointing that out.

Algorithms Geometry Math Programming Python

Two Mathematical Bugs

A few days ago, I discovered I had at least one, maybe two mathematical bugs. The first bug is in the line clipping algorithm. Here’s a quick reminder of what’s going on there:

There are six points of interest: the four intersection points, and the two endpoints of the line. We sort the six points according to their parameter, and take the two innermost points as the endpoints of the clipped line. To check if the line is outside the box we make sure that the result points’ parameters are between 0 and 1 (which correspond to the original endpoints).

Turns out this is not always true – here’s a counter example:

Counter Example for the Line Clipping Algorithm

In the counter example you can see a line, whose endpoints lie between the intersections. The algorithm described above will say this line lies completely inside the box. Clearly this is not true. How do we fix this? Simple actually. We check for each intersection point whether or not it is on the edges of the clipping rectangle. If at most one point is on – our line is outside. I hope this is the end of it. It will require more rigorous testing.

The second bug was in my solution to the random selection challenge. I wasn’t too sure of my solution, but I really liked it, as its code is rather elegant. So when meeting someone I used to work with, I told her about it. After thinking about my solution a bit, she said it is probably wrong, and gave me the reason: While the probability of a point x passing the first stage is indeed (proportional to) p(x), the probability of it passing the second stage is not 1 (or constant). I still need to think about this one, so no easy fix yet. I guess this reopens the challenge, so if anyone feels like writing a good proof or a counter example for my solution, I’d be happy to see it. I’ll be even happier to see a completely different solution.


More Origami

After working on it, I managed to reverse engineer my origami rhino fold. Here’s how it’s going to look:

Origami Rhino

To reverse engineer it, I had to open my original fold, and try to reproduce the fold, a bit like working on a crease pattern. I did my first version on some draft paper, so now I have a total of three rhinos. I even managed to take a photo of them playing with each other, national geographic style (may be considered nsfw). Expect to see full instructions for the rhino sometime soon.

Some time ago, along with the newspaper, came a lottery ticket. Believing that gambling is a form of taxation for fools, it was just lying around. Then one day I discovered it has a ratio of exactly 3/1. I decided to try something new, and came up with the following creation, which I dub, ‘lottery spider’. It has a venomous, addictive sting, beware!

Origami Spider made from a lottery ticket