Arkon and I decided to write a VM for vial. First, though, a short explanation of what vial is:
vial is a project aimed at writing a general disassembler that outputs expression trees instead of text. On top of vial, we intend to write various code-analysis tools. The expression trees in the output should accurately describe all of the code's actions.
(note: the x86 disassembler behind vial is Arkon’s diStorm.)
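To make "expression trees instead of text" a bit more concrete, here is a minimal Python sketch of how a single instruction, `add eax, ebx`, might be represented and evaluated. The node classes (`Reg`, `Const`, `BinOp`, `Assign`) and the `execute` helper are hypothetical illustrations, not vial's actual format. The sketch models only the result and the zero flag, while a real x86 `add` also updates CF, OF, SF, AF and PF.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

MASK32 = 0xFFFFFFFF  # 32-bit registers wrap around modulo 2**32


@dataclass(frozen=True)
class Reg:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class BinOp:
    op: str  # "+" or "=="
    left: "Expr"
    right: "Expr"


Expr = Union[Reg, Const, BinOp]


@dataclass(frozen=True)
class Assign:
    dest: str  # name of the register or flag being written
    value: Expr


def evaluate(expr: Expr, state: Dict[str, int]) -> int:
    """Evaluate an expression tree against a register/flag state."""
    if isinstance(expr, Reg):
        return state[expr.name]
    if isinstance(expr, Const):
        return expr.value
    left = evaluate(expr.left, state)
    right = evaluate(expr.right, state)
    if expr.op == "+":
        return (left + right) & MASK32
    if expr.op == "==":
        return int(left == right)
    raise ValueError(f"unknown operator {expr.op!r}")


def execute(instr: List[Assign], state: Dict[str, int]) -> Dict[str, int]:
    """Apply one instruction; every right-hand side reads the old state."""
    new_values = {a.dest: evaluate(a.value, state) for a in instr}
    return {**state, **new_values}


# Hypothetical tree for `add eax, ebx`: each side effect is an explicit
# assignment (only the result and ZF here; real x86 sets more flags).
sum_tree = BinOp("+", Reg("eax"), Reg("ebx"))
add_eax_ebx = [
    Assign("eax", sum_tree),
    Assign("ZF", BinOp("==", sum_tree, Const(0))),
]

print(execute(add_eax_ebx, {"eax": 0xFFFFFFFF, "ebx": 1}))
```

Because all assignments read the pre-instruction state, the order of the `Assign` nodes does not matter, which keeps "all of the code's actions" explicit without hidden sequencing.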
So why do we need a VM? Apart from it being ‘nice and all’, it is critical for testing.
Some time ago, I described writing a VM to test a compiler I wrote as university homework. It is a similar issue here.
The disassembler is written according to the x86 specification. If we just check its output against this specification, we are not doing much to verify the code’s correctness. This is evident when you try to implement such a testing module – you end up writing another disassembler, and testing it against the original one. There has to be a different test method, one that does not directly rely on the specification.
Enter the VM. If you write a program, you can disassemble it and then try to execute the disassembly. If the execution yields the same output as the original program, your test has passed.
This is a good testing method because it can be easily automated, it can reach good code coverage, and it tests against known values.
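The test loop described above can be sketched in a few lines of Python. This is a toy illustration under heavy assumptions: `native_program` is a plain Python function standing in for natively running a compiled binary, and `disassembly` is a hand-written list of expression trees in a hypothetical tuple format, rather than real diStorm/vial output. Flags are ignored.

```python
from typing import Callable, Dict, Iterable, List, Tuple

MASK32 = 0xFFFFFFFF

# Hypothetical tree format: ("reg", name) | ("const", value) | (op, left, right).
# One instruction is a list of (destination, tree) pairs applied in parallel.
Instr = List[Tuple[str, Tuple]]


def eval_tree(tree: Tuple, regs: Dict[str, int]) -> int:
    kind = tree[0]
    if kind == "reg":
        return regs[tree[1]]
    if kind == "const":
        return tree[1]
    a, b = eval_tree(tree[1], regs), eval_tree(tree[2], regs)
    if kind == "add":
        return (a + b) & MASK32  # 32-bit wraparound
    if kind == "xor":
        return a ^ b
    raise ValueError(f"unknown node {kind!r}")


def run_vm(program: List[Instr], regs: Dict[str, int]) -> Dict[str, int]:
    """Execute the disassembled program; each instruction reads the old state."""
    for instr in program:
        updates = {dest: eval_tree(tree, regs) for dest, tree in instr}
        regs = {**regs, **updates}
    return regs


def differential_test(native: Callable[[int, int], int],
                      disassembly: List[Instr],
                      inputs: Iterable[Tuple[int, int]]) -> bool:
    """Pass only if the VM reproduces the native output for every input."""
    for x, y in inputs:
        expected = native(x, y)
        got = run_vm(disassembly, {"eax": x, "ebx": y})["eax"]
        if got != expected:
            print(f"mismatch on {(x, y)}: native={expected:#x}, vm={got:#x}")
            return False
    return True


# Stand-in for running the original program natively: eax = (eax + ebx) ^ 0x55.
def native_program(x: int, y: int) -> int:
    return ((x + y) & MASK32) ^ 0x55


# Stand-in for the disassembler's output for `add eax, ebx` / `xor eax, 0x55`.
disassembly = [
    [("eax", ("add", ("reg", "eax"), ("reg", "ebx")))],
    [("eax", ("xor", ("reg", "eax"), ("const", 0x55)))],
]

print(differential_test(native_program, disassembly, [(1, 2), (0xFFFFFFFF, 1)]))  # True
```

Note that the harness never consults the x86 specification: the only reference is the original program's output, which is exactly why a wrong tree (say, the wrong constant in the `xor`) shows up as a mismatch.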
Consider the following illustration:
Here we test a complete process, on the left-hand side, against a known valid value, the original program's output, on the right-hand side. Every box on the left-hand side is tested along the way. Of course, a single test may miss a bug. For example, both the VM and the disassembler may generate wrong output for register overflows, and a test program that never overflows a register will not notice. We can try to cover as many such cases as possible by writing good tests for this testing framework. In this case, good tests are either C programs or binary programs. This is essentially what I was doing when I manually fuzzed my own compiler.
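To illustrate how a single test can miss a bug, here is a hypothetical Python sketch of the register-overflow case. A VM that forgets 32-bit wraparound agrees with native execution on small inputs and is exposed only by test inputs that actually overflow. All functions here are illustrative stand-ins, not part of vial.

```python
MASK32 = 0xFFFFFFFF


def native_add(x: int, y: int) -> int:
    # Stand-in for the real CPU: 32-bit addition wraps around.
    return (x + y) & MASK32


def buggy_vm_add(x: int, y: int) -> int:
    # Hypothetical VM bug: Python ints are unbounded, so wraparound is lost.
    return x + y


def correct_vm_add(x: int, y: int) -> int:
    return (x + y) & MASK32


def passes(vm_add, cases) -> bool:
    """True if the VM matches native execution on every test case."""
    return all(vm_add(x, y) == native_add(x, y) for x, y in cases)


small_cases = [(1, 2), (100, 200), (0, 0)]
overflow_cases = small_cases + [(0xFFFFFFFF, 1), (0x80000000, 0x80000000)]

print(passes(buggy_vm_add, small_cases))       # True: the bug goes unnoticed
print(passes(buggy_vm_add, overflow_cases))    # False: overflow exposes it
print(passes(correct_vm_add, overflow_cases))  # True
```

The harness itself is sound in both runs; what differs is whether the test programs reach the edge case, which is why the quality of the test corpus matters.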
Once the VM is finished, we can start writing various optimizations for the disassembler’s generated output. We can test these optimizations by checking the VM’s output on the optimized code against the output on the original code. This makes the VM a critical milestone on the road ahead.
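As a sketch of how an optimization could be checked this way, the Python below applies constant folding (chosen here only as an example optimization; it is not necessarily one vial will implement) to an expression tree in a hypothetical tuple format, then compares evaluation of the original and optimized trees on random register states. The `evaluate` function plays the role of the VM.

```python
import random

MASK32 = 0xFFFFFFFF


def evaluate(tree, regs):
    """Toy VM: evaluate ("reg", n) | ("const", v) | (op, left, right)."""
    kind = tree[0]
    if kind == "reg":
        return regs[tree[1]]
    if kind == "const":
        return tree[1]
    a, b = evaluate(tree[1], regs), evaluate(tree[2], regs)
    if kind == "add":
        return (a + b) & MASK32
    if kind == "and":
        return a & b
    raise ValueError(f"unknown node {kind!r}")


def fold_constants(tree):
    """Replace operator nodes whose operands are both constants by a constant."""
    if tree[0] in ("reg", "const"):
        return tree
    left, right = fold_constants(tree[1]), fold_constants(tree[2])
    folded = (tree[0], left, right)
    if left[0] == "const" and right[0] == "const":
        return ("const", evaluate(folded, {}))
    return folded


def equivalent(original, optimized, trials=1000, seed=0):
    """Compare the VM's output on both trees over random register states."""
    rng = random.Random(seed)
    for _ in range(trials):
        regs = {"eax": rng.getrandbits(32), "ebx": rng.getrandbits(32)}
        if evaluate(original, regs) != evaluate(optimized, regs):
            return False
    return True


# eax + (0xFFFFFFFF + 2): the constant subtree folds to 1 (32-bit wraparound).
tree = ("add", ("reg", "eax"), ("add", ("const", 0xFFFFFFFF), ("const", 2)))
optimized = fold_constants(tree)
print(optimized)                    # ('add', ('reg', 'eax'), ('const', 1))
print(equivalent(tree, optimized))  # True
```

Random sampling can only find counterexamples, not prove equivalence, so in practice this check would complement, not replace, running the full test programs through the VM.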