A few days ago, I found a bug in my website, plnnr.com. The bug was in a new feature I had added to the algorithm. The first thing I did was write a small unit-test that reproduces the bug. With that unit-test in hand, I then worked on fixing the bug until the test passed.
As I had previously presumed this feature to be (relatively :) bug-free, I decided that more testing was in order. This time, however, a single test-case would not be enough – I needed to make sure that the trip-generation algorithm works in many cases. Enter fuzzing.
Plnnr.com generates trips according to trip preferences. Why not generate the trip preferences with a fuzzer, and then check if the planning algorithm chokes on them? While fuzzing is usually used to generate invalid input with the goal of causing the program to crash, in this case I’m generating valid input with the goal of causing the planning algorithm to fail.
Usually fuzzing is done with one of two techniques: exhaustive fuzzing, which goes systematically (possibly selectively) over the input space, and random fuzzing, which picks inputs at random – or “somewhat” randomly. In my case, the input space consists of “world data” – locations of attractions, restaurants, etc. – and trip preferences – intensity, required attractions, and so on. Since the input space is so large and “unstructured”, I found it much easier to go with random fuzzing.
In each test-case, I will generate a “random world”, and random trip preferences for that world.
Here is some sample code that shows how this might look:
trip_prefs.num_days = random.randint(0, 5)
trip_prefs.intensity = random.randint(0, 5)
if randbit():
    trip_prefs.schedule_lunch = True
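The “world” half can be generated in the same spirit. Here is a rough sketch of what that might look like; the attraction fields below are stand-ins I made up for illustration, not plnnr.com’s actual data model:

import random

def random_world(num_attractions=20):
    # Build a list of attractions with randomized coordinates,
    # opening hours and visit durations.
    world = []
    for i in range(num_attractions):
        world.append({
            'name': 'attraction_%d' % i,
            'lat': random.uniform(-90, 90),
            'lon': random.uniform(-180, 180),
            'opens': random.randint(6, 12),               # opening hour
            'closes': random.randint(16, 23),             # closing hour
            'visit_hours': random.choice([0.5, 1, 2, 3]), # time needed to visit
        })
    return world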
Where randbit is defined like so:
def randbit(prob=0.5):
    return random.random() < prob
This is all very well, but tests need to be reproducible. If a fuzzer-generated test case fails and I can’t recreate it to analyze the error and later verify that it is fixed, it isn’t of much use. To solve this, the input-generation function receives a seed value and sets the random seed from it. Now, generating test cases is just a matter of generating a sequence of random seed values. Here is my code to do that:
class FuzzTestBase(object):
    __test__ = False

    def run_single_fuzz(self, random_seed):
        pass

    def fuzz_test(self):
        random.seed()
        random_seeds = [str(random.random()) for i in range(NUM_FUZZ_TESTS)]
        for seed in random_seeds:
            yield self.run_single_fuzz, seed
FuzzTestBase is a base-class for actual test classes. Each test class just needs to define its own version of run_single_fuzz, and in it call random.seed(random_seed) and log random_seed.
This code uses nose’s support for test generators: nose assumes that a test generator yields test functions together with their parameters.
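To make this concrete, a hypothetical test class built on top of FuzzTestBase might look roughly like the sketch below. Note that random_trip_prefs, plan_trip and trip_is_valid are placeholders for the real plnnr.com code, not actual functions from it:

import logging
import random

class PlanningFuzzTest(FuzzTestBase):
    __test__ = True   # the base class opts out of collection; opt back in

    def run_single_fuzz(self, random_seed):
        # Log the seed first, so that a failing case can be reproduced later.
        logging.info('fuzz seed: %s', random_seed)
        random.seed(random_seed)

        # random_world, random_trip_prefs, plan_trip and trip_is_valid are
        # placeholders for the actual planner code and its sanity checks.
        world = random_world()
        trip_prefs = random_trip_prefs(world)
        trip = plan_trip(world, trip_prefs)
        assert trip_is_valid(trip, trip_prefs)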
A few interesting issues:
* I generate the random seeds beforehand, so that calling random.seed() in the actual test case doesn’t affect the seed sequence.
* Originally I used just random.random() as a seed instead of str(random.random()). The problem is that this isn’t reproducible: random.random() returns a floating point value x for which, usually, x != eval(str(x)):
In [10]: x = random.random()

In [11]: x == eval(str(x))
Out[11]: False
Even though x == eval(repr(x)) does hold in that case, there’s still room for error. It’s harder to go wrong with string equality than with floating point equality, so str(random.random()) is just a cheap way to generate random seed strings.
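Seeding with a string makes reproducing a run trivial. A quick sanity check of the idea, for example:

import random

seed = str(random.random())   # the seed is just an ordinary string

random.seed(seed)
first_run = [random.random() for _ in range(3)]

random.seed(seed)
second_run = [random.random() for _ in range(3)]

assert first_run == second_run   # same seed string, same sequence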
If your testing mostly consists of selected test cases based on what you think is plausible user behavior, I’d recommend adding some fuzzed inputs as well. I originally started the fuzz-testing described in this blog-post to better test for a specific bug. After adding it, I found another bug I didn’t know was there. This just goes to show how useful fuzzing is as a testing tool. The fact that it’s so easy to implement is just a bonus.
Great post, thanks.
One suggestion for improvement is to add the seed and self.__name__ logging in fuzz_test’s for loop.
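A sketch of that suggestion, assuming the standard logging module and taking the name from self.__class__.__name__ (instances don’t have a __name__ of their own):

import logging
import random

def fuzz_test(self):
    random.seed()
    random_seeds = [str(random.random()) for i in range(NUM_FUZZ_TESTS)]
    for seed in random_seeds:
        logging.info('%s: running with seed %s', self.__class__.__name__, seed)
        yield self.run_single_fuzz, seed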
Actually, instead of taking inputs from a uniform distribution, it may be better to sample them according to the following “monkey” distribution:
– Initialize the input to the default.
– Repeat the following a certain number of times (e.g., a Poisson-distributed number of times):
– Select a random GUI element and “click” it (that is, change the input accordingly).
This gives more weight to inputs that your typical users tend to arrive at. Yes, they’re lazy.
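In code, that sampling scheme could look something like the sketch below. The “clicks” are abstracted here into a list of mutation functions, which is a simplification of a real GUI:

import copy
import math
import random

def poisson(lam):
    # Knuth's method for drawing a Poisson-distributed integer.
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

def monkey_prefs(default_prefs, clicks, mean_clicks=3.0):
    # Start from the default preferences, then apply a Poisson-distributed
    # number of random "clicks", each of which tweaks the preferences a bit.
    prefs = copy.deepcopy(default_prefs)
    for _ in range(poisson(mean_clicks)):
        random.choice(clicks)(prefs)
    return prefs

With this scheme, inputs that are only a few clicks away from the defaults come up far more often than exotic ones.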