Collision: the story of the random bug

So here I was, trying to write some Django server-side code, when every once in a while, some test would fail.
Now, it is important to know that we are using any_model, a cute little library that allows you to specify only the fields you need when creating objects, and randomizes the rest (to help uncover more bugs).

In this particular instance, the failing test was trying to store objects on the server using an API, and then check that the new objects exist in the DB. Every once in a while, an object didn’t exist. It should be noted that the table with the missing rows had a Django ORM URLField.

So first things first, I changed the code to print the random seed it was using on every failure. Now the next time it failed (a day later), I had the random seed in hand.

I then proceeded to use that random seed – and now I had a reproducible bug – it failed every time, consistently.
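
For the curious, here is a minimal sketch of that seed-logging trick. It assumes the random data is drawn from Python's global `random` module, which may not be how any_model does it. The `with_logged_seed` decorator and the `TEST_SEED` environment variable are names made up for this illustration, not part of any library:

```python
import functools
import os
import random
import sys


def with_logged_seed(test_func):
    """Seed the global RNG before a test and report the seed if the test fails.

    Hypothetical helper: TEST_SEED is an invented environment variable that,
    when set, replays a previously failing run.
    """
    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        raw = os.environ.get("TEST_SEED")
        # Reuse the given seed, or pick a fresh one that we can report later.
        seed = int(raw) if raw else random.SystemRandom().randrange(2 ** 32)
        random.seed(seed)
        try:
            return test_func(*args, **kwargs)
        except Exception:
            # AssertionError is an Exception, so failed asserts land here too.
            print("test failed with TEST_SEED=%d" % seed, file=sys.stderr)
            raise
    return wrapper
```

Decorating a test with `with_logged_seed` and rerunning it with `TEST_SEED` set to the printed value should replay the same random sequence, as long as nothing else reseeds the generator in between.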

The next step was finding the cause of the bug. To cut a long story short: it turned out that the test looked up an object by a specific URL. Which URL? The URL created for the first object (we had two).

The bug was that the second object was getting the same url as the first. I remind you, these urls are generated randomly. The troublesome url was http://72.14.221.99

I leave it to you to guess (or check) the chances of a collision here
(the correct way to do that would be to check any_model’s code for generating URLs, and not just say 1 in 2^32… :)
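
As a rough sketch of the arithmetic, assuming each URL is picked uniformly and independently from a pool of a known size (the pool sizes below are guesses, not taken from any_model's code), the chance that at least two of the generated URLs coincide can be computed like this:

```python
def collision_probability(pool_size, draws):
    """Chance that at least two of `draws` uniform picks from `pool_size` values match."""
    if draws > pool_size:
        return 1.0  # pigeonhole: more picks than values forces a repeat
    p_all_distinct = 1.0
    for i in range(draws):
        # The (i+1)-th pick must avoid the i values already taken.
        p_all_distinct *= (pool_size - i) / pool_size
    return 1.0 - p_all_distinct


# Two objects, URLs drawn from a hypothetical fixed list of 8 entries.
print(collision_probability(8, 2))        # 0.125

# Two objects, URLs drawn from all 2**32 possible IPv4 addresses.
print(collision_probability(2 ** 32, 2))  # roughly 2.3e-10
```

With only two objects the answer is simply 1 divided by the pool size; the loop only starts to matter when a test creates many objects.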

So I made sure the second object got a new url, and all was well, and the land had rest for forty years. (or less).


2 Responses to Collision: the story of the random bug

  1. Brendan Dolan-Gavitt says:

    Hmm; I can’t be sure since I’m not familiar with any_model, but this commit [1] to django-any leads me to believe it was choosing randomly from a fixed list of length 8 (!). In which case the probability that two objects get the same URL is 1/8.

    But surely the bug would have been cropping up almost constantly if this were the case, so I’ll assume it was actually creating a random IP with something like
    "http://%d.%d.%d.%d" % tuple(random.randint(0, 255) for _ in range(4)). In that case, by the birthday paradox, a collision becomes likely once around 2**16 URLs have been generated (for a single pair of objects it is 1 in 2**32), which could be frequent enough to show up sporadically across many test runs.

    [1] https://github.com/kmmbvnr/django-any/commit/6f64ebd05476e2149e2e71deeefbb10f8edfc412#django_any/models.py

  2. Tomer Filiba says:

    lol. interesting insight