Some time ago, a coworker asked me to help him visualize some data. He had a very long series of data points (many millions), and he thought that plotting one pixel per point would show it well.
I installed Python & PIL on his machine, and not too long after, he had the image plotted. The script looked something like:
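    import math
    from PIL import Image

    data_points = get_data_points()
    # Side of the smallest square image that holds every point.
    n = math.ceil(math.sqrt(len(data_points)))
    image = Image.new('1', (n, n))
    for idx, pt in enumerate(data_points):
        # putpixel takes the (x, y) position first, then the pixel value.
        image.putpixel((idx // n, idx % n), pt)
    image.save('bla.png', 'png')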
Easy enough to do. Well, easy enough if you have enough memory to handle very large data sets. Luckily enough, we had just enough memory for this script & data series, and we were happy. The image was generated, and everything worked fine.
Still, we wanted to improve on that. One problem with this visualization is that two horizontally adjacent pixels have little to do with each other: they sit a whole column of the image (n data points) apart in the series. Remembering xkcd’s “Map of the Internet”, I decided to use the Hilbert curve, which keeps points that are close in the series close together in the image. I started with Wikipedia’s version of the code for the Python turtle and changed it to generate a string of instructions saying where to put the pixels. On the way I improved the time complexity by changing it to make only two recursive calls instead of four. (It can probably be taken down to one, by the way. I leave that as a challenge to the reader :)
Unfortunately, at this point we didn’t have enough memory to hold all of those instructions, so I changed the code into a generator. Now it was too slow. I cached the lower levels of the recursion, and then it ran in reasonable time (about 3-5 minutes) with reasonable memory requirements (no more MemoryError exceptions). Of course, I’m skipping a few exceptions and some debugging along the way. Still, it was relatively straightforward.
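The generator-plus-cache idea can be sketched roughly as follows. This is not the original script, just a minimal Python 3 illustration: it uses the standard four-call L-system form of the Hilbert curve (not the two-call variant mentioned above), and the names `hilbert_moves`, `cached_moves` and the `CACHE_ORDER` cutoff are made up for the example.

```python
from functools import lru_cache

# Assumed cutoff: curves up to this order are stored whole as strings.
CACHE_ORDER = 5


@lru_cache(maxsize=None)
def cached_moves(order, flip=False):
    """Return the moves of a small curve as one memoized string."""
    if order == 0:
        return ""
    outer, inner = ("-", "+") if flip else ("+", "-")
    same = cached_moves(order - 1, flip)
    other = cached_moves(order - 1, not flip)
    # L-system rule A -> +BF-AFA-FB+ (B is the mirrored rule, flip=True).
    return (outer + other + "F" + inner + same + "F" + same
            + inner + "F" + other + outer)


def hilbert_moves(order, flip=False):
    """Lazily yield 'F' (step forward), '+' (turn left), '-' (turn right)."""
    if order <= CACHE_ORDER:
        # Low levels come from the cache instead of being recomputed.
        yield from cached_moves(order, flip)
        return
    outer, inner = ("-", "+") if flip else ("+", "-")
    yield outer
    yield from hilbert_moves(order - 1, not flip)
    yield "F"
    yield inner
    yield from hilbert_moves(order - 1, flip)
    yield "F"
    yield from hilbert_moves(order - 1, flip)
    yield inner
    yield "F"
    yield from hilbert_moves(order - 1, not flip)
    yield outer


print("".join(hilbert_moves(1)))  # → +F-F-F+
```

Only the small cached strings live in memory; the upper levels are produced one move at a time, which is the trade-off described above.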
Writing the generator wasn’t enough, though: there were still pixels to draw! It took a few more minutes to write a simple “turtle” that walks the generated Hilbert curve.
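Such a turtle can be very small. The sketch below is an assumption about how it might look, not the original code: the function name `walk` and the instruction alphabet ('F' for a step forward, '+' for a left turn, '-' for a right turn) are chosen for the example.

```python
def walk(moves, start=(0, 0), heading=(1, 0)):
    """Yield the (x, y) cells visited while following the instructions."""
    x, y = start
    dx, dy = heading
    yield x, y  # the starting cell is part of the curve
    for move in moves:
        if move == "F":
            # Step one cell in the current direction.
            x, y = x + dx, y + dy
            yield x, y
        elif move == "+":
            # Rotate the heading 90 degrees counterclockwise.
            dx, dy = -dy, dx
        elif move == "-":
            # Rotate the heading 90 degrees clockwise.
            dx, dy = dy, -dx


# The first-order Hilbert curve visits the four cells of a 2x2 square.
print(list(walk("+F-F-F+")))  # → [(0, 0), (0, 1), (1, 1), (1, 0)]
```

Because `walk` is itself a generator, it can consume a lazily generated instruction stream without ever holding the whole curve in memory. Note that in an image the y axis points down, so the curve comes out mirrored, which doesn’t matter for this kind of plot.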
Now, we were ready:
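    # Round the order up so the curve has at least as many cells as data points.
    hilbert = Hilbert(math.ceil(math.log(len(data_points), 4)))
    for pt in data_points:
        x, y = next(hilbert)
        # putpixel takes the (x, y) position first, then the pixel value.
        image.putpixel((x, y), pt)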
A few minutes later, the image was generated.