Archive for the ‘Math’ Category

Visualizing Data Using the Hilbert Curve

Saturday, April 17th, 2010

Some time ago, a coworker asked me to help him visualize some data. He had a very long series (many millions) of data points, and he thought that plotting a pixel for each one would visualize it well, so he asked for my help.
I installed Python & PIL on his machine, and not too long after, he had the image plotted. The script looked something like:

data_points = get_data_points()
n =  int((len(data_points)**0.5) + 0.5)

image = Image('1', (n, n))
for idx, pt in enumerate(data_points):
    image.putpixel(pt, (idx/n, idx%n))
image.save('bla.png', 'png')

Easy enough to do. Well, easy enough if you have enough memory to handle very large data sets. Luckily enough, we had just enough memory for this script & data series, and we were happy. The image was generated, and everything worked fine.

Still, we wanted to improve on that. One problem with this visualization is that two horizontally adjacent pixels don’t have anything to do with each other. Remembering xkcd’s "Map of the Internet", I decided to use the Hilbert Curve. I started with wikipedia's version of the code for the Python turtle and changed it to generate a string of instructions of where to put pixels. On the way I improved the time complexity by changing it to have only two recursion calls instead of four. (It can probably be taken down to one by the way, I leave that as a challenge to the reader :)

Unfortunately, at this point we didn’t have enough memory to hold all of those instructions, so I changed it into a generator. Now it was too slow. I cached the lower levels of the recursion, and now it worked in reasonable time (about 3-5 minutes), with reasonable memory requirements (no OutOfMemory exceptions). Of course, I'm skipping a bit of exceptions and debugging along the way. Still, it was relatively straightforward.

Writing the generator wasn't enough - there were still pixels to draw! It took a few more minutes to write a simple "turtle", that walks the generated hilbert curve.
Now, we were ready:

hilbert = Hilbert(int(math.log(len(data_points), 4) + 0.5))
for pt in data_points:
    x,y = hilbert.next()
    image.putpixel(pt, (x,y))

A few minutes later, the image was generated.

Fast Peak Autocorrelation

Wednesday, October 14th, 2009

So, I was at geekcon. It was a blast.
The lunar lander
There were many interesting projects, and I didn't get to play with them all. I did get to work a bit on the Lunar Lander from last year, and this year it was finished successfully. My part was the PC game which interfaced with the microcontroller controlling the lander. As you probably guessed, it was written in Python.

This year, as promised, I worked on the Automatic Improviser. I worked on it with Ira Cherkes.

While the final version worked, it didn't work well enough, and there is still much work to be done. Still, we had excellent progress.
By the way, I know this subject has been tackled before, and I still wanted to try it myself, without reading too much literature about it.

One of the components of the system is a beat recognizer. My idea to discover the beat is simple: find the envelope (similar to removing AM modulation), and then find the low "frequency" of the envelope.
Instead of doing a Fast Fourier Transform for beat recognition, we were advised that autocorellation will do the trick better and faster. However, when trying to autocorellate using scipy.signal.correlate we discovered that autocorellation was too slow for real time beat recognition, and certainly wasteful.

To solve this issue, we decided to first do peak detection on the envelope, and then autocorellate the peaks. Since there shouldn't be too many peaks, this has the potential of being really quick. However, there was no standard function to do autocorellation of peaks, so we implemented it ourselves. We were pretty rushed, so we worked fast. Here's the code:

def autocorrelate_peaks(peaks):
    peaks_dict = dict(peaks)
    indexes = set(peaks_dict.keys())
    deltas = set()
    for i in indexes:
        for j in indexes:
            if j>i:
                continue
            deltas.add(i-j)

    result = {}
    for d in deltas:
        moved = set(i+d for i in indexes)
        to_mult = moved & indexes
        assert to_mult <= indexes
        s = sum(peaks_dict[i-d]*peaks_dict[i] for i in to_mult)
        result[d] = s
    return result

This function takes as input a list of tuples, each of the form (peak_index, peak_value), and returns a mapping between non-zero corellation offsets and their values.
Here's is a sample run:

In [7]: autocorrelate_peaks([(2, 2), (5,1), (7,3)])
Out[7]: {0: 14, 2: 3, 3: 2, 5: 6}

After implementing this function, our recognition loop was back to real-time, and we didn't have to bother with optimizing again.

Checking the ulam spiral

Thursday, September 17th, 2009

In the following post, the Ulam spiral is described. It's a very simple object - write down consecutive natural numbers starting from 41 in a square spiral. Curiously, the numbers on the diagonal are primes:

Ulam spiral

Reading this post, I immediately wanted to check how long this property holds. The original blog post suggests:

Remarkably, all the numbers on the red diagonal are prime — even when the spiral is continued into a 20 × 20 square.

Yet it doesn't mention if it stops there or not. Without peeking (in Wikipedia), I wrote a simple program to check:

import sys
import psyco
psyco.full()

UP = 0
LEFT = 1
DOWN = 2
RIGHT = 3

DIRECTIONS = {UP: (0, -1),
              LEFT: (-1, 0),
              DOWN: (0, 1),
              RIGHT: (1, 0)}

def generate_ulam_seq(n):
    start = 41
    x = 0
    y = 0
    min_x = 0
    max_x = 0
    min_y = 0
    max_y = 0
    value = start
    direction = UP
    for i in xrange(n):
        yield x, y, value
        value += 1
        add_x, add_y = DIRECTIONS[direction]
        x, y = x + add_x, y + add_y
        if x <min_x:
            direction = (direction+1) % 4
            min_x = x
        if x> max_x:
            direction = (direction+1) % 4
            max_x = x
        if y <min_y:
            direction = (direction+1) % 4
            min_y = y
        if y> max_y:
            direction = (direction+1) % 4
            max_y = y

def is_prime(n):
    return all(n%x != 0 for x in xrange(2, int((n**0.5) + 1)))

def main():
    for x, y, v in generate_ulam_seq(int(sys.argv[1])):
        if x == y and not is_prime(v):
            print x, y, v

if __name__ == '__main__':
    main()

Running it, we get:

> ulam.py 10000
20 20 1681
-21 -21 1763
22 22 2021
-25 -25 2491
28 28 3233
-33 -33 4331
38 38 5893
-41 -41 6683
41 41 6847
42 42 7181
-44 -44 7697
-45 -45 8051
-46 -46 8413
48 48 9353

So the property doesn't hold for a very long time. Reading up on Wikipedia, it seems the Ulam spiral still has interesting properties related to primes in higher sizes. Maybe I'll plot that one as well in a future post.

PS: Regarding primality testing, this is a cute, quick&dirty one liner. In the range of numbers we're discussing, it is 'good enough'.

Fractals in 10 minutes No. 6: Turtle Snowflake

Monday, August 31st, 2009

I didn't write this one, but I found it's simplicity and availability so compelling, I couldn't just not write about it.
In a reddit post from a while ago, some commenter named jfedor left the following comment:

A little known fact is that you can do the following on any standard Python installation:

from turtle import *

def f(length, depth):
   if depth == 0:
     forward(length)
   else:
     f(length/3, depth-1)
     right(60)
     f(length/3, depth-1)
     left(120)
     f(length/3, depth-1)
     right(60)
     f(length/3, depth-1)

f(500, 4)

If you copy paste, it's a fractal in less than a minute. If you type it yourself, it's still less than 10. And it's something you can show a kid. I really liked this one.

Computing Large Determinants in Python

Wednesday, March 25th, 2009

Story:
For my seminar work, I had to calculate the determinant of a large integer matrix.
In this case, large meant n>=500. You might say that this isn't very large and I would agree. However, it is large enough to overflow a float and reach +inf. Reaching +inf is bad: once you do, you lose all the accuracy of the computation. Also, since determinant calculations use addition as well as subtraction, if +inf is reached in the middle of the calculation the value might never drop back, even if the determinant is a "small" number.

(Note: Even if the calculation hadn't reached +inf, but just some very large floating point number, the same phenomenon could have occurred.)

As I said, my matrix was composed of integers. Since a determinant is a sum of multiples of matrix elements, you would expect such a determinant to be an integer as well. It would be nice to take advantage of that fact.

How should one compute such a determinant? Well, luckily Python has support for arbitrarily large integers. Let's compute the determinant with them!
Unfortunately, Python's numpy computes determinants using LU decomposition, which uses division - hence, floating point values.
Well, never mind, I know how to calculate a determinant, right?
Ten minutes later the naive determinant calculation by minors was implemented, with one "minor" setback - it takes O(n!) time.
Googling for integer determinants yielded some articles about division free algorithms, which weren't really easy to implement. There was one suggestion about using Rational numbers and Gauss Elimination, with pivots chosen to tame the growth of the nominator and denominator.

So, I hitched up a rational number class, using Python's arbitrarily large integers as nominator and denominator. Then, I got some code to do Gauss Elimination, although my pivoting wasn't the most fancy. This seemed to work, and I got my desired large determinants.

Epilogue:
Not too long ago, I ran across mpmath, a Python library for arbitrarily large floating point numbers. It is possible I could have used it instead of my own rational numbers. Next time I run into a relevant problem I'll keep this library in mind.
Also, an even shorter while ago, I became aware that since 2.6 Python boasts a fractions module. This will sure come in handy in the future.

Code:
Available here. If you intend to use it in Python 2.6 and above, I suggest replacing my Rational class with Python's Fraction.
(Note that I adapted this module to my needs. Therefore it has a timer that prints time estimations for computations, and it uses Psyco.)

Fractals in 10 minutes No. 5: Sierpinski Chaos Game

Thursday, December 25th, 2008

A few years back, I was in some course where they also taught me some Matlab. One of the exercises was to draw the Sierpinski triangle using a method of progressing a point randomly. I was quite surprised at the time, because I thought the Sierpinski triangle was more of an analytical thing: you drew it using inverted triangles.

I wanted to check up on it, and it turns out the method has a name: Chaos Game.
To generate a Sierpinski triangle using this method, one starts with a some point inside the triangle. Then, at each step, the next point is half the distance from the current point to one of the corners selected at random. I think this method of generating the Sierpinski triangle is even easier than the analytical one.

A sierpinski triangle generated using a chaos game

I used pylab (matplotlib) to create this image.
As I usually do, I also wanted to draw it using ascii-art. However, I must confess, I am not satisfied with the result:

              %6
             6*%8
            %8  66
           %6%*6%%6
          6%'    '66
         6866'  '6%86
        %6* 86 '%8 *66
       %**%%'***%6***%%
      8'6            6*8
     6%%*8          %*'*6
    6%6'666        86# 86%
   %%*''**%6      %**''*'%6
  *%8'   '6%6    668'   '#%%
 %% '6' '#*'6%  *% %8' '6''6*
%68%#6#*#68*86*%6#%8%8'6%6*6%*

The code isn't really good, as I didn't put much thought into it and just hacked it up. Still, I'm putting it up, and I might improve it someday.

Fractals in 10 minutes No. 4: Mandelbrot and Julia in Ascii Art

Wednesday, December 10th, 2008

I felt like it's about time I tackled the Mandelbrot and Julia sets, in Ascii-Art. Heck, the Mandelbrot fractal is on the logo of this Blog! However, being the most well-known fractal, this issue was tackled already, with satisfactory results.

Still, I took the ten minutes required to draw them in Python, using numpy:

def get_mandelbrot(x, num_iter = 10):
    c = x
    for i in range(num_iter):
        x = x**2 + c
    return abs(x)>2

def get_julia(x, c, num_iter = 10):
    for i in range(num_iter):
        x = x**2 + c
    return abs(x)>2

"Hey!" you might say, "There's no loop in here!". Indeed, with numpy loops can sometimes be avoided, when using arrays. When the expression x**2 + c is applied to an array x, it is applied element-wise, allowing for implicit loops. The actual magic happens in the following function:

def get_board(bottom_left, top_right, num_x, num_y):
    x0, y0 = bottom_left
    x1, y1 = top_right
    x_values = numpy.arange(x0, x1, (x1-x0)/float(num_x))
    y_values = numpy.arange(y0, y1, (y1-y0)/float(num_y))
    return numpy.array([[x+1j*y for x in x_values] for y in y_values])

The result of get_board will be used as the "x input" later on. It should be noted though that while that's a cute trick this time, it might grow unhandy for more complicated computations. For example, making each element to reflect the iteration number on which it "escaped".
So, here are the results:

############################   ##############
###########################   ###############
##########################     ##############
#####################                ########
####################                #########
############ #######                 ########
############                         ########
#########                           #########
#########                           #########
############                         ########
############ #######                 ########
####################                #########
#####################                ########
##########################     ##############
###########################   ###############

############################################################
############################################################
############################################################
############################ ###############################
######################### ##   #############################
############## #######           ###########################
######## ##      ###              ##########  ##############
#########  ###        ###         ######       #############
##############       ######         ###        ###  ########
###############  ##########              ###      ## #######
############################           ####### #############
##############################   ## ########################
################################ ###########################
############################################################
############################################################

Here's the code. (With the usual bsd-license slapped on top of it :)