Archive for the ‘computer science’ Category

The mathematics behind the solution for Challenge No. 5

Friday, February 26th, 2010

If you take a look at the various solutions people proposed for the last challenge of generating a specific permutation, you’ll see that they are very similar. Most of them are based on some form of div-mod usage. The reason this is so, is because all of these solutions are using the Factorial Base.

What does that mean?
Note that we usually encounter div-mods when we want to find the representation of a number in a certain base. That should already pique your interest. Now consider that a base’s digits need not have the same weight. For example, consider how we count the number of seconds since the start of the week:

seconds of the last minute, A (at most 60-1)
minutes of the last hour, B (at most 60-1)
hours of the last day, C (at most (24-1)
days of the last week, D (at most 7-1)

So given A, B, C, D, we would say that the number of seconds is:
A + 60*B + 24*C + 7*D. This certainly looks like a base transformation. To go back, we would use divmod.

The factorial base is just the same, with the numbers n, n-1, … 1. Note that in the factorial base, you can only represent a finite number of numbers – n!. This should not be surprising – this is what we set out to do in the first place!
The thing that I found really amazing about this is that all the people to whom I posed this challenge came up with almost the same “way” of solving it.

Other interesting curiosities regarding bases can be found in Knuth’s book, “The Art of Computer Programming”, volume 2, Section 4.1.

Small Programming Challenge no. 5 – Generating a Permutation

Wednesday, November 11th, 2009

I thought of this one quite a long time ago, and I believe that the idea behind it is pretty nice mathematically. I got the idea for it from Knuth’s “The Art of Computer Programming”.

The challenge is simple:
write a function that receives as arguments two numbers, n, and num such that 0 <= num < n!. This function needs to return an array (list) representing a permutation of the numbers 0..n-1. For each possible num, the function needs to return a different permutation, such that over all values of num, all possible permutations are generated. The order of permutations is up to you.

The function you write should do this in at most O(n) time & space (Various O(nlogn) are also acceptable).
Write your solutions in the comments, in [ LANG ] [/ LANG ] blocks (without the spaces) where LANG is preferably Python :). I will post my solution in a few days. As usual, the most efficient & elegant solution wins.

Go!

10 Python Optimization Tips and Issues

Monday, September 21st, 2009

Following my previous post on Optimizing Javascript, I thought I’d write a similar post regarding Python optimization.
(more…)

My solution to the counting sets challenge

Monday, September 7th, 2009

A few days ago, I wrote up a challenge - to count the number of sets a given set is contained in.

In the comments, I touched briefly on the original problem from which the challenge was created, and I'll describe it in more depth here.
In the problem, I am given an initial group of sets, and then an endless 'stream of sets'. For each of the sets in the stream, I have to measure its uniqueness. relative to the initial group of sets. A set that is contained in only one set from the initial group is very unique, one that is contained in ten - not so much.

So how to solve this problem? My original solution is somewhat akin to the classic "lion-in-the-desert" problem, but more like the "blood test" story. I didn't find a link to the story, so I'll give it as I remember it.

In an army somewhere, it was discovered that at least one of the soldiers was sick and so had to be put in isolation until he heals. It is only possible to check for the disease via a blood test, but tests are expensive, and they didn't want to test all of the soldiers. What did they do?

They took enough blood from each soldier. Now, from each sample they took a little bit, and divided the samples into two groups. They mixed together the samples of each group, and tested the mixed sample. If the sample was positive - they repeated the process for the blood samples of all the soldiers in the matching group.

Now my solution is clear: let's build a tree of set unions. At bottom level will be the union of couples of sets. At the next level, unions of couples of couples of sets. So on, until we end up with just two sets, or even just one - if we are not sure the set is contained in any of the initial sets.

Testing is just like in the story. We'll start at the two biggest unions, and work our way down. There is an optimization though - if a set appears more than say, 10 times, it's not very unique, and its score is zeroed. In that case, we don't have to go down all the way, but stop as soon as we pass the 10 "positive result" mark.

Here's the code:

class SetGroup(object):
    def __init__(self, set_list):
        cur_level = list(set_list)
        self.levels = []
        while len(cur_level)> 1:
            self.levels.append(cur_level)
            cur_level = [union(couple) for couple in blocks(cur_level, 2)]
        self.levels.reverse()

    def count(self, some_set, max_appear = None):
        indexes = [0]
        for level in self.levels:
            indexes = itertools.chain((2*x for x in indexes), (2*x+1 for x in indexes))
            indexes = (x for x in indexes if x <len(level))
            indexes = [x for x in indexes if some_set <= level[x]]
            if max_appear is not None and len(indexes)>= max_appear:
                return max_appear
        return len(indexes)

Here's a link to the full code.

I didn't implement this solution right away. At first, I used the naive approach, of checking against each set. Then, when it proved to be too slow, I tried implementing the solution outlined by Shenberg and Eric in the comments to the challenge. Unfortunately, their solution proved to be very slow as well. I believe it's because some elements appear in almost all of the sets, and so computing the intersection for these elements takes a long time.
Although originally I thought that my solution would suffer from some serious drawbacks (can you see what they are?), the max_appear limit removed most of the issues.

Implementing this solution was a major part of taking down the running time of the complete algorithm for the full problem I was solving from about 2 days, to about 15-20 minutes. That was one fun optimizing session :)

Fractals in 10 minutes No. 6: Turtle Snowflake

Monday, August 31st, 2009

I didn't write this one, but I found it's simplicity and availability so compelling, I couldn't just not write about it.
In a reddit post from a while ago, some commenter named jfedor left the following comment:

A little known fact is that you can do the following on any standard Python installation:

from turtle import *

def f(length, depth):
   if depth == 0:
     forward(length)
   else:
     f(length/3, depth-1)
     right(60)
     f(length/3, depth-1)
     left(120)
     f(length/3, depth-1)
     right(60)
     f(length/3, depth-1)

f(500, 4)

If you copy paste, it's a fractal in less than a minute. If you type it yourself, it's still less than 10. And it's something you can show a kid. I really liked this one.

Small Python Challenge No. 4 – Counting Sets

Friday, August 28th, 2009

This is a problem that I encountered a short while ago. It seems like it could be easily solved very efficiently, but it's not as easy as it looks.
Let's say that we are given N (finite) sets of integers - S. For now we won't assume anything about them. We are also given another set, a. The challenge is to write an efficient algorithm that will count how many sets from S contain a (or how many sets from S a is a subset of).

Let's call a single test a comparison. The naive algorithm is of course checking each of the sets, which means exactly N comparisons. The challenge - can you do better? When will your solution outperform the naive solution?

I will give my solution in a few days. Submit your solutions in the comments, preferably in Python. You can write readable code using [ python ] [ /python ] blocks, just without the spaces.

Computing Large Determinants in Python

Wednesday, March 25th, 2009

Story:
For my seminar work, I had to calculate the determinant of a large integer matrix.
In this case, large meant n>=500. You might say that this isn't very large and I would agree. However, it is large enough to overflow a float and reach +inf. Reaching +inf is bad: once you do, you lose all the accuracy of the computation. Also, since determinant calculations use addition as well as subtraction, if +inf is reached in the middle of the calculation the value might never drop back, even if the determinant is a "small" number.

(Note: Even if the calculation hadn't reached +inf, but just some very large floating point number, the same phenomenon could have occurred.)

As I said, my matrix was composed of integers. Since a determinant is a sum of multiples of matrix elements, you would expect such a determinant to be an integer as well. It would be nice to take advantage of that fact.

How should one compute such a determinant? Well, luckily Python has support for arbitrarily large integers. Let's compute the determinant with them!
Unfortunately, Python's numpy computes determinants using LU decomposition, which uses division - hence, floating point values.
Well, never mind, I know how to calculate a determinant, right?
Ten minutes later the naive determinant calculation by minors was implemented, with one "minor" setback - it takes O(n!) time.
Googling for integer determinants yielded some articles about division free algorithms, which weren't really easy to implement. There was one suggestion about using Rational numbers and Gauss Elimination, with pivots chosen to tame the growth of the nominator and denominator.

So, I hitched up a rational number class, using Python's arbitrarily large integers as nominator and denominator. Then, I got some code to do Gauss Elimination, although my pivoting wasn't the most fancy. This seemed to work, and I got my desired large determinants.

Epilogue:
Not too long ago, I ran across mpmath, a Python library for arbitrarily large floating point numbers. It is possible I could have used it instead of my own rational numbers. Next time I run into a relevant problem I'll keep this library in mind.
Also, an even shorter while ago, I became aware that since 2.6 Python boasts a fractions module. This will sure come in handy in the future.

Code:
Available here. If you intend to use it in Python 2.6 and above, I suggest replacing my Rational class with Python's Fraction.
(Note that I adapted this module to my needs. Therefore it has a timer that prints time estimations for computations, and it uses Psyco.)