Archive for the ‘Challenges’ Category

The mathematics behind the solution for Challenge No. 5

Friday, February 26th, 2010

If you take a look at the various solutions people proposed for the last challenge of generating a specific permutation, you’ll see that they are very similar. Most of them are based on some form of div-mod usage. The reason this is so, is because all of these solutions are using the Factorial Base.

What does that mean?
Note that we usually encounter div-mods when we want to find the representation of a number in a certain base. That should already pique your interest. Now consider that a base’s digits need not have the same weight. For example, consider how we count the number of seconds since the start of the week:

seconds of the last minute, A (at most 60-1)
minutes of the last hour, B (at most 60-1)
hours of the last day, C (at most (24-1)
days of the last week, D (at most 7-1)

So given A, B, C, D, we would say that the number of seconds is:
A + 60*B + 24*C + 7*D. This certainly looks like a base transformation. To go back, we would use divmod.

The factorial base is just the same, with the numbers n, n-1, … 1. Note that in the factorial base, you can only represent a finite number of numbers – n!. This should not be surprising – this is what we set out to do in the first place!
The thing that I found really amazing about this is that all the people to whom I posed this challenge came up with almost the same “way” of solving it.

Other interesting curiosities regarding bases can be found in Knuth’s book, “The Art of Computer Programming”, volume 2, Section 4.1.

Small Programming Challenge no. 5 – Generating a Permutation

Wednesday, November 11th, 2009

I thought of this one quite a long time ago, and I believe that the idea behind it is pretty nice mathematically. I got the idea for it from Knuth’s “The Art of Computer Programming”.

The challenge is simple:
write a function that receives as arguments two numbers, n, and num such that 0 <= num < n!. This function needs to return an array (list) representing a permutation of the numbers 0..n-1. For each possible num, the function needs to return a different permutation, such that over all values of num, all possible permutations are generated. The order of permutations is up to you.

The function you write should do this in at most O(n) time & space (Various O(nlogn) are also acceptable).
Write your solutions in the comments, in [ LANG ] [/ LANG ] blocks (without the spaces) where LANG is preferably Python :). I will post my solution in a few days. As usual, the most efficient & elegant solution wins.

Go!

My solution to the counting sets challenge

Monday, September 7th, 2009

A few days ago, I wrote up a challenge - to count the number of sets a given set is contained in.

In the comments, I touched briefly on the original problem from which the challenge was created, and I'll describe it in more depth here.
In the problem, I am given an initial group of sets, and then an endless 'stream of sets'. For each of the sets in the stream, I have to measure its uniqueness. relative to the initial group of sets. A set that is contained in only one set from the initial group is very unique, one that is contained in ten - not so much.

So how to solve this problem? My original solution is somewhat akin to the classic "lion-in-the-desert" problem, but more like the "blood test" story. I didn't find a link to the story, so I'll give it as I remember it.

In an army somewhere, it was discovered that at least one of the soldiers was sick and so had to be put in isolation until he heals. It is only possible to check for the disease via a blood test, but tests are expensive, and they didn't want to test all of the soldiers. What did they do?

They took enough blood from each soldier. Now, from each sample they took a little bit, and divided the samples into two groups. They mixed together the samples of each group, and tested the mixed sample. If the sample was positive - they repeated the process for the blood samples of all the soldiers in the matching group.

Now my solution is clear: let's build a tree of set unions. At bottom level will be the union of couples of sets. At the next level, unions of couples of couples of sets. So on, until we end up with just two sets, or even just one - if we are not sure the set is contained in any of the initial sets.

Testing is just like in the story. We'll start at the two biggest unions, and work our way down. There is an optimization though - if a set appears more than say, 10 times, it's not very unique, and its score is zeroed. In that case, we don't have to go down all the way, but stop as soon as we pass the 10 "positive result" mark.

Here's the code:

class SetGroup(object):
    def __init__(self, set_list):
        cur_level = list(set_list)
        self.levels = []
        while len(cur_level)> 1:
            self.levels.append(cur_level)
            cur_level = [union(couple) for couple in blocks(cur_level, 2)]
        self.levels.reverse()

    def count(self, some_set, max_appear = None):
        indexes = [0]
        for level in self.levels:
            indexes = itertools.chain((2*x for x in indexes), (2*x+1 for x in indexes))
            indexes = (x for x in indexes if x <len(level))
            indexes = [x for x in indexes if some_set <= level[x]]
            if max_appear is not None and len(indexes)>= max_appear:
                return max_appear
        return len(indexes)

Here's a link to the full code.

I didn't implement this solution right away. At first, I used the naive approach, of checking against each set. Then, when it proved to be too slow, I tried implementing the solution outlined by Shenberg and Eric in the comments to the challenge. Unfortunately, their solution proved to be very slow as well. I believe it's because some elements appear in almost all of the sets, and so computing the intersection for these elements takes a long time.
Although originally I thought that my solution would suffer from some serious drawbacks (can you see what they are?), the max_appear limit removed most of the issues.

Implementing this solution was a major part of taking down the running time of the complete algorithm for the full problem I was solving from about 2 days, to about 15-20 minutes. That was one fun optimizing session :)

Small Python Challenge No. 4 – Counting Sets

Friday, August 28th, 2009

This is a problem that I encountered a short while ago. It seems like it could be easily solved very efficiently, but it's not as easy as it looks.
Let's say that we are given N (finite) sets of integers - S. For now we won't assume anything about them. We are also given another set, a. The challenge is to write an efficient algorithm that will count how many sets from S contain a (or how many sets from S a is a subset of).

Let's call a single test a comparison. The naive algorithm is of course checking each of the sets, which means exactly N comparisons. The challenge - can you do better? When will your solution outperform the naive solution?

I will give my solution in a few days. Submit your solutions in the comments, preferably in Python. You can write readable code using [ python ] [ /python ] blocks, just without the spaces.

Some Assembly Required No. 1

Saturday, April 12th, 2008

I've been working on some of the instruction tests in vial, and I wanted to test the implementation of LOOP variants. My objective was to make sure the vial version is identical to the real CPU version (as discussed here). To achieve this, I had to cover all of the essential behaviors of LOOP.

Well, using the framework Gil and I wrote, I hacked up some code that should cover the relevant cases:

code_template = """
mov edx, ecx ; control the start zf
mov ecx, eax ; number of iterations
mov eax, 0 ; will hold the result, also an iteration counter
loop_start:

    cmp eax, ebx    ; check if we need to change zf
    setz dh
    xor dh, dl      ; if required, invert zf
    inc eax         ; count the iteration
    cmp dh, 0       ; set zf

    loop%s loop_start
"""
for loop_kind in ['','z','nz']:
    code_text = code_template % loop_kind
    c = FuncObject(code_text)
    for start_zf_value in [0,1]:
        for num_iters in [1,4,10]:
            for when_zf_changes in [1,2,15]:
                c(num_iters, when_zf_changes, start_zf_value)
                c.check()

Note that c(...) executes the code both on vial's VM, and on the real cpu. c.check() compares their return value (EAX) and flags after the execution. I also wanted to avoid other kinds of jumps in this test.

To check that the code ran the same number of times, I returned EAX as the number of iterations.
All the games with edx are there to make sure that I'm testing different zf conditions.

The challenge for today:
Can you write a shorter assembly snippet that tests the same thing?

Small Python Challenge No. 3 – Random Selection

Sunday, February 10th, 2008

This time I'll give two related problems, both not too hard.

Lets warm up with the first:

You have a mapping between items and probabilities. You need to choose each item with its probability.

For example, consider the items ['good', 'bad', 'ugly'], with probabilities of [0.5, 0.3, 0.2] accordingly. Your solution should choose good with probability 50%, bad with 30% and ugly with 20%.

I came to this challenge because just today I had to solve it, and it seems like a common problem. Hence, it makes sense to ask 'what is the best way?'.

The second problem is slightly harder:

Assume a bell shaped function p(x) that you can 'solve'. This means that given a value y, you can get all x such that p(x)=y. For example, sin(x)^2 in [0,pi] is such a function. Given a function such as Python's random.random() that yields a uniform distribution of values in [0,1), write a function that yields a distribution proportional to p(x) in the appropriate interval.

For example, consider the function p(x) = e^(-x^2) in [-1,1]. Since p(0) = 1, and p(0.5)~0.779, the value 0 should be p(0)/p(0.5)~1.28 times more common than 0.5.

As usual, the preferred solutions are the elegant ones. Go!

note: please post your solutions in the comments, using [ python]...[ /python] tags (but without the spaces in the tags).

LRU cache solution: a case for linked lists in Python

Sunday, January 13th, 2008

The reason I put up the LRU cache challenge up, was that I couldn't think of a good solution to the problem without using linked lists. This has been pointed to by Adam and Erez as well. Adam commented on this, and Erez' solution to the problem was algorithmically identical to mine.

So how to solve the challenge? Here are the two possible solutions I thought about:

  • Use a dict for lookup, and each element's age is indicated by its position in a linked list. This is the solution Erez and I implemented.
  • Keep a 'last time of use' indicator for each element. This could be just a regular int, incremented by 1 for each lookup. Keep the elements in a min heap, and when there are too many elements, pop them using the minimum heap.

Generally, I consider the first solution more elegant. It doesn't rely on an integer to work, so it could work 'indefinitely'. Of course, the second solution can be also made to work indefinitely, with some upkeep from time to time. (The added time cost of the upkeep may be amortized over other actions.)

If you can think of some other, more elegant solution, I'll be happy to hear about it.

So, given that a linked list solution is more elegant, we come to the crux of the problem: what to do in Python? The Python standard library does not contain a linked list implementation as far as I know. As a result, Python programmers are encouraged to use the list type, which is an array. This is just as well: for most intents and purposes, the list type is good enough.

I tried to think a little about other cases where a linked list was more appropriate, and I didn't come up with any more such cases. If you come up with any such case, I'll be happy to hear about it.

After looking for a public implementation, and not finding one that seemed good enough, I decided to go ahead and write my own.

Out of curiousity, I also did a small comparison of runtime speeds between my implementation of a linked list, and the list data type. I tried a test where a linked list has an obvious advantage (complexity wise)- removing elements from the middle of the list. The Python list won up to somewhere in the thousands of elements. (Of course, list is implemented in C, and mine is in pure Python).

What is my conclusion from all of this? The same as the conventional wisdom: use the list data type almost always. If you find yourself in need of a linked list, think long and hard (well, not too long) about your problem and solution. There's a good chance that either you can use the built-in list with an equivalent solution, or that a regular list will still be faster, for most cases. Of course, if you see no other way - do what you think is best.

Your thoughts?