Challenges Programming Python Statistics

Small programming challenge No. 6 – nblocks

I came up with this challenge when I had to write a function to divide a sequence to percentiles. I needed this to calculate some statistics over trips for “This sounds trivial” I thought, and reached for my simple blocks function:

def blocks(seq, block_len):
    """blocks(range(5),2) -> [[0, 1], [2, 3], [4]]"""
    seq_len = len(seq)
    if seq_len%block_len == 0:
        num_blocks = seq_len/block_len
        num_blocks = 1 + (seq_len/block_len)
    result = [[] for i in xrange(num_blocks)]
    for idx, obj in enumerate(seq):
    return result

So I set block_len to len(seq)/10 and called blocks(seq, block_len). Unfortunately, according to the docs of blocks (which I wrote…), when there is extra data, a “partial” block is added – which is exactly what we don’t want when calculating percentiles.
Instead, the behavior we want is nblocks(seq, number_of_blocks), for example: nblocks(range(10), 3) -> [0, 1, 2], [3, 4, 5], [6, 7, 8, 9].

This is a surprisingly hard to write function, or rather, harder than you’d expect. I’ll be especially glad if someone writes it elegantly. My solution works well enough, but isn’t the prettiest.

So, you have the definition – let’s see if you can do better than me. Once enough solutions are presented, I will present my own.

IMPORTANT EDIT: the required signature is nblocks(seq, num_blocks). So for seq(range(10), 3), the return value should be 3 blocks, with the last one having an extra item. As a general rule, the extra items should be spread as evenly as possible.

Challenges computer science Programming

Small Programming Challenge no. 5 – Generating a Permutation

I thought of this one quite a long time ago, and I believe that the idea behind it is pretty nice mathematically. I got the idea for it from Knuth’s “The Art of Computer Programming”.

The challenge is simple:
write a function that receives as arguments two numbers, n, and num such that 0 <= num < n!. This function needs to return an array (list) representing a permutation of the numbers 0..n-1. For each possible num, the function needs to return a different permutation, such that over all values of num, all possible permutations are generated. The order of permutations is up to you. The function you write should do this in at most O(n) time & space (Various O(nlogn) are also acceptable). Write your solutions in the comments, in [ LANG ] [/ LANG ] blocks (without the spaces) where LANG is preferably Python :). I will post my solution in a few days. As usual, the most efficient & elegant solution wins. Go!

Challenges computer science Python

My solution to the counting sets challenge

A few days ago, I wrote up a challenge – to count the number of sets a given set is contained in.

In the comments, I touched briefly on the original problem from which the challenge was created, and I’ll describe it in more depth here.
In the problem, I am given an initial group of sets, and then an endless ‘stream of sets’. For each of the sets in the stream, I have to measure its uniqueness. relative to the initial group of sets. A set that is contained in only one set from the initial group is very unique, one that is contained in ten – not so much.

So how to solve this problem? My original solution is somewhat akin to the classic “lion-in-the-desert” problem, but more like the “blood test” story. I didn’t find a link to the story, so I’ll give it as I remember it.

In an army somewhere, it was discovered that at least one of the soldiers was sick and so had to be put in isolation until he heals. It is only possible to check for the disease via a blood test, but tests are expensive, and they didn’t want to test all of the soldiers. What did they do?

They took enough blood from each soldier. Now, from each sample they took a little bit, and divided the samples into two groups. They mixed together the samples of each group, and tested the mixed sample. If the sample was positive – they repeated the process for the blood samples of all the soldiers in the matching group.

Now my solution is clear: let’s build a tree of set unions. At bottom level will be the union of couples of sets. At the next level, unions of couples of couples of sets. So on, until we end up with just two sets, or even just one – if we are not sure the set is contained in any of the initial sets.

Testing is just like in the story. We’ll start at the two biggest unions, and work our way down. There is an optimization though – if a set appears more than say, 10 times, it’s not very unique, and its score is zeroed. In that case, we don’t have to go down all the way, but stop as soon as we pass the 10 “positive result” mark.

Here’s the code:

class SetGroup(object):
    def __init__(self, set_list):
        cur_level = list(set_list)
        self.levels = []
        while len(cur_level) > 1:
            cur_level = [union(couple) for couple in blocks(cur_level, 2)]
    def count(self, some_set, max_appear = None):
        indexes = [0]
        for level in self.levels:
            indexes = itertools.chain((2*x for x in indexes), (2*x+1 for x in indexes))
            indexes = (x for x in indexes if x < len(level))
            indexes = [x for x in indexes if some_set <= level[x]]
            if max_appear is not None and len(indexes) >= max_appear:
                return max_appear
        return len(indexes)

Here’s a link to the full code.

I didn’t implement this solution right away. At first, I used the naive approach, of checking against each set. Then, when it proved to be too slow, I tried implementing the solution outlined by Shenberg and Eric in the comments to the challenge. Unfortunately, their solution proved to be very slow as well. I believe it’s because some elements appear in almost all of the sets, and so computing the intersection for these elements takes a long time.
Although originally I thought that my solution would suffer from some serious drawbacks (can you see what they are?), the max_appear limit removed most of the issues.

Implementing this solution was a major part of taking down the running time of the complete algorithm for the full problem I was solving from about 2 days, to about 15-20 minutes. That was one fun optimizing session :)

Challenges computer science

Small Python Challenge No. 4 – Counting Sets

This is a problem that I encountered a short while ago. It seems like it could be easily solved very efficiently, but it’s not as easy as it looks.
Let’s say that we are given N (finite) sets of integers – S. For now we won’t assume anything about them. We are also given another set, a. The challenge is to write an efficient algorithm that will count how many sets from S contain a (or how many sets from S a is a subset of).

Let’s call a single test a comparison. The naive algorithm is of course checking each of the sets, which means exactly N comparisons. The challenge – can you do better? When will your solution outperform the naive solution?

I will give my solution in a few days. Submit your solutions in the comments, preferably in Python. You can write readable code using [ python ] [ /python ] blocks, just without the spaces.

Assembly Challenges Programming Testing

Some Assembly Required No. 1

I’ve been working on some of the instruction tests in vial, and I wanted to test the implementation of LOOP variants. My objective was to make sure the vial version is identical to the real CPU version (as discussed here). To achieve this, I had to cover all of the essential behaviors of LOOP.

Well, using the framework Gil and I wrote, I hacked up some code that should cover the relevant cases:

code_template = """
mov edx, ecx ; control the start zf
mov ecx, eax ; number of iterations
mov eax, 0 ; will hold the result, also an iteration counter
    cmp eax, ebx    ; check if we need to change zf
    setz dh
    xor dh, dl      ; if required, invert zf
    inc eax         ; count the iteration
    cmp dh, 0       ; set zf
    loop%s loop_start
for loop_kind in ['','z','nz']:
    code_text = code_template % loop_kind
    c = FuncObject(code_text)
    for start_zf_value in [0,1]:
        for num_iters in [1,4,10]:
            for when_zf_changes in [1,2,15]:
                c(num_iters, when_zf_changes, start_zf_value)

Note that c(…) executes the code both on vial’s VM, and on the real cpu. c.check() compares their return value (EAX) and flags after the execution. I also wanted to avoid other kinds of jumps in this test.

To check that the code ran the same number of times, I returned EAX as the number of iterations.
All the games with edx are there to make sure that I’m testing different zf conditions.

The challenge for today:
Can you write a shorter assembly snippet that tests the same thing?

computer science Math Programming Python Statistics Utility Functions

Solution for the Random Selection Challenge

A few days ago, I wrote up two small Python Challenges. Several people have presented solutions for the first challenge, and I also posted my solution in the comments there.

However, the second challenge remained unsolved, and I will present a solution for it in this post.

Challenges Math Programming Python

Small Python Challenge No. 3 – Random Selection

This time I’ll give two related problems, both not too hard.

Lets warm up with the first:

You have a mapping between items and probabilities. You need to choose each item with its probability.

For example, consider the items [‘good’, ‘bad’, ‘ugly’], with probabilities of [0.5, 0.3, 0.2] accordingly. Your solution should choose good with probability 50%, bad with 30% and ugly with 20%.

I came to this challenge because just today I had to solve it, and it seems like a common problem. Hence, it makes sense to ask ‘what is the best way?’.

The second problem is slightly harder:

Assume a bell shaped function p(x) that you can ‘solve’. This means that given a value y, you can get all x such that p(x)=y. For example, sin(x)^2 in [0,pi] is such a function. Given a function such as Python’s random.random() that yields a uniform distribution of values in [0,1), write a function that yields a distribution proportional to p(x) in the appropriate interval.

For example, consider the function p(x) = e^(-x^2) in [-1,1]. Since p(0) = 1, and p(0.5)~0.779, the value 0 should be p(0)/p(0.5)~1.28 times more common than 0.5.

As usual, the preferred solutions are the elegant ones. Go!

note: please post your solutions in the comments, using [ python]…[ /python] tags (but without the spaces in the tags).

Algorithms Challenges computer science Programming Python Utility Functions

Small Python Challenge No. 2 – LRU Cache

Caching is easy. Consider the cache I used to optimize the recursive spring:

class _NotInDict(object):
_NotInDict = _NotInDict()
def cached(func):
    cache = {}
    def wrapper_func(*args):
        prev_result = cache.get(args, _NotInDict)
        if prev_result is _NotInDict:
            result = func(*args)
            cache[args] = result
            return result
        return prev_result
    return wrapper_func

This kind of cache is simple and effective (especially for recursions), and may be used in all sorts of situations. However, sometimes you want a size limited cache. In that case you have to decide on the criterion used to decide which items to throw away. There are many kinds of criteria used, for further reading check out wikipedia.

For now I’d like to discuss the LRU cache though. LRU stands for Least Recently Used, which means that you throw away the items you didn’t use for a long time. Time in this case is measured by actions. I thought of this type of cache when I worked on the recursive spring. Since each step in the ‘recursivation’ used two samples of the previous step, caching was an obvious choice, and if I had to size limit my cache, LRU whould be the type of cache to use, as you could be certain that the older samples would not have to be used until the next drawing.

The challenge for the weekend is to write an LRU cache in Python. The cache has to be general – support hash-able keys and any cache size required. It has to be efficient – in the size of the cache and the time it takes for a lookup and an update. Once the standard requirements have been met, the big competition should be on elegance. I wrote a benchmark implementation, which is efficient, but not fast. Once I see some solutions, I’ll talk a bit about mine, which is an interesting case-study.

Challenges computer science Programming Philosophy Python

A classic programming challenge, in Python

It has become a tradition for computer scientists to create various self referential ‘strange loops’. Traditions such as writing a compiler in the language it compiles are actually quite useful – and also very interesting. This tradition also branched to another one (also mentioned in the linked article) of writing programs that output their own source (without disk access and other dirty tricks).

The challenge is obviously to write such a program in Python, in as few lines as possible. Here is my solution, which is at two lines. I urge you to try it for yourself before looking, it is a very educating challenge. I’ll be very much interested in seeing a one-liner for this problem, or a proof that such a one-liner does not exist.

If you are interested in the bigger challenge, of writing an interpreter for Python in Python itself, go check out PyPy first.

For those interested in other ‘strange loops’, find a copy of ‘Godel Escher Bach’. If you happen to live in Israel, and can come to Haifa, I might even lend you my copy (once I get it back :)

Challenges Programming Python

Small Python Challenge No. 1

I bet I already wrote that sometime earlier, but I found myself today writing the following function:

def blocks(seq, block_len):
    """blocks(range(5),2) -> [[0, 1], [2, 3], [4]]"""
    seq_len = len(seq)
    if seq_len%block_len == 0:
        num_blocks = seq_len/block_len
        num_blocks = 1 + (seq_len/block_len)
    result = [[] for i in xrange(num_blocks)]
    for idx, obj in enumerate(seq):
    return result

I am not satisfied with this implementation.
The challenge is to write it in a more elegant and short manner. Functions from Python’s standard modules are considered valid solutions.

Ready? GO!