
Small Python Utility Functions

While working with Gil Dabach on Distorm3, I found out that I’ve been missing a lot of utility functions. I’m going to write about some of them now.

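def classify_to_dict(seq, key_func):
    result = {}
    for item in seq:
        key = key_func(item)
        if key in result:
            # Key seen before: add the item to its existing list.
            result[key].append(item)
        else:
            # First item with this key: start a new list.
            result[key] = [item]
    return result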

This is really a simple function, but a very powerful one. Here is a simple demonstration:

>>> base_tools.classify_to_dict(range(5), lambda k: k%2)
{0: [0, 2, 4], 1: [1, 3]}

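Note the similarity to, and the difference from, itertools.groupby: groupby only groups consecutive items that share a key, so its input has to be sorted by that key first, while classify_to_dict accepts items in any order and returns a dict.
If any of you can suggest a better name than classify_to_dict (maybe just classify?), I’ll be happy to hear it.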
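
To make the groupby comparison concrete, here is a small sketch. classify_to_dict is restated so the snippet runs on its own, and is_odd and data are names I made up for the illustration.

```python
from itertools import groupby

def classify_to_dict(seq, key_func):
    result = {}
    for item in seq:
        key = key_func(item)
        if key in result:
            result[key].append(item)
        else:
            result[key] = [item]
    return result

def is_odd(n):
    # Hypothetical key function, used only for this example.
    return n % 2

data = [0, 1, 2, 3, 4]

# groupby starts a new group whenever the key changes, so unsorted
# input produces one group per run of equal keys.
runs = [(key, list(group)) for key, group in groupby(data, is_odd)]
print(runs)        # [(0, [0]), (1, [1]), (0, [2]), (1, [3]), (0, [4])]

# Sorting by the key first gathers each key into a single group.
by_key = sorted(data, key=is_odd)
grouped = [(key, list(group)) for key, group in groupby(by_key, is_odd)]
print(grouped)     # [(0, [0, 2, 4]), (1, [1, 3])]

# classify_to_dict needs no sorting and hands back a dict directly.
classified = classify_to_dict(data, is_odd)
print(classified)  # {0: [0, 2, 4], 1: [1, 3]}
```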

Some other very useful functions include arg_min and arg_max, which return the index of the minimum and maximum element in a sequence, respectively. I also like to use union and intersection, which behave just like sum, but instead of using the + operator, use the | and & operators respectively. This is most useful for sets, and on some rare occasions, for numbers.
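
I haven't shown these four, so here is one possible way to write them. Treat it as a rough sketch rather than the exact code from my toolbox: it assumes the input is an indexable, non-empty sequence, and unlike sum it has no start value, so an empty input raises an error.

```python
from functools import reduce
import operator

def arg_max(seq):
    """Return the index of the largest element (the first one on ties)."""
    return max(range(len(seq)), key=seq.__getitem__)

def arg_min(seq):
    """Return the index of the smallest element (the first one on ties)."""
    return min(range(len(seq)), key=seq.__getitem__)

def union(items):
    """Like sum, but combine the items with | instead of +."""
    # No initial value is passed, so an empty input raises TypeError.
    return reduce(operator.or_, items)

def intersection(items):
    """Like sum, but combine the items with & instead of +."""
    return reduce(operator.and_, items)

print(arg_max([3, 9, 2]))                # 1
print(arg_min([3, 9, 2]))                # 2
print(union([{1, 2}, {2, 3}]))           # {1, 2, 3}
print(intersection([{1, 2}, {2, 3}]))    # {2}
print(union([1, 2, 4]))                  # 7, bitwise OR on numbers
```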

Another function I like (but rarely use) is unzip. I know, I know, paddy mentioned it is pretty obvious, and I know that it could just as well be called transpose. Still, I find unzip(seq) a better choice than the much less obvious zip(*seq). Readability counts.
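
For completeness, a minimal sketch of unzip as a thin wrapper around zip(*seq). Wrapping the result in list() is my assumption, so that it behaves the same whether zip returns a list or an iterator; pairs is just sample data.

```python
def unzip(seq):
    """Transpose a sequence of tuples; a named wrapper around zip(*seq)."""
    return list(zip(*seq))

pairs = [(1, 'a'), (2, 'b'), (3, 'c')]
numbers, letters = unzip(pairs)
print(numbers)   # (1, 2, 3)
print(letters)   # ('a', 'b', 'c')
```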

What are your favorite utility functions?

A minor update: I’ve incorporated the use of a syntax highlighter now, and it should be enabled for comments as well by the next challenge (which will be soon enough).

3 replies on “Small Python Utility Functions”

I would choose index_of_max over arg_max. I see no connection to arguments. Same with min.

I think you could pick a better example to show the strength of classify_to_dict.
>>> classify_to_dict(["a", "bcd", "def", "qwerty"], len)
{1: ['a'], 3: ['bcd', 'def'], 6: ['qwerty']}

>>> classify_to_dict([1, 'hello', (2, 3)], type)
{<type 'int'>: [1], <type 'str'>: ['hello'], <type 'tuple'>: [(2, 3)]}

Also, I think implementation would be better if you caught KeyError instead of checking __contains__.

You are right that for most people index_of_max is better and more meaningful than arg_max. arg_max, however, is the name of the mathematical function that does the same thing. Readability counts – you win :)
