Small Python Utility Functions

While working with Gil Dabach on Distorm3, I found out that I’ve been missing a lot of utility functions. I’m going to write about some of them now.

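def classify_to_dict(seq, key_func):
    result = {}
    for item in seq:
        key = key_func(item)
        if key in result:
            # Key already seen: add the item to its existing list
            result[key].append(item)
        else:
            # First item with this key: start a new list
            result[key] = [item]
    return result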

This is really a simple function, but a very powerful one. Here is a simple demonstration:

>>> base_tools.classify_to_dict(range(5), lambda k: k%2)
{0: [0, 2, 4], 1: [1, 3]}

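Note the similarity to, and the difference from, itertools.groupby: groupby only groups consecutive items that share a key, while classify_to_dict collects every item with the same key no matter where it appears in the sequence.
If any of you can suggest a better name than classify_to_dict (maybe just classify?), I’ll be happy to hear it.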
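
To make the comparison with groupby concrete, here is a minimal sketch, assuming Python 3. The parity helper is a hypothetical name used only for this illustration, and classify_to_dict is written compactly with setdefault so the block runs on its own.

```python
from itertools import groupby


def classify_to_dict(seq, key_func):
    # Compact variant of the function above, repeated so this block is self-contained
    result = {}
    for item in seq:
        result.setdefault(key_func(item), []).append(item)
    return result


def parity(k):
    # Hypothetical key function for this example: 0 for even, 1 for odd
    return k % 2


# groupby starts a new group every time the key changes between neighbours
runs = [(key, list(group)) for key, group in groupby(range(5), parity)]
# runs == [(0, [0]), (1, [1]), (0, [2]), (1, [3]), (0, [4])]

# Sorting by the key first makes groupby produce one group per key
by_sorted_groupby = {
    key: list(group)
    for key, group in groupby(sorted(range(5), key=parity), parity)
}

# classify_to_dict needs no sorting and keeps the original item order
classified = classify_to_dict(range(5), parity)
# classified == {0: [0, 2, 4], 1: [1, 3]}
```

Sorting costs O(n log n) and requires the keys to be comparable, while classify_to_dict only needs them to be hashable.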

Some other very useful functions include arg_min and arg_max, which return the index of the minimum and maximum element in a sequence, respectively. I also like to use union and intersection, which behave just like sum, but instead of using the + operator, use the | and & operators respectively. This is most useful for sets, and on some rare occasions, for numbers.
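
Here is one way these four helpers might look, as a sketch assuming Python 3 and sequences that support indexing. The post does not show the actual implementations, so the bodies below are my assumption. Unlike sum, this version of union and intersection has no start value, so an empty sequence raises a TypeError.

```python
from functools import reduce
import operator


def arg_max(seq):
    """Return the index of the largest element (the first one on ties)."""
    return max(range(len(seq)), key=seq.__getitem__)


def arg_min(seq):
    """Return the index of the smallest element (the first one on ties)."""
    return min(range(len(seq)), key=seq.__getitem__)


def union(seq):
    """Fold the items of seq together with the | operator."""
    return reduce(operator.or_, seq)


def intersection(seq):
    """Fold the items of seq together with the & operator."""
    return reduce(operator.and_, seq)


values = [3, 9, 2]
largest_index = arg_max(values)    # 1, because values[1] == 9
smallest_index = arg_min(values)   # 2, because values[2] == 2

all_items = union([{1, 2}, {2, 3}])            # {1, 2, 3}
common_items = intersection([{1, 2}, {2, 3}])  # {2}

# On integers, | and & are bitwise, so union can combine flags
flags = union([1, 2, 4])                       # 7
```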

Another function I like (but rarely use) is unzip. I know, I know, paddy mentioned it is pretty obvious, and I know that it could just as well be called transpose. However, I still find unzip(seq) a better choice than the much less obvious zip(*seq). Readability counts.
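
A minimal sketch of unzip, assuming Python 3, where zip returns an iterator and so needs wrapping in list. The pairs data is made up for the illustration.

```python
def unzip(seq):
    """Turn a sequence of tuples into a list of per-position tuples."""
    return list(zip(*seq))


pairs = [(1, 'a'), (2, 'b'), (3, 'c')]
numbers, letters = unzip(pairs)
# numbers == (1, 2, 3) and letters == ('a', 'b', 'c')
```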

What are your favorite utility functions?

A minor update: I’ve incorporated the use of a syntax highlighter now, and it should be enabled for comments as well by the next challenge (which will be soon enough).


3 Responses to Small Python Utility Functions

  1. Erez says:

    I would choose index_of_max over arg_max. I see no connection to arguments. Same with min.

    I think you could pick a better example to show the strength of classify_to_dict.
    >>> classify_to_dict( ["a","bcd","def","qwerty"], len )
    {1: ['a'], 3: ['bcd', 'def'], 6: ['qwerty']}

    >>> classify_to_dict( [1, 'hello', (2,3)], type )
    {: [1], : ['hello'], : [(2, 3)]}

    Also, I think implementation would be better if you caught KeyError instead of checking __contains__.

  2. Erez says:

    Heh, ruined by the use of HTML symbols. Just run the last example and see :-)

  3. lorg says:

    You are right that for most people index_of_max is better and more meaningful than arg_max. However, arg_max is the name of the mathematical function that does the same thing. Readability counts – you win :)