While working with Gil Dabach on Distorm3, I found out that I’ve been missing a lot of utility functions. I’m going to write about some of them now.
def classify_to_dict(seq, key_func):
    result = {}
    for item in seq:
        key = key_func(item)
        if key in result:
            result[key].append(item)
        else:
            result[key] = [item]
    return result
This is really a simple function, but a very powerful one. Here is a simple demonstration:
>>> base_tools.classify_to_dict(range(5), lambda k: k%2)
{0: [0, 2, 4], 1: [1, 3]}
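Note the similarity to, and the difference from, itertools.groupby: groupby also groups items by a key function, but it only merges consecutive items, so the input usually has to be sorted by that key first.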
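To make the comparison with groupby concrete, here is a small sketch. It repeats classify_to_dict so it runs on its own; is_odd is just a helper name made up for this illustration.

```python
from itertools import groupby


def classify_to_dict(seq, key_func):
    # Same function as in the post, repeated so this block is self-contained.
    result = {}
    for item in seq:
        key = key_func(item)
        if key in result:
            result[key].append(item)
        else:
            result[key] = [item]
    return result


def is_odd(k):
    # Hypothetical helper for this example only.
    return k % 2


# groupby starts a new group every time the key changes,
# so unsorted input produces one group per run of equal keys.
runs = [(key, list(group)) for key, group in groupby(range(5), is_odd)]

# Sorting by the key first gives one group per key,
# which matches what classify_to_dict returns directly.
by_key = sorted(range(5), key=is_odd)
grouped = {key: list(group) for key, group in groupby(by_key, is_odd)}

print(runs)
print(grouped)
print(classify_to_dict(range(5), is_odd))
```

So classify_to_dict needs no sorting and keeps the items of each group in their original order, at the cost of building the whole dictionary in memory.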
If any of you can suggest a better name than classify_to_dict (maybe just classify?), I’ll be happy to hear it.
Some other very useful functions include arg_min and arg_max, which respectively return the index of the minimum and maximum element in a sequence. I also like to use union and intersection, which behave just like sum, but instead of using the + operator, use the | and & operators respectively. This is most useful for sets, and on some rare occasions, for numbers.
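The post does not show these four functions, so the following is only one possible sketch of them. The signatures are my assumption, and unlike sum, union and intersection here have no start value, so they raise TypeError on an empty sequence.

```python
from functools import reduce
import operator


def arg_min(seq):
    # Index of the smallest element; the first index wins on ties.
    return min(range(len(seq)), key=seq.__getitem__)


def arg_max(seq):
    # Index of the largest element; the first index wins on ties.
    return max(range(len(seq)), key=seq.__getitem__)


def union(seq):
    # Fold the items together with | (assumed design: no start value).
    return reduce(operator.or_, seq)


def intersection(seq):
    # Fold the items together with & (assumed design: no start value).
    return reduce(operator.and_, seq)


print(arg_min([3, 1, 2]), arg_max([3, 1, 2]))
print(union([{1, 2}, {2, 3}]), intersection([{1, 2}, {2, 3}]))
print(union([1, 4]), intersection([6, 3]))  # bitwise on numbers
```

On numbers, | and & are the bitwise operators, which is where the rare numeric use comes in.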
Another function I like (but rarely use) is unzip. I know, I know, paddy mentioned it is pretty obvious, and I know that it could just as well be called transpose. Still, I find unzip(seq) a better choice than the much less obvious zip(*seq). Readability counts.
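As a minimal sketch, unzip can be a thin wrapper around zip(*seq). Returning a list is my assumption; it keeps the result reusable on Python 3, where zip returns an iterator.

```python
def unzip(seq):
    # Transpose a sequence of equal-length tuples: a readable name for zip(*seq).
    return list(zip(*seq))


pairs = [(1, 'a'), (2, 'b'), (3, 'c')]
numbers, letters = unzip(pairs)
print(numbers)
print(letters)
```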
What are your favorite utility functions?
A minor update: I’ve incorporated the use of a syntax highlighter now, and it should be enabled for comments as well by the next challenge (which will be soon enough).
I would choose index_of_max over arg_max. I see no connection to arguments. Same with min.
I think you could pick a better example to show the strength of classify_to_dict.
Like:
>>> classify_to_dict( ["a","bcd","def","qwerty"], len )
{1: ['a'], 3: ['bcd', 'def'], 6: ['qwerty']}
Or:
>>> classify_to_dict( [1, 'hello', (2,3)], type )
{: [1], : ['hello'], : [(2, 3)]}
Also, I think the implementation would be better if you caught KeyError instead of checking __contains__.
Heh, ruined by the use of HTML symbols. Just run the last example and see :-)
You are right that for most people index_of_max is better and more meaningful than arg_max. arg_max, however, is the name of the mathematical function that does the same thing. Readability counts – you win :)