Last night I encountered yet again one of Python’s annoyances.
The annoyance I’m referring to is the lack of string-like functions for lists. Trivial examples include find() and rfind(). Before you mention index(), though, it’s important to point out that index() only checks for equality. I’d be much happier if it could instead take a function argument for the comparison.
A less trivial example is split(), also with a possible criterion argument. A complicated example is regular expressions.
It seems that most of these functions, when applied to lists, should take at least a function argument. Maybe regular expressions for lists would be better with a key argument, though.
This reminds me a bit of C++’s generic algorithms for collections.
On a similar subject, it would have been nice if, along with heapq, the bisect functions accepted a key argument.
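Until bisect takes such an argument, one common workaround is to bisect over a precomputed list of keys. The following is only a sketch: bisect_left_by_key and the sample data are made-up names for illustration, and it assumes the list is already sorted by the key.

```python
from bisect import bisect_left


def bisect_left_by_key(sorted_items, target_key, key):
    """Return the leftmost insertion index for target_key.

    Hypothetical helper: assumes sorted_items is already sorted by key(item).
    """
    # Building the key list costs O(n); cache it if you search repeatedly.
    keys = [key(item) for item in sorted_items]
    return bisect_left(keys, target_key)


# Illustrative data: (name, age) pairs sorted by age.
people = [("ann", 25), ("bob", 31), ("cy", 31), ("dee", 40)]
position = bisect_left_by_key(people, 31, key=lambda person: person[1])
print(position)  # index of the first entry whose age is >= 31
```

Rebuilding the key list on every call throws away the logarithmic search time, so in practice you would keep the keys alongside the data.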
Some of these are pretty trivial to roll your own. A generic find() for a list, for example:
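# Python 2: ifilter and izip were removed from itertools in Python 3.
from itertools import count, ifilter, izip

def list_find(L, func):
    # Pair each element with its index, keep only the pairs whose element
    # satisfies func, and return the index of the first such pair.
    for i, d in ifilter(lambda x: func(x[1]), izip(count(), L)):
        return i
    return -1  # no element matched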
List is a more generic sequence than a String, which is why those methods don’t apply.
A String is a sequence of characters while a List is a sequence of objects. The concept of find() assumes a (sub)string comparison. That isn’t possible with a List, which may contain ints, floats, object instances, etc. However, if you know your list is all strings (or sequences)…
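def find(alist, pattern):
    # First element that contains pattern; raises IndexError if none does.
    return [x for x in alist if pattern in x][0]

def rfind(alist, pattern):
    # Same, but searching from the end; reversed() leaves alist unchanged.
    return [x for x in reversed(alist) if pattern in x][0]
You can replace the if clause with your custom function, as needed.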
As for split(), the list is already delimited by its elementization, so… I’m not sure what you’re trying to do.
As for regexes, are you matching across the entire list? If so, do a .join() and then apply the regex. If you are applying the regex to the elements of the list, then iterate over it and apply the regex to each element.
See http://bugs.python.org/issue4356 for a discussion about the bisect module.
General note:
For those wishing to write code in the comments, enclose your code within a [ python ] and [ / python ] block (just without the spaces :).
Bobby:
Of course, and indeed I wrote up a few of those and put them in my standard utilities import, which by now has grown quite a bit. By the way, instead of izip with count, why not just use enumerate?
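For reference, an enumerate-based version might look roughly like this. It is a sketch that keeps Bobby’s list_find name and its -1 convention for "not found"; the parameter names are illustrative.

```python
def list_find(items, predicate):
    """Return the index of the first element satisfying predicate, or -1."""
    # enumerate() yields (index, element) pairs, so no izip/count is needed.
    for index, item in enumerate(items):
        if predicate(item):
            return index
    return -1  # mirrors str.find(), which also returns -1 on failure


first_big = list_find([3, 8, 5, 10], lambda x: x > 4)
print(first_big)
```

Returning -1 matches str.find(), but it is also a valid index for the last element, so callers have to check for it explicitly.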
Dave:
I disagree, although you are partially correct. Indeed when I thought of find and rfind what I had in mind was looking for a single element’s position (a bit like C’s strchr()). Consider:
[python]
def rfind(some_list, criterion, start_pos=None):
    if start_pos is None:
        start_pos = len(some_list)-1
    for idx in range(start_pos, -1, -1):
        if criterion(some_list[idx]):
            return idx
    return None
[/python]
Now, I admit I wanted these functions mostly for some ad-hoc parsing. Still, it would be nice to split a list on token types, or other delimiting objects. For example, my_split([1,2,4,0,5,6,7,0,2,5,1], lambda x: x == 0) would return [[1,2,4],[5,6,7],[2,5,1]].
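A minimal sketch of such a my_split could look like the following. The behavior for leading, trailing and consecutive delimiters is an assumption, chosen to mirror str.split(sep), which produces empty pieces in those cases.

```python
def my_split(items, is_delimiter):
    """Split items into sublists at every element where is_delimiter is true.

    Assumption: like str.split(sep), delimiters are dropped, and adjacent,
    leading or trailing delimiters produce empty sublists.
    """
    chunks = [[]]
    for item in items:
        if is_delimiter(item):
            chunks.append([])  # start a new chunk after each delimiter
        else:
            chunks[-1].append(item)
    return chunks


print(my_split([1, 2, 4, 0, 5, 6, 7, 0, 2, 5, 1], lambda x: x == 0))
# prints [[1, 2, 4], [5, 6, 7], [2, 5, 1]]
```

If empty chunks are unwanted, filtering the result with a list comprehension afterwards is probably simpler than complicating the loop.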
Regarding regular expressions, again, I wanted this for parsing. This capability is somewhat supported in nltk, where you can define regular expressions for token types for chunks. See section 7.2, subsection “Tag Patterns” in http://nltk.googlecode.com/svn/trunk/doc/book/ch07.html .
I just want a similar capability for lists. (As an aside, regular expression theory allows for any sequence of elements from an alphabet.)
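One rough way to approximate this with the standard re module is to map every element to a one-character tag, run an ordinary regular expression over the tag string, and translate the match positions back into list slices. The names find_runs and tag_of below are invented for this sketch, and the approach assumes the "alphabet" of element kinds is small enough to fit into single characters.

```python
import re


def find_runs(items, tag_of, pattern):
    """Return the sublists of items whose tag sequence matches pattern.

    Hypothetical helper: tag_of must return exactly one character per
    element, so that string offsets equal list indices.
    """
    tags = []
    for item in items:
        tag = tag_of(item)
        if len(tag) != 1:
            raise ValueError("tag_of must return a single character")
        tags.append(tag)
    tag_string = "".join(tags)
    # Each match span in the tag string is also a slice of the list.
    return [items[m.start():m.end()] for m in re.finditer(pattern, tag_string)]


def tag_of(element):
    # Illustrative tagging: 's' for strings, 'i' for ints, 'o' for the rest.
    if isinstance(element, str):
        return "s"
    if isinstance(element, int):
        return "i"
    return "o"


tokens = ["x", 1, 2, "y", 3, "z", "w"]
# A string followed by one or more ints.
print(find_runs(tokens, tag_of, r"si+"))
# prints [['x', 1, 2], ['y', 3]]
```

This only matches on element kinds, not on element values; anything finer would need a richer tagging function or a purpose-built matcher.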
Miki:
Thanks for the link, I wasn’t aware of that.
I want my money back.
First thing I thought when I saw the title on Reddit was “Oooh, shiny, I like a good rant”
Where’s the rant ?
You cannot rant in less than at least 6 paras (including > 40% ALLCAPs), or 10 paras in lower case.
2 substantial paras and a few supporting paras do not qualify as a rant.
It’s merely a whinge.