Here are the top 30 “base modules”, ordered by the number of PyPI projects importing them. These results are based on 11,204 packages downloaded from PyPI. Explanations, full results and the code to generate them are available below.
Full results are available (see Methodology to understand what they mean exactly).
Some interesting tidbits and comparisons:
- It seems django has gained “some popularity”. Zope is very high up on the list, and plone is at 42 with 907 projects importing it.
- The number of projects importing unittest is somewhat depressing, especially relative to setuptools which is impressive.
That might be because setuptools is, practically speaking, something of a prerequisite for appearing on PyPI, while unittest is not. (Edit: corrected by Michael Foord in a comment)
- optparse with 1875 vs. getopt with 515.
- cPickle with 690 vs. pickle with 598.
- simplejson with 760 vs. json with 593.
I invite you all to find out more interesting pieces of information by going over the results. I bet there’s a lot more knowledge to be gained from this.
Back in 2007 I wrote a small script that counted module imports in Python code. I used it to generate statistics for Python modules. A week or two ago I had the idea to repeat that experiment and see the difference between 2007 and 2011. I also had a small hypothesis to test: since django has become very popular, I’d expect it to be very high up on the list.
I started working with my old code, and decided that I should update it. Looking for imports in Python code is not as simple as it seems. I considered using the tokenize and parser modules, but decided against that. Using parser would make my code version dependent and by the time I thought of tokenize, I had the complicated part already worked out. By the complicated part I mean of course the big regexps I used ;)
Input: PyPI and a source distribution of the Python 2.7 standard library. I wrote a small script (cheese_getter.py) to fetch Python modules. It does so by reading the PyPI index page, and then using easy_install to fetch each module. Since there are currently a bit fewer than 13k modules in PyPI, this took some time.
Parsing: I wrote a relatively simple piece of code to find “import x” and “from x import y” statements in code. This is much more tricky than it seems: statements such as “from x import a,b”, “from . import bla” and
    from bla import \
        some_module, \
        some_module2
should all be supported. In order to achieve uniformity, I converted each import statement to a series of dotted modules. So for example, “import a.b” will yield “a” and “a.b”, and “from b import c,d” will yield “b”, “b.c”, and “b.d”.
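To make the conversion concrete, here is a minimal sketch of how a single import statement could be expanded into dotted module names. This is not the code used to produce the results: the names `expand_import`, `_prefixes` and `_names` are made up for illustration, the regexps are much simpler than the real ones, and relative imports such as “from . import bla” are simply skipped here, which is an assumption of this sketch.

```python
import re

# Hypothetical, simplified patterns; not the regexps used for the real results.
_FROM_RE = re.compile(r"^from\s+(\.*[\w.]*)\s+import\s+(.+)$")
_IMPORT_RE = re.compile(r"^import\s+(.+)$")


def _prefixes(dotted):
    """'a.b.c' -> ['a', 'a.b', 'a.b.c']"""
    parts = dotted.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def _names(clause):
    """Split 'a as x, b' into ['a', 'b'], dropping aliases and parentheses."""
    names = []
    for item in clause.strip().strip("()").split(","):
        item = item.strip()
        if item:
            names.append(item.split()[0])
    return names


def expand_import(statement):
    """Return the dotted module names implied by one import statement."""
    # Join backslash continuations, then collapse all whitespace runs.
    line = " ".join(statement.replace("\\\n", " ").split())
    found = []
    match = _FROM_RE.match(line)
    if match:
        base = match.group(1)
        if base.startswith("."):
            return []  # assumption: relative imports are ignored in this sketch
        found.extend(_prefixes(base))
        for name in _names(match.group(2)):
            if name != "*":
                found.append(base + "." + name)
    else:
        match = _IMPORT_RE.match(line)
        if match:
            for name in _names(match.group(1)):
                found.extend(_prefixes(name))
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(found))


print(expand_import("import a.b"))         # ['a', 'a.b']
print(expand_import("from b import c,d"))  # ['b', 'b.c', 'b.d']
```

A sketch like this still misses plenty of real-world cases (imports inside strings or comments, several statements on one line separated by semicolons), which is part of why the real regexps ended up big.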
Processing: I created three result types:
- total number of imports
- total number of packages importing the module
- total number of packages importing the module, counting only the first component of a dotted module name, e.g. only “a”, not “a.b”.
I believe the third is the most informative, although there are interesting things to learn from the others as well.
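The three result types can be illustrated with a short sketch. It assumes the parsing step has already produced, for each package, a list of dotted module names with one entry per import occurrence; the function name `tally`, the input layout and the sample data are all made up for illustration and are not taken from the actual code.

```python
from collections import Counter


def tally(imports_by_package):
    """Compute the three result types from {package_name: [dotted module, ...]}."""
    total_imports = Counter()        # 1. every import occurrence
    packages_per_module = Counter()  # 2. each package counted once per module
    packages_per_base = Counter()    # 3. each package counted once per base module
    for modules in imports_by_package.values():
        total_imports.update(modules)
        packages_per_module.update(set(modules))
        packages_per_base.update({name.split(".")[0] for name in modules})
    return total_imports, packages_per_module, packages_per_base


# Made-up sample input, not real PyPI data.
sample = {
    "pkg_one": ["os", "os.path", "os", "json"],
    "pkg_two": ["os", "django", "django.db"],
}
total, per_module, per_base = tally(sample)
print(total["os"], per_module["os"], per_base["os"])  # 3 2 2
```

In this layout the third counter never contains “os.path” or “django.db”, only “os” and “django”, which is what makes it the easiest one to read as a popularity ranking.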
Code: Full code is available. Peer reviews and independent reports are welcome :)