Here are the top 30 “base modules”, ordered by the number of PyPI projects importing them. These results are based on 11,204 packages downloaded from PyPI. Explanations, full results and the code used to generate them are available below.
Results
Full results are available (see Methodology to understand what they mean exactly).
Discussion
Some interesting tidbits and comparisons:
- It seems django has gained “some popularity”. Zope is very high up on the list, and plone is at 42 with 907 projects importing it.
- The number of projects importing unittest is somewhat depressing, especially relative to setuptools which is impressive.
That might be because setuptools is somewhat a prerequisite to appear on PyPI (practically speaking), while unittest is not. (Edit: corrected by Michael Foord in a comment)
- optparse with 1875 vs. getopt with 515.
- cPickle with 690 vs. pickle with 598.
- simplejson with 760 vs. json with 593.
I invite you all to find out more interesting pieces of information by going over the results. I bet there’s a lot more knowledge to be gained from this.
Background
Back in 2007 I wrote a small script that counted module imports in Python code. I used it to generate statistics for Python modules. A week or two ago I had an idea to repeat that experiment – and see the difference between 2007 and 2011. I also thought of a small hypothesis to test: since django became very popular, I’d expect it to be very high up on the list.
I started working with my old code, and decided that I should update it. Looking for imports in Python code is not as simple as it seems. I considered using the tokenize and parser modules, but decided against that. Using parser would make my code version dependent and by the time I thought of tokenize, I had the complicated part already worked out. By the complicated part I mean of course the big regexps I used ;)
Methodology
Input: PyPI and a source distribution of the Python 2.7 standard library. I wrote a small script (cheese_getter.py) to fetch Python modules. It does so by reading the PyPI index page and then using easy_install to fetch each module. Since there are currently a bit fewer than 13k modules in PyPI, this took some time.
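cheese_getter.py itself is not reproduced here, so the following is only a rough sketch of the index-reading step. It assumes the index page is plain HTML with one link per project, where the link text is the project name. The names IndexLinkParser and project_names are made up for illustration, and the download step (easy_install) is left out entirely.

```python
from html.parser import HTMLParser


class IndexLinkParser(HTMLParser):
    """Collect the text of every <a> element found in an index page."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        # Only text inside a link is treated as a project name (assumption).
        if self._in_link and data.strip():
            self.names.append(data.strip())


def project_names(index_html):
    """Return the project names listed in an already-downloaded index page."""
    parser = IndexLinkParser()
    parser.feed(index_html)
    parser.close()
    return parser.names


sample = (
    '<html><body>'
    '<a href="/simple/django/">django</a><br/>\n'
    '<a href="/simple/zope.interface/">zope.interface</a>'
    '</body></html>'
)
print(project_names(sample))
```

Each name returned this way could then be handed to whatever download tool is in use, one project at a time.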
Parsing: I wrote a relatively simple piece of code to find “import x” and “from x import y” statements in code. This is much trickier than it seems: statements such as “from x import a,b”, “from . import bla” and
from bla import \
    some_module, \
    some_module2
should all be supported. In order to achieve uniformity, I converted each import statement to a series of dotted modules. So for example, “import a.b” will yield “a” and “a.b”, and “from b import c,d” will yield “b”, “b.c”, and “b.d”.
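To make the conversion concrete, here is a minimal sketch of the same expansion. Note that it is not the regexp-based code described in this post: it uses the standard ast module instead, which means it only handles source that the running interpreter can parse. The function names are invented for the example, and skipping relative imports and “import *” is an assumption of this sketch, since the post does not say how those are counted.

```python
import ast


def dotted_prefixes(name):
    """'a.b.c' -> ['a', 'a.b', 'a.b.c']"""
    parts = name.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def expand_imports(source):
    """Return the dotted module names implied by the imports in source."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b" yields "a" and "a.b"
            for alias in node.names:
                found.extend(dotted_prefixes(alias.name))
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # "from b import c, d" yields "b", "b.c" and "b.d"
            found.extend(dotted_prefixes(node.module))
            for alias in node.names:
                if alias.name != "*":
                    found.append(node.module + "." + alias.name)
        # Relative imports (node.level > 0) are ignored in this sketch.
    return found


print(expand_imports("import a.b"))
print(expand_imports("from b import c,d"))
```

Backslash continuations need no special handling here, because the parser joins the continued lines before the import statement is seen.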
Processing: I created three result types:
- total number of imports
- total number of packages importing the module
- total number of packages importing the module, only for the first module mentioned in a dotted module name, e.g. not “a.b”, only “a”.
I believe the third is the most informative, although there are interesting things to learn from the others as well.
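The three result types can be tallied in a few lines. This is only an illustrative sketch, not the published code: it assumes the parsing step has already produced, for every package, a list of dotted module names with one entry per import occurrence. The function name summarize and the sample data are hypothetical.

```python
from collections import Counter


def summarize(imports_by_package):
    """Build the three result types from {package: [dotted module names]}."""
    total_imports = Counter()        # 1: every import occurrence
    packages_per_module = Counter()  # 2: packages importing each dotted name
    packages_per_base = Counter()    # 3: packages importing each first component
    for names in imports_by_package.values():
        total_imports.update(names)
        # Sets ensure a package is counted at most once per module.
        packages_per_module.update(set(names))
        packages_per_base.update({name.split(".")[0] for name in names})
    return total_imports, packages_per_module, packages_per_base


data = {
    "pkg_one": ["os", "os.path", "os"],
    "pkg_two": ["os", "json"],
}
total, per_module, per_base = summarize(data)
print(per_base.most_common())
```

Sorting the third counter with most_common() gives a ranking of the same kind as the “base modules” list at the top of this post.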
Code: Full code is available. Peer reviews and independent reports are welcome :)
Interesting analysis. setuptools is in no way a pre-requisite to appear on pypi – you can upload to pypi with a vanilla distutils setup.py. In several of my projects I have *optional* setuptools support, so they will appear to use setuptools even though it isn’t the “recommended” way to install my projects. :-)
My mistake, I will mark it as such. Thanks for the correction :)
Two remarks:
1/ pkg_resources is part of the Setuptools or Distribute project, so if you add up pkg_resources + setuptools, it’s #1.
2/ I would separate out all imports in setup.py, to make a distinction between the modules that are imported to package the project and the ones that are used within the project’s code.
I will try running it with special handling of setup.py.
Ditto with simplejson, many projects will try simplejson, then fall back to json, and then possibly to a bundled copy.
It seems that many projects don’t even bother with json and just try simplejson.
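For readers unfamiliar with it, the fallback pattern mentioned above usually looks something like the sketch below (the bundled-copy step is left out). A project written this way contains both import statements, so it is counted for simplejson and for json no matter which one is actually used at run time.

```python
try:
    # Preferred when installed (third-party package).
    import simplejson as json
except ImportError:
    # Fall back to the standard library module (available since Python 2.6).
    import json

encoded = json.dumps({"answer": 42})
print(json.loads(encoded)["answer"])
```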
great post!
I wonder if zope would be so high on the list if you’d filter out inactive packages.
I might try inactive package filtering… An inactive package is one that hasn’t been updated in more than a year? That might work.
About zope – I think you need to distinguish between zope and zope.interface. Importing the latter doesn’t mean that Zope is used in the project.