PyWeb-IL Presentation on Harvesting: Finding the Most Influential Artists

Yesterday I gave a presentation on harvesting to the PyWeb-IL group. In the presentation, I described what I learned about harvesting and also gave a concrete example of how to find the “most influential artists” using data from allmusic.com and a (very) naive implementation of PageRank.

The PageRank implementation follows the Wikipedia description almost word for word. It is not efficient, but it works well enough for this presentation. I included it and the allmusic.com example mostly because I think the results are pretty cool, and because they make very good teaching material.
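
To give a feel for what a naive PageRank looks like, here is a minimal sketch. It is not the code from the presentation: the function name `pagerank`, the dict-of-lists graph format, the damping factor of 0.85, the fixed iteration count, and the toy artist names are all assumptions made for illustration. Each artist is assumed to point to the artists who influenced them, so rank flows toward the influencers.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Naive iterative PageRank over a dict {node: [nodes it points to]}."""
    # Collect every node, including ones that only appear as link targets.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)

    # Start from a uniform distribution.
    ranks = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread evenly over all nodes.
        dangling = sum(ranks[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {node: base for node in nodes}

        # Every other node splits its rank equally among the nodes it points to.
        for node, targets in links.items():
            if not targets:
                continue
            share = damping * ranks[node] / len(targets)
            for target in targets:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical toy graph: each artist points to the artists who influenced them.
influenced_by = {
    "artist_a": ["artist_c"],
    "artist_b": ["artist_c", "artist_a"],
    "artist_c": [],
}

ranks = pagerank(influenced_by)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(name, round(ranks[name], 3))
```

In this toy graph, artist_c ends up with the highest rank because both other artists point to it. A real run would use the graph harvested from allmusic.com instead of the hand-written dictionary.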

Here is the presentation, and the code is available here.

Here is how to run it:

D:\work\pywebil-harvesting\upload>allmusic.py "/cg/amg.dll?p=amg&sql=11:3pfrxqq5ld6e" 2 out.pkl

simple_pagerank.py out.pkl

Happy harvesting!