Tag Archives: Harvesting

PyWeb-IL Presentation on Harvesting: Finding the Most Influential Artists

Yesterday I gave a presentation on harvesting to the PyWeb-IL group. In the presentation, I described what I learned about harvesting and also gave a concrete example of how to find the “most influential artists” using data from allmusic.com and … Continue reading

Posted in Python | 1 Comment

Easy Harvesting

Image by existentist. I’ve been doing a lot of harvesting (aka screen-scraping) lately. Fortunately, I don’t need form automation, so I’m using urllib2 rather than Mechanize, which my friend Ron Reiter recommended. At first, when I wanted to get some … Continue reading

Posted in Programming, Python, Utility Functions | Leave a comment

Database Design Problem

A few weeks ago, I had to work out a database design for my startup. I had a bit of a hard time deciding on a design direction, but after thinking about it, I settled on a design I was … Continue reading

Posted in Databases, Design, startup | 5 Comments

Harvesting with threadmap

From time to time, I need to harvest a website, or many websites: for example, to collect data from IMDB as input for the PageRank algorithm. At other times, I need to query non-web servers. Usually in such cases, I … Continue reading

Posted in Programming, Python, Utility Functions | 4 Comments