So how was your Yom Kippur? After Yom Kippur ended, I sat down to write something that had been nagging me for quite some time. I wanted to see what PageRank had to say about movies. I’ve always liked to picture things as graphs, and looking at movies and actors as nodes in a graph is very intuitive. I also think that the idea has some merit. I like most of the movies that Edward Norton appears in, so it makes sense to give him a high score, and see which other movies Edward Norton “recommends”.
So I fired up my IDE, and started to work.
At first I was a little bit disappointed, because IMDbPY doesn’t have code to read the IMDb charts. I was going to extend it a little bit, but then I remembered that I had already downloaded all the data I needed some time ago (more than half a year ago, I believe), and that was good enough. So I opened up an old pickle file I had lying around, and voilà! After some playing around with the format, I had a weighted graph at my disposal. I weighted the graph according to the IMDb listing: if an actor appears higher in the cast list, his (or her) weight to and from that movie is higher.
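To make the weighting idea concrete, here is a minimal sketch of how such a graph could be built. It is not the code I actually ran: the `build_graph` name, the placeholder titles, and the `1 / (position + 1)` weight are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_graph(casts):
    """Build a weighted movie/actor graph.

    `casts` maps a movie title to its cast list in billing order.
    Nodes are ("movie", title) and ("actor", name) tuples, so a movie
    and an actor can never collide on the same name.
    """
    graph = defaultdict(dict)
    for movie, cast in casts.items():
        for position, actor in enumerate(cast):
            # Assumed weighting: top billing gets 1.0, second gets 0.5, ...
            weight = 1.0 / (position + 1)
            # The same weight is used in both directions.
            graph[("movie", movie)][("actor", actor)] = weight
            graph[("actor", actor)][("movie", movie)] = weight
    return dict(graph)


# Made-up toy data, not real IMDb billing order.
casts = {
    "Movie A": ["Actor X", "Actor Y"],
    "Movie B": ["Actor Y"],
}
graph = build_graph(casts)
```

With this toy input, "Actor Y" is linked to "Movie A" with weight 0.5 (second billing) and to "Movie B" with weight 1.0 (top billing).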
I implemented a quick and dirty PageRank, and then added some code to score my recommendations higher. I did that by adding my recommendation’s weight to the movie’s score on each iteration.
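A rough sketch of what a PageRank with a recommendation boost might look like is below. Treat it as one possible reading of the idea rather than my exact code: the `bonus`, `damping`, and `iterations` values are arbitrary assumptions, the toy graph is made up, and here the boost is added to whichever nodes are recommended and then the scores are renormalized.

```python
def pagerank(graph, recommended=(), bonus=0.1, damping=0.85, iterations=50):
    """Weighted PageRank over a dict-of-dicts graph.

    `graph[node][neighbour]` is the edge weight. Every neighbour must
    also appear as a key in `graph`. Nodes in `recommended` get `bonus`
    added to their score on every iteration (an assumed scheme).
    """
    nodes = list(graph)
    count = len(nodes)
    scores = {node: 1.0 / count for node in nodes}
    out_weight = {node: sum(graph[node].values()) for node in nodes}

    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_scores = {node: (1.0 - damping) / count for node in nodes}
        for node in nodes:
            if out_weight[node] == 0:
                continue  # nothing to hand out from an isolated node
            for neighbour, weight in graph[node].items():
                share = scores[node] * weight / out_weight[node]
                new_scores[neighbour] += damping * share
        # Boost the recommended nodes, then renormalize to sum to 1.
        for node in recommended:
            if node in new_scores:
                new_scores[node] += bonus
        total = sum(new_scores.values())
        scores = {node: score / total for node, score in new_scores.items()}
    return scores


# Made-up toy graph: two movies, two actors, symmetric weights.
toy_graph = {
    "Movie A": {"Actor X": 1.0, "Actor Y": 0.5},
    "Movie B": {"Actor Y": 1.0},
    "Actor X": {"Movie A": 1.0},
    "Actor Y": {"Movie A": 0.5, "Movie B": 1.0},
}
plain = pagerank(toy_graph)
boosted = pagerank(toy_graph, recommended=["Actor X"])
```

In this toy case, recommending "Actor X" raises the score of "Movie A", the only movie that actor appears in, which is the effect I was hoping to get from recommending Edward Norton.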
Without recommendations, the top spots were:
- 24
- The Sopranos
- The Wizard of Oz
- JFK
- Cidade de Deus
- Citizen Kane
- King Kong
- Voyna i mir
- Dolce vita, La
- The Patriot
After adding “Edward Norton” and “Tom Hanks” as recommendations, I got the following results:
- American History X
- Fight Club
- 24
- Red Dragon
- The Sopranos
- Catch Me If You Can
- The Wizard of Oz
- JFK
- Forrest Gump
- Cidade de Deus
I was a little surprised to see 24 and The Sopranos rank so high, and also a little bit disappointed. The recommendations didn’t really add enough information, and more work will have to be done for this to work properly. It has the right ‘smell’, however. It looks like it has the potential to be useful: I can just see the program, with a small table with a ‘seen/unseen’ checkbox and a place to write your favorites, telling you which movies you should watch next. This reminds me a little bit of the recommendations on Soulseek and Amarok. It would be really fun to write a total recommendation engine, with the best of all worlds. Someone has probably already implemented it.
In any case, I’ll work on it some more. This is interesting.
If anyone wants the code, tell me, and I’ll make it ‘online worthy’…
Hi – guess who I am and how I’ve found this blog post. ;-)
Doing some analysis of the IMDb data is something I always wanted to do; unfortunately I lack the knowledge of statistics needed to express a competent opinion. :-)
Besides that, I’m not sure that the plain text data files contain enough information to develop something useful (there are only average vote, number of votes and a rough vote distribution for each movie).
An approach different from yours was tried by Tom Moertel; see:
http://blog.moertel.com/articles/2006/01/17/mining-gold-from-the-internet-movie-database-part-1
and
http://community.moertel.com/ss/space/IMDB+Movie-Rating+Decoder+Ring
I like the idea of using this data to extract movie recommendations, but my long-time dream was another: analyze the data in the plain text data files and use this information to develop a way to predict how successful a still-unreleased movie can be, based on the director, cast, genre (and how well the director and cast have previously done with this genre), keywords and so on.
If you have any ideas about it (if/how it can work), let me know via email; after all, the data is already in an SQL database, and it’s a pity that this wealth of information is not more heavily used.