getFlix Revamped

2007-04-17 20:33 - Web

A while back, I googled around to find a script to save my Netflix ratings. I've got 539 of them. The Netflix site is a perfect interface when I'm renting Netflix DVDs. But when I'm not, it's horrible. I also knew there was some interesting data about me locked up in those ratings, but I couldn't analyze it.

I quickly found a Perl-and-Python-based set of scripts called getFlix. But even then it was old and very broken. It wouldn't actually save anything, because the pages on Netflix had changed. Well, I've finally created my own replacement. I intended it to be entirely server-side in PHP, but I had all kinds of inexplicable trouble just getting it to log in. So I went the easier route: I threw together a Greasemonkey script to actually do the crawling. This left the server-side bit terribly trivial. The back end script is only about 50 lines, and there are just three MySQL tables. And it works! For now; the parsing logic is super fragile.

So, on to the data! Some of it is pretty interesting. Along with the back end script are a handful of front end scripts, which run on JpGraph to make some pretty graphics. I chart the total ratings by number of stars, and the average rating by a few different groups:

So, I've liked a lot, and disliked a lot. But I've usually done pretty well picking movies. Most get four stars. The next graph, actors, starts the average graphs, which include standard-deviation bars to show the range of the values that were averaged. For actors especially, it's not so interesting. There are simply a lot of points, and enough got five stars that they just fill the graph. The decades graph is slightly more interesting. I've seen at least one film from every decade from the 1920s up to the 2000s. And, excluding the two outliers in the 30s, it's a nearly linear trend: I'm young, and I like young movies. More than older ones, at least.
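For anyone curious how an "average with standard-deviation bars" graph gets its numbers, here is a minimal sketch in Python. It is not the actual getFlix code (that is PHP feeding JpGraph); the field names, the sample rows, and the choice of population standard deviation are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def decade_of(year):
    """Map a release year to the start of its decade, e.g. 1994 -> 1990."""
    return (year // 10) * 10


def average_by_group(ratings, key):
    """Group star ratings by key(row); return {group: (mean, std dev)}."""
    groups = defaultdict(list)
    for row in ratings:
        groups[key(row)].append(row["stars"])
    # pstdev (population std dev) is an assumption; it returns 0.0 for a
    # group with a single rating, where the sample version would fail.
    return {g: (mean(stars), pstdev(stars)) for g, stars in groups.items()}


# Made-up sample rows, not real ratings.
sample = [
    {"title": "A", "year": 1931, "stars": 5},
    {"title": "B", "year": 1999, "stars": 4},
    {"title": "C", "year": 1994, "stars": 2},
    {"title": "D", "year": 2003, "stars": 4},
]

stats = average_by_group(sample, lambda row: decade_of(row["year"]))
for decade in sorted(stats):
    avg, dev = stats[decade]
    print(f"{decade}s: average {avg:.2f} stars, +/- {dev:.2f}")
```

Swapping the `key` function for one that returns the director, genre, or MPAA rating would give the numbers for the other average graphs in the same way.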

How about directors? Peter Jackson made an easy first place, with all three of the Lord of the Rings trilogy counted towards him. What were the other three? The extended edition versions. What follows is a variety of names I don't know, at or above an average of 4 stars. That's probably a good place to look for more movies I'd like.

Genres is another interesting one. TV wins easily, because each show has a handful of seasons, and I've really only rated TV that I like. I already know which shows I like and which I don't, because I've seen them on TV. They were rated more just to keep track of what I had rented so far and what I hadn't. Surprising, though, are numbers two and three: "Children & Family" and "Documentary". I'm surprised "Sci-Fi & Fantasy" is so low.

MPAA rating rounds out the group. After the genres graph, it's not so surprising. But, honestly, I expected R-rated movies to be up higher.


Like what you see? You can get the code for yourself. But be careful: no warranty! It's generally insecure, so install it only on a private server, or behind a password. It's also likely to break whenever Netflix changes their pages. And I don't plan on updating it. But it should serve you well for a while, if you're so inclined.

Comments:

Updated
2008-03-16 11:50 - arantius

I wanted to check out some new ratings, so I ran the script again today. It was broken, of course! I fixed it, and updated the copy at the 'get the code' link above.

Movies
2008-05-02 16:14 - Sam

I know you stated you weren't really updating the script, but I just figured on the off chance you'd look here: it seems to only be inputting the data from the first page of MoviesYouveSeen and nothing more. Cheers!

Updated version
2008-06-17 06:07 - maarten

Thanks so much for the example, Anthony!

I've taken your Greasemonkey script and enhanced it to include the IMDB movie ID as well. I also removed the stats collection, so now it runs entirely in Firefox; no server-side necessary.

Check it out at http://tenhanna.com/greasemonkey/

Another updated script
2013-06-17 09:41 - arantius

John Giguere contacted me by email with an updated script he has written: GetFlix3.user.js. He says it puts the data into a text box which you can copy/paste out of. I have never used it but you're welcome to try it.
