2007-04-17 20:33 - Web
A while back, I googled around for a script to save my Netflix ratings. I've got 539 of them. The Netflix site is a perfect interface for them when I'm renting DVDs, but when I'm not, it's horrible. I also knew there was some interesting data about me locked up in those ratings, but I couldn't analyze it.
I quickly found a Perl-and-Python-based set of scripts called getFlix. But even then it was old and very broken: it wouldn't actually save anything, because the pages on Netflix had changed. Well, I've finally created my own replacement. I intended it to be entirely server-side PHP, but I had all kinds of inexplicable trouble just getting it to log in. So I went the easier route: I threw together a Greasemonkey script to do the actual crawling. This left the server-side bit terribly trivial. The back end script is only about 50 lines, and there are just three MySQL tables. And it works! For now; the parsing logic is super fragile.
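To give a feel for how small a back end like this can be, here is a minimal sketch in Python. It is not the actual script: that one is PHP on MySQL, and this post doesn't spell out its tables. Everything here is an assumption for illustration, including the table names (movies, tags, movie_tags), the columns, the helper names open_store and save_rating, and the use of SQLite as a stand-in for MySQL.

```python
import sqlite3

# Hypothetical three-table layout (an assumption, not the real schema).
SCHEMA = """
CREATE TABLE movies (
    id     INTEGER PRIMARY KEY,
    title  TEXT NOT NULL,
    year   INTEGER,
    mpaa   TEXT,
    stars  INTEGER NOT NULL CHECK (stars BETWEEN 1 AND 5)
);
CREATE TABLE tags (
    id    INTEGER PRIMARY KEY,
    kind  TEXT NOT NULL,   -- 'actor', 'director', or 'genre'
    name  TEXT NOT NULL,
    UNIQUE (kind, name)
);
CREATE TABLE movie_tags (
    movie_id  INTEGER NOT NULL REFERENCES movies(id),
    tag_id    INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (movie_id, tag_id)
);
"""


def open_store(path=":memory:"):
    """Open a database and create the three tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_rating(conn, title, year, mpaa, stars, tags):
    """Store one crawled rating; tags is a list of (kind, name) pairs."""
    cur = conn.execute(
        "INSERT INTO movies (title, year, mpaa, stars) VALUES (?, ?, ?, ?)",
        (title, year, mpaa, stars),
    )
    movie_id = cur.lastrowid
    for kind, name in tags:
        # Reuse an existing tag row if this actor/director/genre was seen before.
        conn.execute(
            "INSERT OR IGNORE INTO tags (kind, name) VALUES (?, ?)", (kind, name)
        )
        tag_id = conn.execute(
            "SELECT id FROM tags WHERE kind = ? AND name = ?", (kind, name)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO movie_tags (movie_id, tag_id) VALUES (?, ?)",
            (movie_id, tag_id),
        )
    conn.commit()
    return movie_id
```

In this sketch the crawler would call something like save_rating once per movie it scrapes. Keeping actors, directors, and genres in one shared tags table is just one way to land on three tables.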
So, on to the data! Some of it is pretty interesting. Along with the back end script are a handful of front end scripts, which use JpGraph to draw some pretty graphs. I chart the total ratings by number of stars, and the average rating by a few different groupings:
- Total by Stars
- Average by Actors
- Average by Decades
- Average by Directors
- Average by Genre
- Average by MPAA
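The numbers behind graphs like these are simple to compute. Here is a rough sketch in Python of the two kinds of aggregation: a count per star value, and an average per group. It is not the actual front end code (that is PHP with JpGraph). The record layout, the sample values, and the function names total_by_stars, decade, and average_by are all made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical records: (year, mpaa, stars). Values are invented for the example.
RATINGS = [
    (1934, "NR", 5),
    (1987, "PG", 4),
    (1999, "R", 4),
    (2003, "PG-13", 5),
    (2005, "R", 2),
]


def total_by_stars(ratings):
    """Count how many ratings landed on each star value from 1 to 5."""
    counts = Counter(stars for _year, _mpaa, stars in ratings)
    return {stars: counts[stars] for stars in range(1, 6)}


def decade(year):
    """Map a release year to the start of its decade, e.g. 1934 -> 1930."""
    return year // 10 * 10


def average_by(ratings, key):
    """Average the star ratings within each group chosen by key(record)."""
    groups = defaultdict(list)
    for record in ratings:
        groups[key(record)].append(record[2])
    return {name: sum(stars) / len(stars) for name, stars in sorted(groups.items())}


if __name__ == "__main__":
    print(total_by_stars(RATINGS))
    print(average_by(RATINGS, lambda r: decade(r[0])))  # average by decade
    print(average_by(RATINGS, lambda r: r[1]))          # average by MPAA rating
```

Actors, directors, and genres would work the same way, except that one movie can belong to several groups at once, so each rating would be appended to every group it matches.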
So, I've liked a lot, and disliked a lot, but I've usually done pretty well picking movies: most get four stars. The next graph, actors, isn't so interesting. It's the first of the average graphs, which include standard-deviation bars to show the spread of the ratings that were averaged. For actors, there are simply a lot of points, and enough of them got five stars that they just fill the graph. The decades graph is slightly more interesting. I've seen at least one film from every decade from the 1920s up to the 2000s. And, excluding the two outliers in the 30s, it's a nearly linear trend: I'm young, and I like young movies. More than older ones, at least.
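For anyone wondering what goes into a standard-deviation bar, here is a small Python sketch using the standard statistics module. The function name summarize is hypothetical. Using the population standard deviation (pstdev) rather than the sample one is also an assumption, since this post doesn't say which the graphs use.

```python
from statistics import mean, pstdev


def summarize(stars):
    """Return (average, bar_low, bar_high) for one group of star ratings.

    The bar spans one standard deviation either side of the average.
    Assumption: population standard deviation; a group with a single
    rating therefore gets a zero-length bar.
    """
    avg = mean(stars)
    spread = pstdev(stars)
    return avg, avg - spread, avg + spread


if __name__ == "__main__":
    # A group rated 2 and 4 stars averages 3 with a bar from 2 to 4.
    print(summarize([2, 4]))
    # A group where everything got 5 stars has no visible bar.
    print(summarize([5, 5, 5]))
```

A group where every rating is the same gets a bar of zero length, and a group with mixed ratings gets a long one.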
How about directors? Peter Jackson took an easy first place, with six ratings counted towards him: all three films of the Lord of the Rings trilogy, plus three more. What were the other three? The extended edition versions. What follows is a variety of names I don't know, at or above an average of 4 stars: probably a good place to look for more movies I'll like.
Genres is another interesting one. TV wins easily, because each show has a handful of seasons, and I've really only rated TV that I like. I already know which shows I like and which I don't, since I've seen them on TV; I mostly rated them just to keep track of what I had and hadn't rented so far. Surprising, though, are numbers two and three: "Children & Family" and "Documentary". I'm surprised "Sci-Fi & Fantasy" is so low.
The MPAA rating graph rounds out the group. After the genres graph, it's not so surprising. But, honestly, I expected R-rated movies to be higher.
Like what you see? You can get the code for yourself. But be careful: no warranty! It's generally insecure, so install it only on a private server, or behind a password. It's also likely to break whenever Netflix changes their pages, and I don't plan on updating it. But it should serve you well for a while, if you're so inclined.