First Steps with the Netflix API


Last weekend I started to dig into the Netflix API.  The power of their API is phenomenal. Pretty much everything you do on the Netflix site, you can do with their API:

  • Search and browse the catalog (including movies, series, and cast)
  • Get recommendations
  • Get ratings and predicted ratings
  • View, add to, or remove from user queues

This API is a great move by Netflix: they get developers engaged in extending functionality, plus they have an affiliate program so everybody wins :)  And there’s almost no business risk to Netflix.  Most of the functionality is only available if a user is already a Netflix customer, and the functionality that’s available without a subscription only helps expose and upsell Netflix products.

There are plenty of great resources available about getting started with the API, but I thought I’d explain some of the early challenges I’m encountering and the solutions I’ve got so far.  Just FYI, I’m about 12 hours into my project, so I might be missing some obvious solutions :)

OAuth

Wow, OAuth is really daunting at first.  Netflix has a great walkthrough, but it’s still confusing and scary. They list 3 kinds of requests:

  • Non-authenticated (content that they don’t really care who gets)
  • Signed Requests (content that they do care about, but isn’t user specific)
  • Protected Requests (content and interaction that is user specific)

The first type is really straightforward: just include your “consumer key” (or access ID, or application identifier, whatever you want to call it).  Nothing else to say here.

The second is also straightforward.  It’s just a signed request, similar to Amazon S3 or the SEOmoz API that it inspired.  Basically you take your request, including query parameters, and compute a hash using a secret key you and Netflix share.  This way Netflix can be sure that it’s really you who’s making the request.  This is pretty standard OAuth stuff, but I threw together a few lines of code to help.
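To make the signing step concrete, here’s a minimal Python sketch of a generic OAuth 1.0 HMAC-SHA1 signature.  It is an illustration of the standard technique, not Netflix’s own library: the function names are made up for this example, and the URL and keys in the usage note are placeholders.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value):
    # OAuth 1.0 uses RFC 3986 encoding: only unreserved characters stay as-is.
    return quote(str(value), safe="-._~")


def signature_base_string(method, url, params):
    # Encode each name and value, sort the pairs, join them as k=v&k=v,
    # then encode that whole string once more as the third component.
    pairs = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), percent_encode(url), percent_encode(normalized)])


def sign_request(method, url, params, consumer_secret, token_secret=""):
    # The HMAC key is "<consumer secret>&<token secret>"; the token secret
    # is empty when no user is involved (a plain signed request).
    key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"
    base = signature_base_string(method, url, params)
    digest = hmac.new(key.encode("utf-8"), base.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")
```

A call might look like `sign_request("GET", "http://api.example.com/catalog/titles", params, "my-shared-secret")`, where `params` holds the query parameters plus the usual `oauth_*` fields (consumer key, nonce, timestamp, signature method).  The result goes into the request as `oauth_signature`.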

The third kind of request, a “Protected Request,” is simple in principle: you make requests just like signed requests, but with a user-specific key.  Getting that key is the hard part, because you need to follow some really complicated authorization steps.  To illustrate, here’s a diagram of what’s going on:

[Figure: 9-Way Netflix Protected Request Handshake]

That’s a 9-way handshake including you, the user, and Netflix!  But it’s safe, and extremely explicit.  So security for the win, I guess.  What’s going on is:

  1. You kick things off with a request to Netflix
  2. Netflix responds with a few security parameters, including a special login URL for the user
  3. You send the user that login URL (maybe with a 302 redirect), adding a callback URL parameter (see step 7)
  4. The user visits the login screen
  5. Netflix tells the user the dangers of playing with strange apps
  6. The user confirms
  7. Netflix redirects (with a 302) the user back to your callback URL, plus a user-specific authorized token (this is the first time anything has been user identifying)
  8. You make a final request to Netflix including that authorized token.  Notice that this time you’re signing with a new key that combines your secret key with the new token from Netflix.
  9. Netflix responds with a final secret token

Those last two tokens (the ones you got in steps 7 and 9) are the real keys to accessing user-specific data. Again, I’ve got a couple of pieces of code to help.  The first handles steps 1-3; the second handles steps 7-9.  It’s up to your users to handle steps 4-6.
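For a rough idea of what the pieces you control look like, here’s a small Python sketch of steps 3, 7, and 8.  It makes a few assumptions: the parameter names `oauth_callback` and `oauth_token` are the standard OAuth 1.0 ones (assumed here to match what Netflix expects), the hosts are placeholders, the helper names are hypothetical, and the “combined” key follows the usual OAuth 1.0 convention of joining the consumer secret and a token secret with `&`.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit, urlunsplit


def add_callback(login_url, callback_url):
    """Step 3: append our callback URL to the login URL we were given."""
    parts = urlsplit(login_url)
    extra = urlencode({"oauth_callback": callback_url})
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


def token_from_callback(callback_request_url):
    """Step 7: pull the authorized token out of the URL the user landed on."""
    query = parse_qs(urlsplit(callback_request_url).query)
    return query["oauth_token"][0]


def signing_key(consumer_secret, token_secret=""):
    """Step 8: build the HMAC key from the consumer secret and a token secret."""
    return f"{quote(consumer_secret, safe='-._~')}&{quote(token_secret, safe='-._~')}"
```

In a web app, `add_callback` would produce the target of the 302 redirect in step 3, and `token_from_callback` would run inside whatever handler serves your callback URL.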

Once you get past the 9-way handshake and get a good OAuth lib to help out with signing requests, the rest is pretty easy.  Mostly.

Rate Limits and Title Refs

I know a little bit about rate limits.  And not surprisingly, Netflix has them.  When you first sign up, you’re limited to 4 queries per second with a daily limit of 5000 queries overall.  That’s enough to get started.  Many requests also support a batched interface, so you can get up to 500 predicted ratings in one request (way to go, Netflix!).
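One way to stay under those limits is to throttle on the client side.  Here’s a minimal Python sketch, assuming the numbers above (4 per second, 5000 per day, batches of up to 500); the class and function names are hypothetical, and it only tracks calls made by a single process.

```python
import time
from collections import deque


class DailyQuotaExceeded(Exception):
    pass


class RateLimiter:
    """Block so that at most per_second calls happen in any one-second window."""

    def __init__(self, per_second=4, per_day=5000, clock=time.monotonic, sleep=time.sleep):
        self.per_second = per_second
        self.per_day = per_day
        self.clock = clock
        self.sleep = sleep
        self.recent = deque()   # timestamps of calls made in the last second
        self.day_start = None
        self.day_count = 0

    def acquire(self):
        now = self.clock()
        # Start a fresh daily count 24 hours after the current one began.
        if self.day_start is None or now - self.day_start >= 86400:
            self.day_start, self.day_count = now, 0
        if self.day_count >= self.per_day:
            raise DailyQuotaExceeded("daily request quota used up")
        while True:
            now = self.clock()
            while self.recent and now - self.recent[0] >= 1.0:
                self.recent.popleft()
            if len(self.recent) < self.per_second:
                break
            # Wait until the oldest call in the window is a full second old.
            self.sleep(1.0 - (now - self.recent[0]))
        self.recent.append(now)
        self.day_count += 1


def batched(items, size=500):
    """Split a list of title refs into chunks small enough for one batch request."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

The idea is to call `limiter.acquire()` right before every HTTP request, and to loop over `batched(title_refs)` instead of asking for one rating at a time.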

But because everything in the Netflix API depends on internally assigned, opaque URIs (e.g., http://api.netflix.com/catalog/titles/movies/60021896), I find myself making a lot of search queries.  For instance, if I know that Shutter Island opened last weekend, and want to get some Netflix data about it, I first have to make a search request on “Shutter Island” before I can get that predicted rating.  And the search API doesn’t support a batched interface.  This adds up to a lot of requests, and a lot of requests quickly.

Perhaps there’s an easy solution I’m missing (download the whole catalog? But what about ambiguities in movie titles?).  This does sound like a problem well suited to caching.  At least, in my application, I have a few movies for which I want to look up ratings for many people.  Caching is commonly recommended by API providers.  So even if I am missing something, caching isn’t a bad idea.  Inspired by WP-cache, I’ve started a small disk-based caching utility.  It’s not done yet, but it works in my prototype :)
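To show the general shape of a disk-based cache, here’s a minimal Python sketch.  It is not the utility mentioned above, just one simple way to do it: each cache key is hashed into a file name, values are stored as JSON, and entries older than a maximum age count as misses.  The class name, the one-day default, and the key format are all assumptions made for the example.

```python
import hashlib
import json
import os
import time


class DiskCache:
    def __init__(self, directory, max_age_seconds=24 * 3600):
        self.directory = directory
        self.max_age_seconds = max_age_seconds
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Hash the key so any string (a search term, a URL) is a safe file name.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".json")

    def get(self, key):
        path = self._path(key)
        try:
            if time.time() - os.path.getmtime(path) > self.max_age_seconds:
                return None  # entry exists but is too old
            with open(path, "r", encoding="utf-8") as handle:
                return json.load(handle)
        except (OSError, ValueError):
            return None  # missing or unreadable entry counts as a miss

    def set(self, key, value):
        path = self._path(key)
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as handle:
            json.dump(value, handle)
        os.replace(tmp, path)  # swap in the finished file in one step

    def get_or_fetch(self, key, fetch):
        # Note: a cached value of None is treated as a miss and fetched again.
        cached = self.get(key)
        if cached is not None:
            return cached
        value = fetch()
        self.set(key, value)
        return value
```

With something like this, a title search becomes `cache.get_or_fetch("search:Shutter Island", do_search)`, where `do_search` is whatever function actually calls the API.  The search only goes over the network the first time, which is what matters when the same few movies get looked up for many people.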

Speaking of my prototype, a super, super early version can be seen here.  I’ll post again when I’ve got updates to the code or the application itself.

  1. #1 by Nathan on February 22nd, 2010

    Very nice writeup, thanks for sharing! now I kinda want to go and play with it.

    nate

  2. #2 by Carter Cole on February 26th, 2010

    ive had to mess with oAuth pulling Google data feeds… its way cool that so many providers now offer oAuth and that the days of millions of passwords for each site are going the way of the dinosaurs

  3. #3 by Julio Reguero on April 13th, 2011

    Excellent article Nick. Thanks for sharing your knowledge with us.
