Archive for February, 2010

First Steps with the Netflix API

Last weekend I started to dig into the Netflix API.  The power of their API is phenomenal. Pretty much everything you do on the Netflix site, you can do with their API:

  • Search and browse the catalog (including movies, series, and cast)
  • Get recommendations
  • Get ratings and predicted ratings
  • View, add to, or remove from user queues

This API is a great move by Netflix: they get developers engaged in extending functionality, plus they have an affiliate program so everybody wins :)  And there’s almost no business risk to Netflix.  Most of the functionality is only available if a user is already a Netflix customer, and the functionality that’s available without a subscription only helps expose and upsell Netflix products.

There are plenty of great resources available about getting started with the API, but I thought I’d explain some of the early challenges I’m encountering and the solutions I’ve got so far.  Just FYI, I’m about 12 hours into my project, so I might be missing some obvious solutions :)

OAuth

Wow, OAuth is really daunting at first.  Netflix has a great walkthrough, but it’s still confusing and scary. They list 3 kinds of requests:

  • Non-authenticated (content that they don’t really care who gets)
  • Signed Requests (content that they do care about, but isn’t user specific)
  • Protected Requests (content and interaction that is user specific)

The first type is really straightforward: just include your “consumer key” or access ID, application identifier, whatever you want to call it.  Nothing else to say here.

The second is also straightforward.  It’s just a signed request similar to Amazon S3 or the similarly inspired SEOmoz API.  Basically you take your request, including query parameters, and compute a hash using a secret key you and Netflix share.  This way Netflix can be sure that it’s really you who’s doing the request.  This is pretty standard OAuth stuff, but I threw together a few lines of code to help.
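To make the signing step concrete, here is a minimal sketch of HMAC-SHA1 signing as OAuth 1.0 describes it. This is not the code mentioned above; the function names are made up for illustration, and the key, secret, and URL in the usage lines are the sample values from the OAuth Core 1.0 spec appendix rather than anything Netflix-specific.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value):
    """Percent-encode the OAuth way: only letters, digits, '-', '.', '_', '~' stay as-is."""
    return quote(str(value), safe="~")


def signature_base_string(method, url, params):
    """Join the method, the URL, and the sorted, encoded parameters into one string."""
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), pct(url), pct(normalized)])


def sign(method, url, params, consumer_secret, token_secret=""):
    """Return the base64 HMAC-SHA1 signature for a request.

    token_secret stays empty for plain signed requests; it is only
    filled in once a user-specific token exists.
    """
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    base = signature_base_string(method, url, params)
    digest = hmac.new(key.encode(), base.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


# Sample values from the OAuth Core 1.0 spec appendix (not real credentials).
params = {
    "file": "vacation.jpg",
    "size": "original",
    "oauth_consumer_key": "dpf43f3p2l4k3l03",
    "oauth_token": "nnch734d00sl2jdk",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1191242096",
    "oauth_nonce": "kllo9940pd9333jh",
    "oauth_version": "1.0",
}
signature = sign("GET", "http://photos.example.net/photos", params,
                 "kd94hf93k423kf44", "pfkkdhi9sl3r4s00")
print(signature)
```

The signature then travels with the request as the oauth_signature parameter.  In practice a maintained OAuth library is the safer choice, but seeing the steps spelled out makes the walkthrough less mysterious.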

The third kind of request, a “Protected Request,” is simple.  You’ll make requests like signed requests, but you use a user-specific key.  And to get that, you need to follow some really complicated authorization steps.  To illustrate, here’s a diagram of what’s going on:

[Figure: 9-Way Netflix Protected Request Handshake]

That’s a 9-way handshake including you, the user, and Netflix!  But it’s safe, and extremely explicit.  So security for the win, I guess.  What’s going on is:

  1. You kick things off with a request to Netflix
  2. Netflix responds with a few security parameters, including a special login URL for the user
  3. You send the user that login URL (maybe with a 302 redirect), adding a callback URL parameter (see step 7)
  4. The user visits the login screen
  5. Netflix tells the user the dangers of playing with strange apps
  6. The user confirms
  7. Netflix redirects (with a 302) the user back to your callback URL, plus a user-specific authorized token (this is the first time anything has been user identifying)
  8. You make a final request to Netflix including that authorized token.  This time notice you’re using a new key that combines your secret key with the new token from Netflix.
  9. Netflix responds with a final secret token

Those last two tokens (the ones you got in steps 7 and 9) are the real keys to accessing user-specific data. Again, I’ve got a couple of pieces of code to help.  The first handles steps 1-3; the second handles steps 7-9.  It’s up to your users to handle steps 4-6.
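For a feel of what the application side of the handshake involves, here is a rough sketch of the pure bookkeeping in steps 2, 3, 7, and 8, with the network calls left out. It is not the code referred to above. The function names, the example.invalid URLs, and the exact set of query parameters on the login URL are assumptions for illustration; check the Netflix walkthrough for the real endpoints and parameters.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def parse_token_response(body):
    """Steps 2 and 9: decode a form-encoded token response into a dict."""
    return {name: values[0] for name, values in parse_qs(body).items()}


def build_login_url(login_url, request_token, consumer_key, callback_url):
    """Step 3: the URL to send the user to, with our callback attached."""
    query = urlencode({
        "oauth_token": request_token,
        "oauth_consumer_key": consumer_key,   # assumed to be required here
        "oauth_callback": callback_url,
    })
    separator = "&" if "?" in login_url else "?"
    return f"{login_url}{separator}{query}"


def token_from_callback(requested_url):
    """Step 7: pull the authorized token out of the redirect back to us."""
    query = parse_qs(urlsplit(requested_url).query)
    return query["oauth_token"][0]


def signing_key(consumer_secret, token_secret=""):
    """Step 8: in OAuth 1.0 the HMAC key is 'consumer secret & token secret'."""
    return f"{quote(consumer_secret, safe='~')}&{quote(token_secret, safe='~')}"


# Hypothetical response body for step 2.
step2 = parse_token_response(
    "oauth_token=req123&oauth_token_secret=s3cret"
    "&login_url=https%3A%2F%2Fexample.invalid%2Flogin"
)
redirect_to = build_login_url(step2["login_url"], step2["oauth_token"],
                              "my-consumer-key", "http://localhost/callback")
print(redirect_to)
```

Steps 1 and 8 are ordinary signed requests, so the only new moving parts are the redirect out, the redirect back, and the switch to a signing key that includes a token secret.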

Once you get past the 9-way handshake and get a good OAuth lib to help out with signing requests, the rest is pretty easy.  Mostly.

Rate Limits and Title Refs

I know a little bit about rate limits.  And not surprisingly, Netflix has them.  When you first sign up, you’re limited to 4 queries per second with a daily limit of 5000 queries overall.  That’s enough to get started.  And many requests support a batched interface.  So you can get up to 500 predicted ratings in one request (way to go Netflix!)
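Staying under those limits on the client side can be sketched with a small throttle like the one below. The class name is made up, and treating the daily quota as a rolling 24-hour window that starts with the first request is an assumption; Netflix may well reset it differently.

```python
import time


class Throttle:
    """Spaces out calls to stay under a per-second and a per-day limit."""

    def __init__(self, per_second=4, per_day=5000,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_gap = 1.0 / per_second   # minimum seconds between calls
        self.per_day = per_day
        self.clock = clock                # injectable so it can be tested
        self.sleep = sleep
        self.last_call = None
        self.day_start = None
        self.used_today = 0

    def wait(self):
        """Block until the next call is allowed, or raise if the day is used up."""
        now = self.clock()
        if self.day_start is None or now - self.day_start >= 86400:
            self.day_start, self.used_today = now, 0
        if self.used_today >= self.per_day:
            raise RuntimeError("daily quota exhausted")
        if self.last_call is not None:
            remaining = self.min_gap - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
        self.used_today += 1


throttle = Throttle()   # 4 per second, 5000 per day, as quoted above
throttle.wait()         # call this right before each API request
```

Calling wait() before every request keeps a tight loop of lookups from tripping the per-second limit, and the exception makes it obvious when the daily quota is gone rather than silently collecting error responses.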

But because everything in the Netflix API depends on internally assigned, opaque URIs (e.g., http://api.netflix.com/catalog/titles/movies/60021896), I find myself making a lot of search queries.  For instance, if I know that Shutter Island opened last weekend, and want to get some Netflix data about it, I first have to make a search request on “Shutter Island” before I can get that predicted rating.  And the search API doesn’t support a batched interface.  This adds up to a lot of requests, and a lot of requests quickly.

Perhaps there’s an easy solution I’m missing (download the whole catalog? But what about ambiguities in movie titles?).  This does sound like a problem well suited to caching.  At least, in my application, I have a few movies for which I want to look up ratings for many people.  Caching is commonly recommended by API providers.  So even if I am missing something, caching isn’t a bad idea.  Inspired by WP-cache, I’ve started a small disk-based caching utility.  It’s not done yet, but it works in my prototype :)
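The caching idea can be sketched in a few lines. This is not the utility mentioned above, just a minimal illustration with made-up names: one JSON file per key, named by a hash of the key, with the file's modification time used for expiry.

```python
import hashlib
import json
import os
import tempfile
import time


class DiskCache:
    """Tiny disk cache: one JSON file per key, expired by file age."""

    def __init__(self, directory, ttl_seconds=86400):
        self.directory = directory
        self.ttl = ttl_seconds
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        name = hashlib.sha1(key.encode()).hexdigest() + ".json"
        return os.path.join(self.directory, name)

    def get(self, key):
        """Return the cached value, or None if it is missing, stale, or unreadable."""
        path = self._path(key)
        try:
            if time.time() - os.path.getmtime(path) > self.ttl:
                return None
            with open(path) as handle:
                return json.load(handle)
        except (OSError, ValueError):
            return None

    def set(self, key, value):
        """Write to a temp file first so readers never see a half-written entry."""
        path = self._path(key)
        temporary = path + ".tmp"
        with open(temporary, "w") as handle:
            json.dump(value, handle)
        os.replace(temporary, path)

    def fetch(self, key, compute):
        """Return the cached value, calling compute() only on a miss."""
        value = self.get(key)
        if value is None:
            value = compute()
            self.set(key, value)
        return value


cache = DiskCache(tempfile.mkdtemp(), ttl_seconds=3600)
# The lambda stands in for a real title search; the result is a placeholder.
title = cache.fetch("search:Shutter Island", lambda: {"title": "Shutter Island"})
```

With something like this in front of the search call, looking up the same title for many users costs one API request instead of one per user.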

Speaking of my prototype, a super, super early version can be seen here.  I’ll post again when I’ve got updates to the code or the application itself.


What and How to Measure Performance

Last week I wrote about performance testing Open Site Explorer.  But I didn’t write much about how and why to collect the relevant data.  In this post I’ll write about the tools I use to collect performance data, how I aggregate it, and a little bit about what those data tell us.  This advice applies equally well when running a performance test or during normal production operations of any web application.

I collect three kinds of data:

  • system performance characteristics
  • client-side, perceived performance
  • server-side errors and per-request details

To make this a little bit more concrete, consider a pretty standard web architecture:

[Figure: web architecture including measurement ideas]

I’ve highlighted where to include the three categories of measurement.

System Characteristics

System characteristics are the lion’s share of performance measurement.  I want to know how my app is performing and what the bottlenecks are.  I can’t do much better than actually measuring the raw components.  On each system in your architecture you’ll want to collect at least the following:

  • Load average
  • CPU, broken out by process including I/O wait time, user/system time, idle time
  • Memory usage, broken out by process, and used, cached, free
  • Disk activity, including requests and bytes read/written per second
  • Network bytes read/written per second

Make sure you understand what each of these does and does not measure.  For instance, load average may include network and disk wait, even if the CPU is idle.  But it might not.  Unused memory isn’t useful, but disk cache (often reported as unused) is useful.  So check how your OS and your tools calculate these things.

I do lots of analysis on this kind of data, but here are a few basic things to look at:

  • What’s your load average?  It’s (almost always) interpreted relative to the number of cores you have, so a load average of 4 on a 4-core box probably means the box is saturated.
  • What does your memory usage look like?  Is free + cached memory very close to zero?  Most apps, daemons, etc. will work much better with a sizeable disk cache.  You don’t want to completely exhaust system memory or you’ll start swapping to disk, and that’s very bad.
  • Examine at least a week’s worth of data to get a sense for daily and weekly cycles.  Don’t tune your apps to operate optimally for weekend load; otherwise Monday morning will slam you worse than it normally does :P
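The first two checks are easy to script. Below is a minimal sketch, assuming a Linux box where /proc/meminfo exists; the function names are made up, and "free + cached" is taken literally as the MemFree and Cached fields.

```python
import os


def load_per_core(load_average, cores):
    """Load average relative to core count; around 1.0 suggests saturation."""
    return load_average / cores


def free_plus_cached_fraction(meminfo_text):
    """Fraction of total memory that is either free or used as disk cache."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])   # memory sizes are in kB
    return (fields["MemFree"] + fields["Cached"]) / fields["MemTotal"]


if os.path.exists("/proc/meminfo") and hasattr(os, "getloadavg"):
    one_minute, _, _ = os.getloadavg()
    with open("/proc/meminfo") as handle:
        fraction = free_plus_cached_fraction(handle.read())
    print("load per core:", load_per_core(one_minute, os.cpu_count() or 1))
    print("free + cached fraction:", fraction)
```

A one-off reading like this is only a spot check; the daily and weekly cycles still need the collected history.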

And some more complicated things:

  • How much free (unused, non-cached) memory do you have?  How does this vary over time?  Tune your processes to use that free memory.  But keep enough (a small margin, perhaps 10% of total) in reserve for sudden spikes.
  • How does your total CPU usage compare to load average?  If you’ve routinely got a load average of 4 but your CPU usage is always under 50% (aggregated across all cores), then you’ve got some disk or network bottlenecks that aren’t letting you take advantage of all your cores.
  • Is your web server dumping nearly a MB/sec to disk during normal operations?  That could be some poorly tuned logging from Apache or one of your applications.  Turn that chattiness down to get more performance.
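The load-versus-CPU comparison boils down to a small rule of thumb, sketched here. The function name and labels are made up, and the 50% threshold comes straight from the example above rather than from any hard rule.

```python
def diagnose(load_average, cores, cpu_busy_percent, busy_threshold=50.0):
    """Rough classification of a box from load average and aggregate CPU use."""
    saturated = load_average >= cores
    if saturated and cpu_busy_percent < busy_threshold:
        # Plenty of runnable or waiting work, but the cores are mostly idle.
        return "waiting on I/O"
    if saturated:
        return "CPU bound"
    return "headroom"


# The example from the list above: load average of 4 on 4 cores, CPU under 50%.
print(diagnose(4.0, 4, 45.0))
```

It is crude, but running it over a week of collected samples quickly shows which boxes spend their time waiting on disk or network rather than computing.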

To collect system performance data, I like collectd, RRDTool, DStat, and IOStat.  These are all simple and low-level tools.  But more importantly, I understand and trust them.  My ops guy, David, has been getting us on Zabbix which is a more full featured monitoring platform.  So check that out if that’s what moves you.

Collectd is both a system performance measuring agent and a central server to aggregate data from many nodes.  It’s important that you offload aggregation and recording of the data to a central server since this can be pretty disk intensive.  For instance, my data aggregation server is usually at 50% CPU I/O wait time due to writing all the perf data it collects.  Below is a sample configuration file to give you an idea of what collectd does as a data collection agent on a node, and how it’s configured:

#gets apache stats, needs mod_status enabled
LoadPlugin apache
#gets cpu stats broken out by core and aggregate
LoadPlugin cpu
#gets some disk statistics
LoadPlugin df
LoadPlugin disk
#gets load average
LoadPlugin load
#gets memory stats
LoadPlugin memory
#gets network stats
LoadPlugin interface
#gets system stats broken out by processes you specify below
LoadPlugin processes
#sends data back to a central ops server
LoadPlugin network

#your metrics aggregation server
<Plugin network>
	Server "ops.example.com" "27781"
</Plugin>

#measure the cpu usage of different processes
<Plugin processes>
	Process "apache"
	Process "ruby"
	Process "lighttpd"
	Process "memcached"
	Process "collectd"
</Plugin>

At the central aggregation server, collectd dumps its data to an RRDTool database.  RRDTool is a pretty well known, widely supported performance measurement storage format.  I don’t do much directly with RRDTool.  Instead I use drraw, a very light-weight web client for RRDTool.  drraw lets me quickly throw together arbitrary dashboards on my perf data.

[Figure: drraw performance dashboard]

Between collectd and drraw I collect, aggregate, and visualize all the measurements I listed above. But I also frequently collect finer grained, ad hoc data from boxes using DStat and IOStat.

DStat is a very versatile tool to collect pretty much any system metrics and display them in a very Linux-hacker interface:

[Figure: dstat ad hoc performance measurement]

I’ve asked for CPU, disk, memory, load average, and the most expensive I/O process.  It looks to me like:

  • One of my cores is pegged.
  • There’s nothing of note on disk or network.
  • Not much memory is free, but nearly 700MB is cached, so that looks good.
  • Xorg and “exe” (which is Flash player running Pandora) are talking to each other an awful lot, probably over pipes or local sockets (since there’s no corresponding disk or network)

One common problem I’ve got is that I see a lot of CPU I/O wait time, but only a few KB or maybe a MB/sec being written to disk.  The question is, where’s all that I/O wait time coming from?  It might be random disk I/O, or it might be network I/O.  That’s where IOStat comes in:

[Figure: iostat ad hoc I/O performance measurement]

I asked for extended information (-x) at 3 second intervals.  The first block of output is aggregated since system start.  Each block after that is aggregated over the 3 second interval.  This tells me:

  • The apps running are pushing between 1 and 10 write requests per second (the w/s column) (pretty low).
  • Those requests have to wait between 0 and 0.25 milliseconds to complete (the await column).
  • The disk has request response time of between 0 and 0.25 milliseconds (the svctm column).  This will always be less than or equal to await.  Because it’s equal to await in this case, that tells me there’s essentially no contention for the disk at the moment.
  • Most importantly, the disk is essentially at zero utilization (the %util column).

That about wraps it up for measuring performance on the server itself.  I’ve walked through a few scenarios.  But it’s a pretty complicated landscape.  The best thing you can do is to set up measurement and wait for stuff, good or bad, to happen.  After the fact you can match up what you saw from a business standpoint (what your users or support staff are telling you) with your performance data.  If things were reported by customers as being slow, did any of your perf graphs show spikes?  If you got a massive spike of traffic, did you see the effects on your system?  In the future you can use that experience to take action (add nodes, fix bugs, build better architecture) before any negative business impact occurs.

Client-Side Performance

In addition to measuring server-side performance, you get bonus points for putting together one or more synthetic clients.  You’ll want to make sure your client can collect:

  • distribution of response times (or at least mean, median, and 90th percentile)
  • counts of successful (probably 200 OK) and failed (anything else) responses
  • throughput, measured as the total time to run a certain number of reports
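Those three measurements fit in a short script. This is a minimal sketch, not the custom client described here: the names are made up, the 90th percentile uses the simple nearest-rank method, and fetch is whatever callable makes one request and returns its HTTP status.

```python
import statistics
import time


def percentile(sorted_values, percent):
    """Nearest-rank percentile of an already sorted list (percent is an integer)."""
    rank = (percent * len(sorted_values) + 99) // 100   # ceiling division
    return sorted_values[max(rank, 1) - 1]


def summarize(samples, wall_seconds):
    """samples is a list of (http_status, seconds) pairs."""
    times = sorted(seconds for _, seconds in samples)
    ok = sum(1 for status, _ in samples if status == 200)
    return {
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "p90": percentile(times, 90),
        "ok": ok,
        "failed": len(samples) - ok,
        "per_second": len(samples) / wall_seconds if wall_seconds > 0 else float("inf"),
    }


def run_client(fetch, count):
    """Call fetch() count times, timing each call, and summarize the run."""
    samples = []
    started = time.perf_counter()
    for _ in range(count):
        before = time.perf_counter()
        status = fetch()
        samples.append((status, time.perf_counter() - before))
    return summarize(samples, time.perf_counter() - started)


# Stand-in fetch that always succeeds; swap in a real HTTP request.
print(run_client(lambda: 200, 10))
```

Because fetch is just a callable, the same loop works for a single URL, a scripted sequence of pages, or several copies run in parallel to generate load.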

The custom client I use collects all these things, plus more.  But there are plenty of tools and packages out there.  You can even set up a shell script that runs a simple curl or wget command:

$ /usr/bin/time curl --silent "http://www.nickgerner.com" 2>&1 | tail -n 2 | grep elapsed | sed 's/.* \([:.0-9]*\)elapsed .*/\1/'

0:00.79

This won’t tell you about render time or JavaScript time (unless you go with my suggestion of Keynote, though I’ve never used them), but it’s better than nothing.

Server-Side Errors

You’ll almost certainly uncover some errors under load.  You’ll want to make sure your application (and other server processes) have a reasonable amount of logging.  Debug logging could result in lots of unnecessary disk writes, so be sure to turn it off.  But it’s certainly okay to log errors for perf tests and in production.

It’s also a good idea to turn on Apache request logging, including timing, so you can see the responses the server gave out and the time it took to process them.  This will back up what you’re recording at the client.  I use the following log format (which should be compatible with lighttpd and Apache):

%h %V %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %{X-Forwarded-For}i %T

Throw some monitoring on this log.  I use monit.  But for performance analysis, a simple grep command does the trick:

$ cat /var/log/lighttpd/access.log | grep '" [5][0-9][0-9] '| wc -l
0

$ cat /var/log/lighttpd/access.log | grep '" 200 '| wc -l
434849
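When grep stops being enough, the same counts plus the timing field can be pulled out with a short script. This is a rough sketch with made-up names, assuming the log format above: the status code follows the quoted request line and the last field is %T, the time taken in whole seconds. The sample lines are invented for illustration.

```python
import re
from collections import Counter

# Same idea as the grep patterns: a closing quote, a space, three digits, a space.
STATUS = re.compile(r'" (\d{3}) ')


def summarize_log(lines):
    """Count responses by status class and find the slowest request in seconds."""
    by_class = Counter()
    slowest = 0
    for line in lines:
        match = STATUS.search(line)
        if not match:
            continue   # skip lines that do not look like access log entries
        by_class[match.group(1)[0] + "xx"] += 1
        fields = line.split()
        if fields[-1].isdigit():
            slowest = max(slowest, int(fields[-1]))
    return by_class, slowest


sample = [
    '10.0.0.1 www.example.com - [12/Feb/2010:10:00:00 -0800] "GET / HTTP/1.1" 200 5120 "-" "curl/7.19" - 0',
    '10.0.0.2 www.example.com - [12/Feb/2010:10:00:01 -0800] "GET /report HTTP/1.1" 503 312 "-" "curl/7.19" - 7',
]
counts, slowest = summarize_log(sample)
print(dict(counts), slowest)
```

To run it against a real log, pass an open file handle instead of the sample list, since a file iterates line by line.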

I hope that covers some basics, and the finer points around what to measure and how to measure performance in any web application.  The important point is to start collecting data.  The analysis of it comes in plenty of flavors and levels of complexity.  But the old 80-20 rule applies: just get started and you’ll quickly see benefits.
