CMG'09: Solaris/Linux Performance Measurement and Tuning (part 2)
Adrian Cockcroft (Netflix)
My notes:
- Netflix releases every 2 weeks, first to beta, and tracks everything
- Everything at Netflix (or in web-land in general) is instrumented, in libraries, so that instrumentation comes for free
- Beware of kernel tweaks: they were good for older kernels, but kernels are now a lot more auto-tuned
- On Solaris, microstate data is very useful
- With Poisson arrivals, steady state, N identical servers, approximation of response time: R = S / (1 - U^N), where S = service time and U = per-server utilization = throughput * S / N
- Issues with this simplistic model: traffic is bursty, service time varies, the N servers don't all process the same thing, and virtual hardware makes it a lot harder to figure out
- Measurement errors (especially around measuring time)
- So don't put too much weight on utilization
- Load average on Linux is broken: it includes tasks blocked on disk I/O, not just runnable ones
- I/O wait is fundamentally broken: the CPU never waits for I/O per se
- Cockcroft Headroom Plots: 99th percentile against response time
- On Linux, the best way to track I/O per process is with SystemTap
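
To make the response time approximation from the notes concrete, here is a minimal sketch in Python. It assumes the common multi-server form R = S / (1 - U^N), with per-server utilization U = throughput * S / N; the function name and the sample numbers are mine, not from the talk.

```python
def response_time(service_time, throughput, servers):
    """Approximate mean response time for N identical servers.

    Assumes Poisson arrivals and steady state, and uses the form
    R = S / (1 - U**N) with U = throughput * S / N (per-server
    utilization). This is an approximation, not an exact queueing result
    for every N.
    """
    if servers < 1:
        raise ValueError("need at least one server")
    utilization = throughput * service_time / servers
    if not 0 <= utilization < 1:
        # At or above 100% utilization there is no steady state.
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization ** servers)


if __name__ == "__main__":
    # Hypothetical numbers: 10 ms service time, 150 requests/s.
    for n in (2, 4, 8):
        r = response_time(0.010, 150.0, n)
        print(f"N={n}: R = {r * 1000:.2f} ms")
```

With these sample inputs the response time falls toward the bare service time as servers are added, which is the behavior the model is meant to capture. The caveats listed in the notes (bursty traffic, varying service time, measurement errors) still apply to any number it produces.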