Velocity: Sean Quinlan @google, Storage at scale

Strategy: buy lots of commodity hardware, because the problems tend to be too big for any single machine anyway. Highly reliable hardware is not that useful either, because it's expensive. [Showing the same pictures over and over again; someone from Google PR, please authorize the release of newer pictures.] [A GFS description follows; nothing new so far, read the papers on the topic.] [A BigTable description follows; same deal.] I wish this talk had some new information...

Velocity: Luiz Barroso @google, efficient energy ops

Hypothetical energy cost extrapolations: 5 years from now, hardware could be only 20-50% of the total energy costs. Efficiency is defined as computing speed divided by power. It can be broken down further into three factors: (computing speed / power provided to chip) x (power provided to chip / power provided to server) x (power provided to server / power provided to data center).
  • Data center efficiency, PUE around 1.83, worse if data center is underutilized
  • Server energy efficiency, 25% dissipated by power supply
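As a rough illustration of how the two figures above combine, here is a minimal Python sketch. It takes PUE as total facility power divided by the power delivered to the IT equipment, and it assumes the 25% power supply loss is the only loss inside the server (a simplification: voltage regulators, fans and so on are ignored). The function names are made up for this example.

```python
# Sketch: fraction of data center power that reaches the server electronics,
# using the PUE and power supply figures quoted in the notes.
# Assumption: the power supply is the only source of loss inside the server.


def facility_efficiency(pue: float) -> float:
    """Fraction of facility power that reaches the IT equipment (1 / PUE)."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return 1.0 / pue


def server_efficiency(psu_loss: float) -> float:
    """Fraction of server input power left after power supply losses."""
    if not 0.0 <= psu_loss < 1.0:
        raise ValueError("psu_loss must be in [0, 1)")
    return 1.0 - psu_loss


def delivered_fraction(pue: float, psu_loss: float) -> float:
    """Product of the data center and server factors of the efficiency chain."""
    return facility_efficiency(pue) * server_efficiency(psu_loss)


if __name__ == "__main__":
    fraction = delivered_fraction(pue=1.83, psu_loss=0.25)
    print(f"{fraction:.1%} of facility power reaches the electronics")
```

Under these assumptions, roughly 41% of the power entering the building gets past the power supply, before the chip does any computing at all.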
From the Uptime Institute: 10-year energy costs are $9/W for consumption and $10-22/W for data center build-out. Rough cost breakdown: 50% on hardware, 22% on energy, 28% on data center (assumptions: dual-socket x86, 4-year depreciation, 70% load at peak). How to be more efficient:
  1. consolidate workloads
  2. measure actual power usage rather than rely on nameplates
  3. investigate oversubscription
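To see why measuring actual power matters, the per-watt figures quoted above can be plugged into a small Python sketch. It assumes build-out is paid on provisioned watts and energy on consumed watts. The 500 W nameplate and 300 W measured draw are hypothetical numbers, not from the talk.

```python
# Sketch: 10-year cost per server from the per-watt figures in the notes.
# Assumptions: build-out cost scales with provisioned watts, energy cost
# with consumed watts; the wattages in the demo below are hypothetical.

ENERGY_COST_PER_WATT = 9.0              # $/W over 10 years
BUILDOUT_COST_PER_WATT = (10.0, 22.0)   # $/W, low and high estimates


def ten_year_cost(provisioned_watts: float, consumed_watts: float) -> tuple[float, float]:
    """Return the (low, high) 10-year cost in dollars."""
    energy = consumed_watts * ENERGY_COST_PER_WATT
    low = energy + provisioned_watts * BUILDOUT_COST_PER_WATT[0]
    high = energy + provisioned_watts * BUILDOUT_COST_PER_WATT[1]
    return low, high


if __name__ == "__main__":
    nameplate, measured = 500.0, 300.0  # hypothetical server
    print("provisioned on nameplate:", ten_year_cost(nameplate, measured))
    print("provisioned on measurement:", ten_year_cost(measured, measured))
```

With these made-up wattages, provisioning on the nameplate costs $7,700 to $13,700 per server over 10 years, against $5,700 to $9,300 when provisioning on the measured draw.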
Oversubscription potential rises as the number of machines grows, so oversubscribe at the data center level. Also mix workloads, and be ready to kill instances if you get close to the limit.
Source: Energy-proportional computing
Consider a data center as a single device (5,000 machines): its utilization distribution has 2 peaks, one at 5% utilization and another around 30%. For a typical server, a machine running at a load of 0.3 is at 60% power efficiency, while a fully loaded machine is at 100% power efficiency; sadly, data centers are very rarely at 100%, as seen before.
The idea behind energy-proportional computing is a generally proportional relation between work and power. Idleness in a server is scarce. It should be handled in the electronics, because in software it's much harder (think of the kernel getting interrupts all the time). If you break down power by component, you find that the CPU is much more proportional than the rest of the components, so even when powering down the CPU, the total savings are still only between 10% and 20% of power. Still, CPUs have 2 important power-usage features:
  1. wide dynamic power range (RAM, disks and network devices remain in a much narrower power range)
  2. active low-power modes, where the CPU can still do work
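To illustrate why dynamic power range matters for energy proportionality, here is a minimal Python sketch. It assumes a linear power model (draw rises linearly from idle to peak), which is a simplification of mine and not something stated in the talk, so the numbers it produces are the model's, not the speaker's. It compares a machine whose idle draw is half its peak (a 2x dynamic range) with one whose idle draw is 5% of peak (20x).

```python
# Sketch: relative energy efficiency under an assumed linear power model.
# idle_fraction is idle power divided by peak power, so a 2x dynamic range
# means idle_fraction = 0.5 and a 20x range means idle_fraction = 0.05.


def relative_power(utilization: float, idle_fraction: float) -> float:
    """Power draw as a fraction of peak, rising linearly with utilization."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    if not 0.0 <= idle_fraction <= 1.0:
        raise ValueError("idle_fraction must be in [0, 1]")
    return idle_fraction + (1.0 - idle_fraction) * utilization


def relative_efficiency(utilization: float, idle_fraction: float) -> float:
    """Work per watt, normalized so that a fully loaded machine scores 1.0."""
    power = relative_power(utilization, idle_fraction)
    if power == 0.0:
        return 0.0  # no work done and no power drawn
    return utilization / power


if __name__ == "__main__":
    for label, dynamic_range in (("2x", 2.0), ("20x", 20.0)):
        idle = 1.0 / dynamic_range
        for load in (0.05, 0.3, 1.0):
            eff = relative_efficiency(load, idle)
            print(f"{label} range, load {load:.2f}: efficiency {eff:.2f}")
```

In this model the wider the dynamic range, the closer the machine stays to full efficiency at the low loads where data centers actually spend their time.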
People, who average around 120W, have a 20x dynamic power range, compared to the 2x of a PC. In conclusion: write fast code (the biggest contribution to energy efficiency), consider reducing all energy-related costs (provisioning), and demand energy proportionality from equipment manufacturers. Plug: http://climatesaverscomputing.org

Cookie crumbles

On Monday's front page of the Financial Times one could read "Google resolve crumbles on 'cookies' pledge", an interesting piece on how earlier inquiries about the role of cookies in "behavioural targeting" had been gently pushed aside after the acquisition of DoubleClick had started, with the apparent benediction or at least indifference of regulatory bodies. As the paper puts it,
Some Google insiders say that as the company's understanding of "behavioural targeting" has grown, some of its earlier fears about cookies have turned out to seem simplistic, and it has become less clear that the practice raises big privacy concerns.
As much as I like Google's services and applications, I find it disconcerting, to say the least, that the assessment about privacy cannot be clearly and publicly stated (and I doubt, though it is possible, that the paper would not have cited its sources if it could). More importantly, this much-needed assessment could not be conducted by an independent body. Protection of trade secrets, I'm told. It is also for the sake of trade secrets that the "market" for online advertising is run without any real auditing of any kind. In other industries, even with "independent" auditors, quite a few irregularities manage to sneak through (see Enron, Countrywide, etc.), so I can only imagine what skeletons we will find in the closet of a company that won't let anyone look at how its main inventory is assessed, counted and verified. It is a true instance of self-regulation, back to the meaning of self. But hey, who can argue against a license to print a few billion dollars per quarter? Might is right, right?