Looking into system performance of an Oracle data warehouse

Introduction

This is the start of an ongoing investigation into the system performance of an Oracle 10.2 data warehouse while it is being loaded. The database server has 2 real storage volumes (called dw-clear and dw-encrypt) and 1 virtual one (dw-encrypt-u) used to decrypt data on the fly. Most of the data and the I/O are on the dw-clear volume. System performance data have been collected via sadc -d to capture per-device statistics. The data are then extracted using sadf -d filename -- -d -b -d. The summary is available here as a CSV. It's a large table of block I/O stats, CPU stats and per-device I/O stats, suitable for import into R. The system characteristics are as follows.
  • Sun x4150 with 64GB RAM, 2 quad-core Xeon X5450 CPUs, and one 4Gb/s QL2462 HBA with 2 ports.
  • 3 device-mapper devices, 2 using a round-robin multipath (v1, v2), 1 using an on-the-fly cipher to decode encrypted data (v3).
  • 3PAR S400 with 10k drives and 4Gb/s HBAs.
  • Out of the 64GB, 8GB are set aside as HugePages to serve as memory pages for the SGA.
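For readers who would rather not start in R, here is a minimal Python sketch of loading sadf -d style output. It assumes the text is semicolon-separated and that each section begins with a header line starting with `#`; the sample rows, the host name `dwhost` and the exact column names are made up for illustration and are not taken from the real data set.

```python
import csv


def to_number(value):
    """Return value as a float when possible, otherwise unchanged."""
    try:
        return float(value)
    except ValueError:
        return value


def parse_sadf(text, delimiter=";"):
    """Parse sadf -d style text into a list of dict rows.

    Assumption: a line starting with '#' names the columns for the
    records that follow it, until the next '#' line.
    """
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            header = line.lstrip("# ").split(delimiter)
            continue
        if header is None:
            continue  # records before any header cannot be labeled
        record = next(csv.reader([line], delimiter=delimiter))
        rows.append({k: to_number(v) for k, v in zip(header, record)})
    return rows


# Hypothetical sample, shaped like CPU statistics output.
SAMPLE = """# hostname;interval;timestamp;CPU;%user;%system;%iowait;%idle
dwhost;600;2009-09-01 00:10:01;-1;42.0;3.5;9.5;45.0
dwhost;600;2009-09-01 00:20:01;-1;2.0;0.5;0.5;97.0
"""

rows = parse_sadf(SAMPLE)
for row in rows:
    print(row["timestamp"], row["%user"], row["%iowait"])
```

Keeping each row as a dict keyed by column name means CPU, block I/O and per-device sections can share one parser even though their columns differ.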
The goal of this investigation is to understand where the bottleneck in the processing is and what can be done to remove it. Let's start with CPU utilization. [caption id="attachment_172" align="aligncenter" width="510" caption="Distribution of CPU time spent in userland when not idle"]
Media_httpscaleordief_vwdgl
[/caption] Not terribly loaded (I'm filtering out the long idle portions by keeping only samples with user > 5). How about I/O? [caption id="attachment_177" align="aligncenter" width="510" caption="% of CPU spent waiting on IO"]
Media_httpscaleordief_fwbab
[/caption] Interesting, iowait is not negligible. Is it correlated to anything in particular? First of all, let's see how iowait varies with device utilization of v1.
Media_httpscaleordief_ihhfi
v1 is slowly but surely bringing iowait higher, to the point that more than one processor ends up waiting on I/O. To be continued...
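The relationship between iowait and v1 utilization can be checked numerically as well as visually. The sketch below is only an illustration: the samples are invented, the tuple layout is hypothetical, and it assumes that iowait is reported as an average over all 8 cores, so that 100 / 8 = 12.5% corresponds to one processor's worth of waiting. It applies the same user > 5 filter used for the plots.

```python
from math import sqrt

N_CPUS = 8  # 2 sockets x 4 cores, from the system description


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical samples: (%user, %iowait, %util of v1). Not real data.
samples = [
    (1.0, 0.2, 3.0),    # idle period, dropped by the filter below
    (12.0, 3.0, 20.0),
    (25.0, 8.0, 40.0),
    (30.0, 11.0, 60.0),
    (22.0, 16.0, 80.0),
]

# Keep only the non-idle samples, as in the plots (user > 5).
busy = [s for s in samples if s[0] > 5]
iowait = [s[1] for s in busy]
v1_util = [s[2] for s in busy]

r = pearson(v1_util, iowait)

# Assumed reading: iowait above this level means more than one
# processor's worth of time is spent waiting on I/O.
one_cpu_pct = 100.0 / N_CPUS
over_one_cpu = [s for s in busy if s[1] > one_cpu_pct]

print(f"correlation(v1 util, iowait) = {r:.3f}")
print(f"samples above {one_cpu_pct}% iowait: {len(over_one_cpu)}")
```

A correlation close to 1 on the real data would support the reading of the scatter plot, though it would not by itself show that v1 is the cause.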

Blog battle on the storage appliance front

Backblaze has started an interesting conversation by detailing how they get to $117,000 per PB, down to the type and number of SATA cards used in their design. A great PR move for a company in the crowded personal backup space. Of course, publishing comparisons with Dell, Sun, NetApp and EMC at 8x, 10x, 30x the price is a sure way to stir people's emotions. The first to publish a lengthy response (that StorageMojo could find) is Joerg Moellenkamp in a blog post. It is laudable for pointing out design flaws, but the two products target fundamentally different markets. Sure, Sun's hardware is a great piece of engineering, squarely aimed at the enterprise market. Which, incidentally, is not buying in droves, and Sun's financials clearly reflect that. Backblaze took the Google route for storage and it's hard to see, given the competitive pressure, how they would be better off spending their margin on Sun hardware. The era of gold-plated hardware is slowly drawing to a close and I can't say I oppose that change.

Discounted Sun servers

Not that I enjoy ads, but suddenly Sun feels like Newegg. I suppose part of the deal is to get the tech blogosphere excited about the discounts on their servers, or to get rid of extra inventory before a revamp of the product line. If Sun were Apple we'd have swarms of excited bloggers looking for the faintest hint of product updates. Then again, if Sun were Apple its stock would be much hotter.

Sun 2200M2 + Apple XserveRaid

This should prove a worthy Oracle 10g combination. We plan to use it for development purposes. The Sun server is a nice little 1U (specs here), connected via an HBA to 10.5TB of raw storage from Apple. I'll post a quick review shortly. For the real thing we use a Sun x4600, a great machine to run Oracle on Linux. Don MacAskill of SmugMug beat me to it and published an interesting review of the 2200M2.