Scale or die http://www.scaleordie.com Tech posts posterous.com Sat, 28 May 2011 19:43:00 -0700 A sane answer to the eG8 in Paris http://www.scaleordie.com/a-sane-answer-to-the-eg8-in-paris http://www.scaleordie.com/a-sane-answer-to-the-eg8-in-paris

10 minutes of your life that won't go to waste: http://blip.tv/lawrence-lessig/keynote-e-g8-5205474

http://files.posterous.com/user_profile_pics/1174368/5628480905_c677dc1616_o.jpg http://posterous.com/users/1lWmpHaVQVX Alexis Lê-Quôc alq Alexis Lê-Quôc
Wed, 18 May 2011 19:18:00 -0700 Getting fast internet to an office in Manhattan: work from home http://www.scaleordie.com/getting-fast-internet-to-an-office-in-manhatt http://www.scaleordie.com/getting-fast-internet-to-an-office-in-manhatt

As Datadog is looking for new digs, I've had the pleasure of spending half a day on the phone with various internet providers to upgrade the T1 that is already in place. In this day and age, a T1 is still brandished in the richest city in the world as some serious connectivity. Imagine: 1.5 megabits per second shared between phone lines and data lines. Back when 33.6 kilobits per second was all the rage, it made sense to show up at work to get online. But now I get 30 Mb/s at home... So real estate developers and building management companies need to update their "I-sound-cool" tech lingo and understand that a T1 is not a strong selling point. Which brings me to the second point: getting internet service from providers. Choose from cable companies, phone companies, wireless providers and internet providers. I'll start with the cable companies.

Cable companies do not service every building, even though they have what turns out to be a decent offering ($300/month for 50 Mb/s down, 5 Mb/s up). I called Time Warner and RCN and neither will service the building. Never mind that as a consumer I'd be charged $80/month for the same bandwidth; I'm still kindly asked to look somewhere else.

Next stop: phone companies. Fiber-to-the-home (aka FiOS in Verizon jargon) would be more than adequate. But of course, there is no fiber in the building. The best I am offered is 7 Mb/s, assuming the office is not too far from the Verizon building. It's also only $90/month, phone line included, a line which at this point would only be used occasionally to send a fax.

Wireless providers are a bit more promising. There I have a choice between overpriced enterprise WiMax at $800/month for a paltry 8 Mb/s both ways, Verizon LTE at $10 per GB (and a decent 10-15 Mb/s) or cheaper Clear(wire) consumer access at $50/month for 4-5 Mb/s.

Last came the internet providers proper, who seem to be milking smaller businesses by offering 10 Mb/s at a whopping $1,300 per month. Considering that it's a 17-story building and that 1 Gb/s should cost between $5k and $10k per month, it should be possible to buy one connection and split it across all tenants, rather than having 15 companies each pay $1,000 to selfishly enjoy 10 Mb/s.
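To put the quotes side by side, here is a quick back-of-the-envelope sketch in Python. The prices and bandwidths are the ones quoted in this post; the 4.5 Mb/s figure for Clear is my midpoint of the quoted 4-5 Mb/s, and the even 15-way split of a shared 1 Gb/s line is a simplifying assumption. Verizon LTE is left out because it is billed per GB rather than per month.

```python
# Monthly price (USD) and downstream bandwidth (Mb/s) as quoted in the post.
offers = {
    "cable, business": (300, 50),
    "cable, consumer": (80, 50),
    "Verizon, with phone line": (90, 7),
    "enterprise WiMax": (800, 8),
    "Clear, consumer": (50, 4.5),   # assumed midpoint of the quoted 4-5 Mb/s
    "ISP, dedicated": (1300, 10),
}


def cost_per_mbps(price, mbps):
    """Dollars per month for each Mb/s of bandwidth."""
    return price / mbps


for name, (price, mbps) in offers.items():
    print(f"{name:26s} ${cost_per_mbps(price, mbps):7.2f} per Mb/s")

# Hypothetical shared option: one 1 Gb/s line split evenly across 15 tenants.
TENANTS = 15
for total in (5000, 10000):
    share_price = total / TENANTS
    share_mbps = 1000 / TENANTS
    print(f"shared 1 Gb/s at ${total}/month: ${share_price:.0f} per tenant "
          f"for {share_mbps:.0f} Mb/s, "
          f"${cost_per_mbps(share_price, share_mbps):.2f} per Mb/s")
```

Even at the high end of the 1 Gb/s estimate, the shared line comes out an order of magnitude cheaper per Mb/s than the dedicated 10 Mb/s offer.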

Bottom line: if you want cheap internet, rent an apartment!

Fri, 11 Mar 2011 17:07:00 -0800 Ignite talks are not easy http://www.scaleordie.com/ignite-talks-are-not-easy http://www.scaleordie.com/ignite-talks-are-not-easy

I recently attended devopsdays in Boston to get feedback on our work at datadoghq.com. The feedback was more than encouraging but I'll keep that for later. I had volunteered to give a short presentation on the outdated assumptions behind systems and application monitoring. The format is simple: 5 minutes, 20 slides auto-advancing every 15 seconds. I knew it was difficult but I failed to step back and think about how these constraints would affect my message.

The most obvious trap I fell into is that I felt compelled to fill all 20 slides with different content rather than repeating the same slide over and over again. I could have gotten away with 3 slides rehashing the same idea rather than 20 slides trying to convey 5 major ideas. So that was my biggest mistake: trying to be too condensed, packing in too much content without giving the audience a chance to digest.

In hindsight I would have picked 1 idea and expounded on it over the whole 5 minutes. After all, in normal conversation it will easily take 5 minutes to convey one idea, complete with arguments and examples.

Sensing that I could have done a better job, I went out looking for references on presenting and stumbled upon The Naked Presenter by Garr Reynolds. I highly recommend the first chapter. It reminds the reader that presentation is not about the tools or the format; it's about the audience and the message. Why is my audience here and how do I avoid wasting their time?

All that said, it's always good to get a chance to improve.

Wed, 09 Mar 2011 15:18:00 -0800 Now running on posterous http://www.scaleordie.com/now-running-on-posterous http://www.scaleordie.com/now-running-on-posterous

I'm consolidating my blogs to posterous. Thanks for the ride, wordpress.

Wed, 15 Sep 2010 20:43:20 -0700 EC2 Micro-instances value analysis http://www.scaleordie.com/ec2-micro-instances-value-analysis http://www.scaleordie.com/ec2-micro-instances-value-analysis

Evaluating Amazon’s EC2 Micro Instances at DocumentCloud: an interesting benchmark using image processing. For highly parallel jobs this means using a ton of micro instances to get results for cheaper. The ability to act on this kind of decision with a simple API call did not exist 5 years ago... Now what we need is a simple way to make this kind of decision. And that's what DataDog is about... More details soon.

Fri, 06 Aug 2010 23:49:26 -0700 Cassandra training with Jon Ellis from Riptano http://www.scaleordie.com/cassandra-training-with-jon-ellis-from-riptan http://www.scaleordie.com/cassandra-training-with-jon-ellis-from-riptan

Riptano, a newly formed venture, now offers training and commercial support for Cassandra, a key-value store of Facebook lineage. Cassandra's initial claim to fame is being the data store behind Facebook's inbox.

The training session started with a relatively high-level presentation of Cassandra's data model before jumping quickly into some real code from Twissandra, a simplified Twitter clone based on Django. From there we were introduced to super-columns and their limitations, i.e. their subcolumns are not indexed so one should not pack too much into a super-column. As the day progressed we got deeper into operations and internals, where the rubber usually meets the road, and Jon was obviously very well acquainted with the subject matter.

My suggestion would be to add more diagrams to the presentation materials to illustrate the numerous points made during the session. Overall, considering the relative paucity of documentation on Cassandra, Jon's in-depth session is a nice shortcut compared to scouring mailing lists and reading the source code to get a solid grasp of the topic. In the context of DataDog we use Cassandra to persist all inbound signals reliably and with little latency. But I'll save details for later...

Wed, 23 Jun 2010 03:14:29 -0700 Interesting EC2 DNS bug http://www.scaleordie.com/interesting-ec2-dns-bug http://www.scaleordie.com/interesting-ec2-dns-bug

EC2's internal DNS servers don't get updates when you stop and restart EBS-backed instances. I came across this bug as I was trying to get the Scala offline compiler (fsc) to work on a restarted instance. fsc uses java.net.InetAddress.getLocalHost(), which triggers a DNS call. After some time spent reading the code, a tcpdump session convinced me that the machine thought it was something else (at least at the DNS level). Call it split personality. To reproduce:
  1. start an EBS-backed instance
  2. note its name and its internal ip (uname -n, ip addr)
  3. stop and restart the instance
  4. its node name remains unchanged, its ip has changed, yet dig +short instance_dns_name returns the old IP, even hours after the restart
Annoying!
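
The same check can be scripted. Below is a minimal sketch in Python, not what I ran at the time: `check_dns` is a name I made up, and the host name and addresses in the demonstration are invented. The resolver is passed in as a parameter so the example runs offline against a stand-in that still serves the pre-restart address.

```python
import socket


def check_dns(hostname, actual_ip, resolve=socket.gethostbyname):
    """Compare what DNS returns for hostname with the address the host really has.

    Returns (is_consistent, resolved_ip).
    """
    resolved_ip = resolve(hostname)
    return resolved_ip == actual_ip, resolved_ip


# Offline demonstration: a stand-in resolver that still serves the
# pre-restart address (host name and addresses are made up).
stale_records = {"ip-10-0-0-1.ec2.internal": "10.0.0.1"}
ok, resolved = check_dns(
    "ip-10-0-0-1.ec2.internal",
    "10.0.0.2",                          # address reported by `ip addr`
    resolve=stale_records.__getitem__,
)
print(ok, resolved)  # False 10.0.0.1
```

On a real instance you would leave `resolve` at its default and pass the output of `uname -n` and the address shown by `ip addr`.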

Thu, 10 Dec 2009 16:42:46 -0800 CMG'09: Solaris/Linux Performance Measurement and Tuning (part 2) http://www.scaleordie.com/cmg09-solarislinux-performance-measurement-an http://www.scaleordie.com/cmg09-solarislinux-performance-measurement-an

Adrian Cockcroft (Netflix)

My notes:
  • Netflix releases every 2 weeks, first in beta and tracks everything
  • Everything at netflix (or in web-land in general) instrumented, in libraries so that instrumentation comes for free
  • Beware of kernel tweaks, good for older kernels, now a lot more auto-tuned
  • On Solaris, microstate data very useful
  • With Poisson arrivals, steady state and N identical servers, response time is approximated by R = S / (1 - utilization^N), where S = service time and utilization = throughput * S (per server)
  • Issues with this simplistic model: bursty traffic, service time varies, the N servers don't process the same thing, and virtual hardware makes it a lot harder to figure out
  • Measurement errors (especially around measuring time)
  • So don't bother about utilization
  • Load average on linux is broken, it includes disk activity
  • I/O wait is fundamentally broken, the cpu never waits for I/O per se
  • Cockcroft Headroom Plots: 99th-%ile against response time
  • On linux, best way to track i/o per process is with SystemTap
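
To get a feel for the response-time approximation in these notes, here is a small Python sketch. It assumes the usual form of the multi-server approximation, R = S / (1 - U^N), with U the per-server utilization; the function name and the sample values are mine, not from the talk.

```python
def response_time(service_time, utilization, servers):
    """Approximate response time R = S / (1 - U**N) for N identical servers."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization ** servers)


S = 0.010  # 10 ms service time, picked arbitrarily for illustration
for n in (1, 2, 8):
    row = ", ".join(
        f"U={u:.2f}: {response_time(S, u, n) * 1000:7.1f} ms"
        for u in (0.50, 0.90, 0.99)
    )
    print(f"N={n}: {row}")
```

With one server the curve blows up early; with more servers it stays flat longer and then climbs abruptly as utilization approaches 100%, which is one more reason utilization alone is a poor indicator of headroom.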

Wed, 09 Dec 2009 23:56:48 -0800 CMG'09: "How 'normal' is your IT Data?" http://www.scaleordie.com/cmg09-how-normal-is-your-it-data http://www.scaleordie.com/cmg09-how-normal-is-your-it-data

Dr. Mazda Marvasti

My notes on this very informative talk (the best I've seen today). The goal of the study was to evaluate the normal-distribution assumption built into the newer IT monitoring tools, which create dynamic thresholds for the various metrics they collect.
  • Analyzed 4 workloads: ad-serving on LAMP, bond processing, stock trades and some online application
  • Test for normal distribution: Kolmogorov-Smirnov as it makes no assumption on the data distributions
  • Used average shifted histograms for the test
  • Results: none of the basic metrics (OS, applications, business-oriented) are normally distributed, neither are their averages, when looking at blocks of 1 hour
  • For instance Monday 9am does not look at all like Tuesday 9am
  • Also, Mondays at 9am don't converge on average, meaning that their averages are not independent and/or not identically distributed
  • Business cycles matter very much in analysis, spectral analysis can help!
  • Correlations examined using Spearman's ranked correlation coefficient (though results not presented).
  • Conclusion: go for non-parametric analysis, known distributions don't really apply
  • If you enable dynamic thresholds based on normal distribution assumptions, expect a 10x increase in the number of alerts -- though it's possible to mitigate this with topology rules (e.g. "don't alert me if event 1 and event 2 coöccur")
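
For the curious, the Kolmogorov-Smirnov statistic mentioned in these notes is easy to compute by hand. The Python sketch below is my own illustration, not the study's code: it measures the largest gap between the empirical CDF of a sample and a normal CDF fitted to that sample, on synthetic data. The 1.36 / sqrt(n) cutoff is the textbook large-sample 5% value for a fully specified distribution; since the mean and standard deviation are estimated from the data here, treat it as a rough guide only.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def ks_statistic_vs_normal(samples):
    """Largest gap between the empirical CDF and a normal CDF fitted to the samples."""
    xs = sorted(samples)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, cdf - i / n, (i + 1) / n - cdf)
    return d


rng = random.Random(42)
n = 1000
bell = [rng.gauss(100, 15) for _ in range(n)]            # synthetic, bell-shaped
skewed = [rng.expovariate(1 / 100) for _ in range(n)]    # synthetic, long right tail

critical = 1.36 / math.sqrt(n)
for name, data in (("bell-shaped", bell), ("long-tailed", skewed)):
    d = ks_statistic_vs_normal(data)
    print(f"{name}: D = {d:.3f} (rough 5% cutoff {critical:.3f})")
```

A long-tailed metric such as a latency or queue length lands far above the cutoff, which is the situation the talk reports for real IT data.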
My take on this: IT data analysis is challenging. One question is: how much is it worth, i.e. at what scale do you get your money back (and more) by getting this type of fairly sophisticated analysis and what kind of return can you expect of it? While the answer depends on the nature of the business conducted, I'm curious to see whether it's bigger shops with expensive applications, cloud-scale companies or whether this is going to percolate toward the smaller web shops, integral to an Infrastructure-as-a-Service offering? Stay tuned...

Wed, 09 Dec 2009 21:32:38 -0800 CMG'09: "How do you analyze 100,000s of servers?" http://www.scaleordie.com/cmg09-how-do-you-analyze-100000s-of-servers http://www.scaleordie.com/cmg09-how-do-you-analyze-100000s-of-servers

Charles Loboz (Microsoft)
  • No homogeneous software/hardware/applications
  • Access is often limited (e.g. hotmail servers are off-limits)
  • In the old days, 1 server analyzed per day
  • Stopped using averages and stddev (because data are not normal)
  • Built 10-bin histograms for utilization
  • Even that is limited, because long tails are the ones triggering issues (e.g. bad queries triggering load, then all queries will pile up)
  • No one cares about utilization (except data geeks), only performance matters
  • Estimate the impact of utilization on performance with a "Performance Impact Factor" (PIF): a weighted average of the histogram bins, computed separately for CPU, network and I/O, with heavy-utilization bins weighted more to make long tails more obvious
Recipe
  • Compute histograms
  • Compute PIFs for each server
  • Cross-tabulate PIFs to server names to tag servers as underused, overloaded, etc.
  • Store everything in a database
Pitfalls
  • PIF averages don't mean anything
  • PIF is good at flagging a "dead-cold" server, but it cannot tell you that you have an issue, only that you have to investigate
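
The talk did not give the exact weights, so the following Python sketch of the recipe is only my guess at the idea: a 10-bin utilization histogram and a weighted average whose weights grow quadratically with utilization (the quadratic choice and all names are my assumptions). The two sample servers are made up and share the same 19% average utilization.

```python
def histogram(utilizations, bins=10):
    """Fraction of samples in each of `bins` equal-width buckets over [0, 100]%."""
    counts = [0] * bins
    for u in utilizations:
        index = min(int(u / 100 * bins), bins - 1)  # 100% lands in the last bin
        counts[index] += 1
    total = len(utilizations)
    return [c / total for c in counts]


def pif(hist, weights):
    """Weighted average of histogram bins; larger means more time spent busy."""
    return sum(h * w for h, w in zip(hist, weights))


# Assumed weighting: square of each bin's midpoint, so busy bins dominate.
WEIGHTS = [((i + 0.5) / 10) ** 2 for i in range(10)]

steady = [19] * 100              # always at 19% utilization
spiky = [10] * 90 + [100] * 10   # mostly idle, pegged 10% of the time

for name, samples in (("steady", steady), ("spiky", spiky)):
    mean = sum(samples) / len(samples)
    print(f"{name}: mean = {mean:.0f}%, PIF = {pif(histogram(samples), WEIGHTS):.4f}")
```

Both servers look identical on averages, while the weighted histogram makes the one with the long tail stand out.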

Wed, 09 Dec 2009 17:35:26 -0800 At CMG'09 today http://www.scaleordie.com/at-cmg09-today http://www.scaleordie.com/at-cmg09-today

On paper it looked like a scientific approach to performance management, born in the mainframe days when computers were expensive. Now it's cloud-scale that matters (and an ailing world economy if you're not a bank), so managing capacity rigorously (and in an automated fashion) makes sense. So far no breakthrough though; it's a bit too applied for my taste. Let's see what the next sessions hold in store.

Fri, 09 Oct 2009 16:12:33 -0700 Looking into system performance of an Oracle data warehouse http://www.scaleordie.com/looking-into-system-performance-of-an-oracle http://www.scaleordie.com/looking-into-system-performance-of-an-oracle

Introduction

This is the start of an ongoing investigation into the system performance of an Oracle 10.2 data warehouse while it is being loaded. The database server has 2 real storage volumes (called dw-clear and dw-encrypt) and 1 virtual one (dw-encrypt-u) used to decrypt data on the fly. Most of the data and the I/O are on the dw-clear volume. System performance data have been collected via sadc -d to capture per-device statistics. The data are then extracted using sadf -d filename -- -d -b -d. The summary is available here as a csv. It's a large table of block I/O stats, CPU stats and per-device I/O stats, suitable to be imported into R. The system characteristics are as follows.
  • Sun x4150 64GB RAM, 2x4 x5450, 1 4Gb/s QL2462 HBA with 2 ports.
  • 3 device-mapper devices, 2 using a round-robin multipath (v1, v2), 1 using an on-the-fly cipher to decode encrypted data (v3).
  • 3PAR S400 with 10k drives and 4Gb/s HBAs.
  • Out of the 64GB, 8GB are set aside as HugePages to serve as memory pages for the SGA.
The goal of this investigation is to understand what the bottleneck is in the processing and what can be done to remove it. Let's start with CPU utilization.

[Figure: Distribution of CPU time spent in userland when not idle]

Not terribly loaded (I'm filtering out the long idle portions with user > 5). How about I/O?

[Figure: % of CPU spent waiting on IO]

Interesting, iowait is not negligible. Is it correlated to anything in particular? First of all, let's see how iowait varies with device utilization of v1.

[Figure: iowait against device utilization of v1]

v1 is slowly but surely bringing iowait higher, to the point that more than one processor ends up waiting on I/O. To be continued...
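
For readers who want to try the same kind of check without R, here is a minimal Python sketch of the filtering and correlation step. The column names (`user`, `iowait`, `v1_util`) and the five rows are invented for illustration; they are not the real column names or measurements from this server.

```python
import csv
import io
import math

# Made-up rows standing in for the extracted csv (not real measurements).
SAMPLE = """user,iowait,v1_util
2.0,0.1,1.0
12.0,3.0,20.0
15.0,6.5,45.0
18.0,9.0,60.0
22.0,14.0,85.0
"""


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Same filter as in the post: drop the long idle portions (user <= 5).
rows = [r for r in csv.DictReader(io.StringIO(SAMPLE)) if float(r["user"]) > 5]
iowait = [float(r["iowait"]) for r in rows]
v1_util = [float(r["v1_util"]) for r in rows]

r = pearson(v1_util, iowait)
print(f"{len(rows)} busy samples, correlation(v1 utilization, iowait) = {r:.3f}")
```

A coefficient close to 1 on the real data would back up what the plot suggests; it would still say nothing about causation.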

Thu, 03 Sep 2009 19:19:10 -0700 Blog battle on the storage appliance front http://www.scaleordie.com/blog-battle-on-the-storage-appliance-front http://www.scaleordie.com/blog-battle-on-the-storage-appliance-front

Backblaze has started an interesting conversation by detailing how they get to $117,000 per PB, down to the type and number of SATA cards used in their design. A great PR move for a company in the crowded personal backup space. Of course, publishing comparisons with Dell, Sun, NetApp and EMC at 8x, 10x and 30x the price is a sure way to stir people's emotions. The first to publish a lengthy response (that StorageMojo could find) is Joerg Moellenkamp, in a blog post. It is laudable for pointing out design flaws, but these are fundamentally 2 different markets. Sure, Sun's hardware is a great piece of engineering, squarely aimed at the enterprise market. Which, incidentally, is not buying in droves, and Sun's financials clearly reflect that. Backblaze took the Google route for storage and it's hard to see, given the competitive pressure, how they would be better off spending their margin on Sun hardware. The era of gold-plated hardware is slowly drawing to a close and I can't say I oppose that change.

Tue, 04 Aug 2009 13:33:21 -0700 Netflix describes its culture http://www.scaleordie.com/netflix-describes-its-culture http://www.scaleordie.com/netflix-describes-its-culture

[Embedded SlideShare presentation: id=1798664, doc=culture9-090801103430-phpapp02]

Mon, 29 Jun 2009 04:30:15 -0700 Catching up on Velocity 09 http://www.scaleordie.com/catching-up-on-velocity-09 http://www.scaleordie.com/catching-up-on-velocity-09

This year I could not attend Velocity so I decided to catch up via http://velocityconference.blip.tv. Here are a few notes on the sessions I have been able to see so far.

John Allspaw (Ops) & Paul Hammond (Dev): 10+ Deploys per day: Dev/Ops coöperation at Flickr

This is a topic dear to my heart: changing the culture shared (or not) by dev and ops.
  • Contrary to popular wisdom, ops' real mission is not to keep the service stable per se, but to enable the business.
  • Business requires change
  • Build the tools and the culture that allow repeated change with minimal uncertainty.
  • Automate your infrastructure
  • Use one shared source control, between devs and ops so that everyone on the team knows where to look
  • Reduce all manual steps down to one, that of deciding to build and deploy
  • Small frequent changes better than fewer large changes
  • Use "feature flags", i.e., use code to enable features, rather than branches
  • Ship TRUNK so that everyone knows what gets released
  • Feature flags allow for private betas and reduce uncertainty
  • Dark launches: enable the feature to exercise the data path but don't present the results to the end-user
  • Metrics, metrics, metrics
  • Add context to the metrics, such as the last time something was deployed
  • We use IRC and IM bots to bring system updates into the conversation between dev and ops in real time, then push the logs into a search engine
  • Develop respect and trust between devs and ops
  • Have a healthy attitude toward failure (don't blame, fix the problem first)
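
Feature flags and dark launches are easier to grasp with a few lines of code. This is a minimal Python sketch of the idea, not Flickr's implementation; the flag layout, the function names and the users are all hypothetical.

```python
# Flags live in code/config on trunk, not in a long-lived branch.
FLAGS = {
    "new_search": {
        "enabled": False,          # not live for everyone yet
        "beta_users": {"alice"},   # private beta
        "dark_launch": True,       # exercise the new path, hide the result
    },
}


def legacy_search(query):
    return f"legacy:{query}"


def new_search(query):
    return f"new:{query}"


def search(query, user, dark_log):
    flag = FLAGS["new_search"]
    if flag["enabled"] or user in flag["beta_users"]:
        return new_search(query)
    if flag["dark_launch"]:
        # Dark launch: run the new data path and record what happened,
        # but never let it affect what the end-user sees.
        try:
            dark_log.append(new_search(query))
        except Exception as exc:
            dark_log.append(f"error: {exc!r}")
    return legacy_search(query)


dark_log = []
print(search("cats", "bob", dark_log))    # legacy:cats
print(search("cats", "alice", dark_log))  # new:cats
print(dark_log)                           # ['new:cats']
```

Releasing the feature to everyone is then a one-line change to the flag rather than a merge, which is what makes shipping trunk many times a day practical.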

Mon, 29 Jun 2009 00:48:55 -0700 Started a friendfeed webops public group http://www.scaleordie.com/started-a-friendfeed-webops-public-group http://www.scaleordie.com/started-a-friendfeed-webops-public-group

Feel free to join: http://friendfeed.com/web-ops

Thu, 25 Jun 2009 18:01:58 -0700 #structure09 Hosting on commodity hardware http://www.scaleordie.com/structure09-hosting-on-commodity-hardware http://www.scaleordie.com/structure09-hosting-on-commodity-hardware

I just got out of the panel on commodity hardware and did not get a chance to participate, so here's my take on it. The panel started with an opening question: Google, Amazon and the like run at a huge scale on commodity hardware, so why do enterprise vendors still push customized, and expensive, hardware? To me the answer is pretty obvious: enterprise hardware is for the most part sold to people who don't know how to architect and design software on a commoditized stack. And by stack I mean everything from the server all the way up to the application code. Let's be honest, look at most "enterprise" hardware/software literature: it's just noise and a waste of both the writer's and the reader's time. If you constrain yourself to buy servers that cost no more than $5k, buying high-end database software makes little sense. Rather, you recognize that low-end compute is how you get economies of scale and you apply the same reasoning to your networking gear, storage systems, database software, load balancing software, etc. Google, judging from its earlier papers, seems to have been the first to understand that, rejecting the usual marketing garbage from large vendors. And for that we should be grateful.

Wed, 17 Jun 2009 03:34:05 -0700 I love Amazon Web Services open pricing http://www.scaleordie.com/i-love-amazon-web-services-open-pricing http://www.scaleordie.com/i-love-amazon-web-services-open-pricing

I've just spent 2 hours crafting a spreadsheet to compare how much it would cost to set up a decent platform to deliver the kind of data services I manage vs. the same on EC2. Easy access to pricing is key, and pricing is often hard to get from vendors without being subjected to the "custom solution" time-waste. Technology vendors: your customers, more often than not, know what they want. When I ask for a price list, don't try to second-guess whether I've done my homework, just give me the price list. If I have questions regarding the "solution" I'll be more than happy to ask.

Tue, 16 Jun 2009 00:28:38 -0700 How about sub-second queries in Hadoop? http://www.scaleordie.com/how-about-sub-second-queries-in-hadoop http://www.scaleordie.com/how-about-sub-second-queries-in-hadoop

Two observations from talking and listening to people during the Hadoop summit. Firstly, Hadoop is used quite often to process clickstream data -- in all fairness I missed the talk about Hadoop used for genomics. Secondly, and as a corollary of the first, sub-second queries in Hive or Pig are not quite there yet. Since a Hive query translates into map and reduce tasks, their scheduling, in addition to the sheer volume of data, determines response time. Undoubtedly, pre-computing aggregates is a natural way to go, much like what is done for data warehouses. Where these aggregates should be stored for consumption is a problem that could lead to hybrid solutions: process data with Hadoop and export them to Postgres or Infobright to enjoy a more mature (but less scalable) run-time environment. Get multi-terabyte daily processing and sub-second analytics, all of it open source. If you've done something like that, I'd be interested to know before I embark on a route where others have failed before.
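
To make the hybrid idea concrete, here is a toy Python sketch. It is only an illustration under loud assumptions: a `Counter` stands in for the Hadoop job that pre-computes hourly aggregates, an in-memory SQLite database stands in for Postgres or Infobright, and the click data, table and function names are all made up.

```python
import sqlite3
from collections import Counter

# Made-up clickstream: (timestamp, page).
clicks = [
    ("2009-06-15T10:02:11", "/home"),
    ("2009-06-15T10:17:45", "/home"),
    ("2009-06-15T10:40:03", "/pricing"),
    ("2009-06-15T11:05:30", "/home"),
]

# Batch step (the part Hadoop would do at scale): count clicks per hour and page.
hourly = Counter((ts[:13], page) for ts, page in clicks)

# Export step: load the small aggregate table into a relational store.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE hourly_clicks ("
    "hour TEXT, page TEXT, clicks INTEGER, PRIMARY KEY (hour, page))"
)
db.executemany(
    "INSERT INTO hourly_clicks VALUES (?, ?, ?)",
    [(hour, page, count) for (hour, page), count in hourly.items()],
)


def clicks_for(page, hour):
    """Serving step: a primary-key lookup instead of a scan over raw logs."""
    row = db.execute(
        "SELECT clicks FROM hourly_clicks WHERE hour = ? AND page = ?",
        (hour, page),
    ).fetchone()
    return row[0] if row else 0


print(clicks_for("/home", "2009-06-15T10"))  # 2
```

The query cost now depends on the size of the aggregate table rather than on the raw clickstream, which is the whole point of exporting.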

Sat, 13 Jun 2009 04:34:12 -0700 Notes from the 2009 Hadoop Summit West http://www.scaleordie.com/notes-from-the-2009-hadoop-summit-west http://www.scaleordie.com/notes-from-the-2009-hadoop-summit-west

I just got back from Santa Clara where Yahoo and Cloudera were hosting the 2009 Hadoop Summit West on Wednesday, followed by a training session on Thursday. My interest was that of a prospective user: to gauge how real and mature Hadoop is. The turn-out was more than decent, in the hundreds: a number from Yahoo, which runs the largest clusters so far, a few folks from Amazon and Facebook, some local universities and a fair number of small companies that have deployed their own clusters (or are running on EC2).

The good news first: Hadoop is real and it's getting real use. It's clearly a promising platform with active use and development. The scaling model is fairly simple: buy more machines. The current sweet spot is dual quad-core hosts with 4x1TB drives and 16GB or so of ECC RAM. Decoupling storage from a central system (à la SAN) is the way to go. Some folks have tried, with some success, to hook up Thumpers to Niagara chips that run a lot of threads in parallel, but the TCO question is unclear. Hence we can start with a handful of cheap machines and go from there.

A few things to watch for: the secondary name node, for instance, is not there for backup but to persist the DFS layout structures that exist in RAM on the primary name node. It could have been implemented in a more robust fashion using a SQL database rather than requiring a re-implementation of redo logs and data files. That's overall the negative point: applications built on the platform (such as Hive, HBase and Pig) are still pretty much works in progress, somewhat duplicating functionality. There is an air of Not Invented Here that still pervades, but it's a sign that the whole thing is still young. A vocal user base that meets regularly should help the project focus on the pieces that truly do not exist yet.
