CMG'09: "How 'normal' is your IT Data?"
Dr. Mazda Marvasti
My notes on this very informative talk (the best I've seen today). The goal of the study was to test the normal-distribution assumption built into the newer IT monitoring tools, which create dynamic thresholds for the various metrics they collect.
- Analyzed 4 workloads: ad-serving on LAMP, bond processing, stock trades and some online application
- Test for normal distribution: Kolmogorov-Smirnov, as it is non-parametric and makes no assumption about the distribution of the data
- Used average shifted histograms for the test
- Results: none of the basic metrics (OS, applications, business-oriented) are normally distributed, neither are their averages, when looking at blocks of 1 hour
- For instance Monday 9am does not look at all like Tuesday 9am
- Also, the Monday 9am blocks don't converge on average, meaning that their averages are not independent and/or not identically distributed
- Business cycles matter very much in the analysis; spectral analysis can help!
- Correlations examined using Spearman's rank correlation coefficient (though results not presented).
- Conclusion: go for non-parametric analysis, known distributions don't really apply
- If you enable dynamic thresholds based on normal distribution assumptions, expect a 10x increase in the number of alerts, though it's possible to mitigate this with topology rules (e.g. "don't alert me if event 1 and event 2 coöccur")
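
To make the normality test and the alert inflation concrete, here is a minimal sketch of my own, not the speaker's code. It assumes synthetic data (a bell-shaped metric and a right-skewed, lognormal one), fits a normal distribution to each sample, computes the Kolmogorov-Smirnov statistic by hand, and counts how many points land above a "mean + 3 standard deviations" threshold. The function names and the data are hypothetical, and it does not use the average shifted histograms mentioned in the talk.

```python
import math
import random
import statistics


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic_vs_normal(samples):
    """Largest gap between the empirical CDF and a normal CDF fitted to the samples."""
    xs = sorted(samples)
    n = len(xs)
    mu = statistics.fmean(xs)
    sigma = statistics.stdev(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def rejects_normality(samples):
    """Compare D with 1.36 / sqrt(n), the asymptotic 5% critical value.

    That value is for a fully specified distribution; with a fitted mean and
    standard deviation it is conservative, so a rejection here is still safe.
    """
    return ks_statistic_vs_normal(samples) > 1.36 / math.sqrt(len(samples))


def tail_fraction(samples, k=3.0):
    """Fraction of samples above mean + k * stdev (a normal-based threshold)."""
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    return sum(1 for x in samples if x > mu + k * sigma) / len(samples)


rng = random.Random(42)
# Hypothetical data: a bell-shaped metric and a right-skewed one (think latencies).
bell = [rng.gauss(50.0, 5.0) for _ in range(5000)]
skewed = [rng.lognormvariate(3.0, 1.0) for _ in range(5000)]

for name, data in (("bell", bell), ("skewed", skewed)):
    print(name,
          "D =", round(ks_statistic_vs_normal(data), 3),
          "reject normal:", rejects_normality(data),
          "above mean+3sd:", round(tail_fraction(data), 4))
```

Under a true normal distribution only about 0.13% of points should sit above mean + 3 standard deviations, so any metric whose tail fraction is well above that will alert more often than the threshold was designed for. This only illustrates the mechanism; it is not an attempt to reproduce the 10x figure from the talk.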
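
On the point about business cycles and spectral analysis, a rough sketch of what that can look like, again my own illustration and not something shown in the talk. It assumes a made-up hourly load series with a daily and a weekly cycle plus noise, and uses a plain discrete Fourier transform to find the strongest periods. The series, amplitudes and function names are all hypothetical.

```python
import cmath
import math
import random


def periodogram(series):
    """Power at each non-zero frequency k/n (cycles per sample), via a plain DFT."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # drop the constant (zero-frequency) term
    powers = {}
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        powers[k] = abs(coeff) ** 2 / n
    return powers


def dominant_periods(series, top=2):
    """Return the `top` strongest periods, in samples, strongest first."""
    powers = periodogram(series)
    strongest = sorted(powers, key=powers.get, reverse=True)[:top]
    return [len(series) / k for k in strongest]


rng = random.Random(7)
hours = range(24 * 7 * 4)  # four weeks of hourly samples
load = [100.0
        + 30.0 * math.sin(2 * math.pi * t / 24)    # daily cycle
        + 15.0 * math.sin(2 * math.pi * t / 168)   # weekly cycle
        + rng.gauss(0.0, 5.0)                      # measurement noise
        for t in hours]

print(dominant_periods(load))
```

With cycles this strong relative to the noise, the two periods that come back are the 24-hour and 168-hour ones that were put in. On real data the peaks are rarely this clean, but the same idea tells you which blocks (hour of day, day of week) are worth comparing with each other before testing their distributions.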