7 September 2013
by Powertechnic

Failure Rate Data for Mission Critical Facilities

A data centre outage can be a stressful experience for all concerned, particularly if you’re the mission commander. It isn’t surprising that patrons of critical facilities are very coy about the circumstances surrounding an outage. It is plausible that this reluctance has led to an absence of truly representative failure rate data throughout the industry. I exchanged some correspondence with Ken Brill, founder of the Uptime Institute, when he attended a conference in Australia a few years ago, and he told me that a lack of failure data within the industry was a significant problem.

What is good system-level or module-level failure rate data, and how do you get it? Data taken from the field is always the most valued. Such field data should be accurately recorded by an independent professional or body.

So how do you get data of this standard that relates to your specific application? The simple answer is that it’s generally not available. When people ask for accredited data (e.g. the IEEE Gold Book), they mean data published by a professional body that is widely considered credible. This, however, says nothing about how the data applies to your particular industry. On the plus side, if you use that data in an A-B comparison, you are using the same basis as someone else doing the equivalent comparison. When software packages quote accredited sources of data such as Bellcore/Telcordia (SR-332), FIDES, MIL-HDBK-217F and NSWC-07, these are aimed at component-level problems where you would like to build an estimate of product reliability at the design stage. They are of little use in system-level studies.
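The A-B point can be illustrated with a minimal sketch. It assumes a simple steady-state availability model (MTBF divided by MTBF plus MTTR) and independent failures with no common cause, which is a simplification. The MTBF and MTTR figures and the two "sources" are hypothetical and are not taken from any of the handbooks named above.

```python
def unavailability_single(mtbf_hours, mttr_hours):
    """Steady-state unavailability of one repairable module."""
    availability = mtbf_hours / (mtbf_hours + mttr_hours)
    return 1.0 - availability


def unavailability_redundant_pair(mtbf_hours, mttr_hours):
    """Unavailability of two modules in parallel where either can carry the load.

    Assumes the two modules fail independently (no common-cause failures).
    """
    return unavailability_single(mtbf_hours, mttr_hours) ** 2


# Hypothetical: two data sources quoting different MTBF figures for the same module.
SOURCES = {"source_1": 100_000.0, "source_2": 250_000.0}
MTTR_HOURS = 8.0  # hypothetical mean time to repair

for name, mtbf in SOURCES.items():
    design_a = unavailability_single(mtbf, MTTR_HOURS)
    design_b = unavailability_redundant_pair(mtbf, MTTR_HOURS)
    print(f"{name}: design A (single) {design_a:.3e}, "
          f"design B (redundant pair) {design_b:.3e}")
```

The absolute figures change with the data source, but both designs are assessed on the same basis, so the ranking of design B over design A is unaffected by which source is chosen.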

What are the chances that someone will have a representative set of data from a deployment of the same kind of UPS, battery, switchgear, chiller, cable, bus-way, etc. that you’re planning to use? Even if they did, there is also the question of how they operated and maintained their facility. We are not talking about an MTBF figure for a single component such as a resistor, but about systems with hundreds of thousands of components interconnected in complex ways. This is a dilemma with a surprisingly simple solution.

Powertechnic, through association with Telcos that have a very large base of module-level field failure data, has been able to test models against actual field performance. This data has been used extensively to develop performance standards for data centre and telecommunications power systems. Comparisons of the modelling with actual field outages over a long period of time have shown very good correlation.

Module failure rate data incorporated within Analyst Enterprise software comes from a variety of sources. Much of it came from field failure information gathered from operating a large base of Telco UPS (approx. 300) and d.c. (> 40,000) systems, and it has been authorized for commercial use.
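As a rough illustration of how field records of this kind turn into a module failure rate, the sketch below divides observed failures by accumulated unit-hours. It assumes a constant failure rate and continuous operation. The fleet size echoes the approximate UPS figure above, but the failure count and observation period are hypothetical, and this is not a description of how Analyst Enterprise derives its data.

```python
HOURS_PER_YEAR = 8760.0


def failure_rate_per_hour(failures, units, years_observed):
    """Point estimate of failure rate: failures per accumulated unit-hour."""
    unit_hours = units * years_observed * HOURS_PER_YEAR
    if unit_hours <= 0:
        raise ValueError("units and years_observed must be positive")
    return failures / unit_hours


def mtbf_hours(rate_per_hour):
    """MTBF is the reciprocal of a constant failure rate."""
    if rate_per_hour <= 0:
        raise ValueError("rate_per_hour must be positive")
    return 1.0 / rate_per_hour


# Hypothetical field record: 12 module failures across 300 units over 5 years.
rate = failure_rate_per_hour(failures=12, units=300, years_observed=5)
print(f"failure rate: {rate:.3e} per hour, MTBF: {mtbf_hours(rate):,.0f} hours")
```

A point estimate like this says nothing about uncertainty; with few recorded failures, confidence bounds on the rate would be wide.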

