In light of the remarkable technological and ICT advancements of the last 10 years, the data centre has evolved into the pinnacle of connectivity and a core service provider whose value to clients rivals that of electricity itself. This new dependency on data has heightened the need for robust, continuous operation, and with it the requirement for complete, transparent and continuous monitoring of the data centre.
The recent and unfortunate destruction by fire of one of the world's largest data centres, in Europe, offers data centre stakeholders at every scale of operation an opportunity for introspection. They should consider whether the complete heartbeat telemetry of their data centre is being monitored and, more importantly, whether the right people are analysing the data. The team monitoring that data centre would surely not have been indifferent to an impending catastrophic failure; more likely, it was confident that its monitoring was thorough enough to prevent one, right up until the data centre burned down.
The question is even more pertinent in Africa, the most aggressively growing data-dependent continent, with its unique challenges of far-flung core service provision and a COVID pandemic that prevents international travel (and, consequently, on-site assistance from international OEMs). This reinforces the need for a competent data centre partner, versed in every individual infrastructure component and in the integrated systems they form, to extract real value from the data gathered by the monitoring system.
The fire may well share the cause that experts have seen time and time again: failing batteries. Monitoring and escalation are only as strong as the weakest link in the entire data centre ecosystem. Monitoring critical UPSes without monitoring the very batteries that supply them is close to senseless, and it breeds a dangerous false sense of security. If a UPS fails, redundant units will generally prevent any loss of service; even in the worst case of a cascading failure, services go offline and resume once the fault is rectified. When batteries catch fire through age or common manufacturing defects, however, they can take your entire investment, reputation and future income with them in the inferno they create.
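To make "monitoring the batteries themselves" concrete, the sketch below flags individual blocks in a UPS battery string whose voltage drifts away from the rest of the string. It is a minimal illustration under stated assumptions: the `BlockReading` type, the `flag_suspect_blocks` function and the 0.3 V threshold are hypothetical, not taken from any particular battery monitoring product or manufacturer specification, and a real system would also track temperature and internal resistance.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class BlockReading:
    """One voltage measurement for a single block in a battery string (hypothetical type)."""
    block_id: str
    volts: float


def flag_suspect_blocks(readings, max_deviation_v=0.3):
    """Return the IDs of blocks whose voltage differs from the string median
    by more than max_deviation_v volts.

    The 0.3 V default is an illustrative assumption, not a vendor figure.
    The median is used as the reference so that one failing block cannot
    drag the reference value towards itself, as it would with the mean.
    """
    if not readings:
        return []
    reference = median(r.volts for r in readings)
    return [r.block_id for r in readings if abs(r.volts - reference) > max_deviation_v]


if __name__ == "__main__":
    string_a = [
        BlockReading("A1", 13.52),
        BlockReading("A2", 13.49),
        BlockReading("A3", 13.05),  # drifting low
        BlockReading("A4", 13.55),
        BlockReading("A5", 13.91),  # drifting high
    ]
    print(flag_suspect_blocks(string_a))  # ['A3', 'A5']
```

Comparing each block against its own string, rather than against a fixed nominal voltage alone, catches a single deteriorating block long before the string as a whole stops supporting the UPS.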
Similar oversights appear across every aspect of data centre telemetry: unmonitored diesel leak-detection events, water in diesel (a common diesel-delivery theft tactic), or diesel theft at the source. Cooling system monitoring raises further questions. Is it more valuable to learn that an air conditioner has failed at the moment it fails, or to see an upward trend in the unit's power consumption that indicates the failure beforehand?
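The difference between alerting on failure and alerting on a trend can be sketched with an ordinary least-squares slope over recent power readings. In this minimal example, the function names, the use of daily average kW values and the 0.05 kW-per-day threshold are all illustrative assumptions; a production system would likely also normalise for ambient temperature and IT load before judging a trend.

```python
def slope_per_sample(values):
    """Ordinary least-squares slope of values against their sample index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def rising_power_alert(kw_readings, max_kw_per_sample=0.05):
    """True when consumption is climbing faster than the (hypothetical) threshold,
    e.g. a compressor working harder as a cooling unit degrades."""
    return slope_per_sample(kw_readings) > max_kw_per_sample


if __name__ == "__main__":
    degrading_unit = [4.0, 4.0, 4.1, 4.2, 4.4, 4.6, 4.9]  # daily average kW
    healthy_unit = [4.0, 4.05, 3.98, 4.02, 4.0, 4.01, 3.99]
    print(round(slope_per_sample(degrading_unit), 3))  # 0.15
    print(rising_power_alert(degrading_unit))  # True
    print(rising_power_alert(healthy_unit))  # False
```

Neither unit in this example has failed, yet only the first one warrants a maintenance call; that early warning is exactly what a simple "unit down" alarm cannot provide.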
These considerations also call into question the fallibility and auditability of the data itself. Is the party responsible for monitoring a person, who can resign or become disgruntled, or is it a system that data centre owners can rely on operationally, regardless of who operates it? Dependence on people, who are innately unreliable and subjective, can create huge imbalances in which those with technical knowledge hold power over their financiers through jargon and obfuscation. Is it sensible for the technical representative responsible for managing a site to be the only person who can produce a report on why that same site failed?
Accountants cannot audit themselves; technical teams with accountability should be held to the same restriction.
Consider the data centre stakeholders who modernised their solutions beyond outdated SMS-based alerts to cover every aspect of their investment: who knew the voltage of each battery, the fill level of each diesel tank, the status of each generator and its battery, and whether any critical device was in a fault condition, and who partnered with an objective team to audit their data centre's vital signs independently of people, relying only on objective systems. They would most likely have felt far more comfortable, confident and relaxed in these arduous, socially distanced times. You can only believe what you see, and you can only guarantee what you fully control.