Testing for disaster

By Petrus Human, Chief technical officer at Attix5.

Johannesburg, 21 Oct 2014

In my last Industry Insight, I discussed the importance of having business continuity (BC) and disaster recovery (DR) plans in place. But once the decision has been made to take this route, how does the company go about planning and testing for a disaster?

As with many things in business, there is no golden formula that all companies can adhere to and implement universally. When it comes to DR planning, the first step is making the decision to do it. Too often, decision-makers are stuck discussing what to do, without getting around to actually taking the next step.

Second, the company needs to identify its critical systems and cover those from a DR perspective as soon as possible.

Realistically, systems such as those used by development and testing teams do not need to be permanently online. If these go dark for a few hours, there is no significant cost in lost revenue. Decision-makers must weigh both the immediate and the long-term impact of each system when deciding which ones to include in the DR plan. It is vital to gain a clear understanding of what is required from a backup perspective. In other words, if there are 10 systems in place, the executive team needs to evaluate the business impact of each of them and how critical each one is to operations.

Great expectations

This entails examining how long the company can realistically afford to be offline, which is as much a question of managing stakeholder expectations as it is of cost. It involves identifying the critical functions needed to keep the company's financial losses as small as possible. For example, Visa will lose millions of dollars for every second it is offline, whereas an accountant can afford to lose a few hours' work to downtime. Unfortunately, many companies adopt an all-or-nothing approach that becomes costly to implement and manage.

An effective DR plan depends on the availability requirements of the business as well as its size. A small business with a single service running all its operations will have vastly different needs from a large one with branches throughout the country. A mistake many companies make is leaving this evaluation entirely to the IT department, which does not necessarily understand what is critical to keeping the business running. Business and IT need to work together to agree on the importance of each system and where the priorities should lie.

Smaller businesses are also likely to use external service providers for their DR planning, whereas larger companies benefit from having dedicated staff with a good idea of the priority areas for planning. Either way, a company's staff might have some experience in DR and backup without being specialists. I have seen numerous examples of staff who are used to doing DR and backups a certain way and never move outside their comfort zones. This might change the moment a new IT manager arrives and replaces the entire strategy with one he or she is more familiar with, even though it may not be the best fit for that particular company.

Uncomplicated

From a testing perspective, things need to be kept fairly simple. For example, if a company runs daily backups, what is being done to verify the integrity of those backups? Far too often, people do not even know whether the backups have run properly. There are solutions that provide quick testing in backup environments, but a more comprehensive test is still required at least every quarter.
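To make the integrity question concrete, here is a minimal Python sketch. It is not a feature of any particular backup product, and it assumes a hypothetical manifest: a JSON file, written when the backup was taken, that maps each backed-up file's relative path to its SHA-256 checksum. The function names and the manifest layout are illustrative assumptions.

```python
import hashlib
import json
import sys
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large backup files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(backup_dir: Path, manifest: dict) -> list:
    """Check each file in the (hypothetical) manifest against its recorded checksum.

    Returns a sorted list of problems; an empty list means every listed
    file is present and unchanged since the manifest was written.
    """
    problems = []
    for rel_path, expected in sorted(manifest.items()):
        path = backup_dir / rel_path
        if not path.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(path) != expected:
            problems.append(f"checksum mismatch: {rel_path}")
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical paths): python verify_backup.py /backups/latest manifest.json
    manifest = json.loads(Path(sys.argv[2]).read_text())
    issues = verify_backup(Path(sys.argv[1]), manifest)
    for issue in issues:
        print(issue)
    sys.exit(1 if issues else 0)
```

A check like this is cheap enough to run after every nightly job and flags missing or corrupted files early. It only confirms that the backed-up files are intact, however, and does not replace the more comprehensive quarterly test of actually restoring them.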

This could entail running a full backup to the cloud and seeing whether users can keep working if access to certain files and processes is cut off. Full DR testing needs to be done at least twice a year. Companies should also draw lessons from these tests and use them to improve not only internal processes, but also the DR strategy itself.

An essential aspect of all this is monitoring the security of the system continuously. Recent examples of malicious users hacking into cloud-based systems are making decision-makers more aware of the need for stricter security measures. An extensive security audit needs to take place once a year, even though security monitoring itself must be ongoing.

Beyond this, decision-makers need to realise that the first draft of a DR plan will not be perfect. No one can think of everything or plan for every scenario. A DR plan is best viewed as an organic document that needs to be continually tested, evaluated, adapted and implemented to make sure the best system is in place for the company's current needs.
