In the previous Industry Insight in this series, I looked at data replication and remote journaling. In this month's Industry Insight, I'll look in detail at system monitoring and role swapping.
Once businesses have established data replication between systems, they need to monitor it continuously. Thousands, even millions, of transactions may need to be replicated each day, especially in high-volume transaction systems. Any disruption to system-to-system communications, the journaling components or the journal entry apply processes can cause significant problems. One or more objects can lose synchronisation, which jeopardises the integrity of their data on the backup system.
This means a monitoring process is needed to guarantee replication integrity. Without it, the backup system could be compromised, and companies will only find out at exactly the wrong time: when they need the backup.
Here the word autonomic comes into play. In this context, autonomic means the ability to self-monitor and self-heal. Autonomic functions are crucial to reliability and ease of use in a high-availability solution.
As an example, a high-availability monitor should determine for itself whether an object on the backup system is out of sync with the same object on the production system. If it is, the monitor initiates the re-sync of that object on its own.
It recopies the object from the production machine to the backup system and applies the necessary journal entries to bring it up to date, all without interfering with the ongoing replication of other objects.
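The detect-and-heal cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular high-availability product works: `Node`, `JournalEntry`, `out_of_sync` and `resync` are hypothetical names, objects are modelled as in-memory lists of records, and a journal entry is modelled as a single appended record.

```python
from dataclasses import dataclass, field


@dataclass
class JournalEntry:
    seq: int      # position of the entry in the journal
    obj: str      # name of the object the entry belongs to
    record: str   # the change itself, modelled here as one appended record


@dataclass
class Node:
    # Hypothetical stand-in for a machine: object name -> list of records
    objects: dict = field(default_factory=dict)


def out_of_sync(production, backup):
    """Names of production objects whose backup copy is missing or differs."""
    return sorted(
        name for name, records in production.objects.items()
        if backup.objects.get(name) != records
    )


def resync(name, snapshot, snapshot_seq, journal, backup):
    """Recopy one object, then apply the journal entries made after the copy."""
    backup.objects[name] = list(snapshot)
    for entry in journal:
        if entry.obj == name and entry.seq > snapshot_seq:
            backup.objects[name].append(entry.record)


production = Node({"ORDERS": ["o1", "o2"], "CUSTOMERS": ["c1"]})
backup = Node({"ORDERS": ["o1"], "CUSTOMERS": ["c1"]})  # ORDERS has fallen behind
journal = [JournalEntry(1, "ORDERS", "o1"), JournalEntry(2, "ORDERS", "o2")]

stale = out_of_sync(production, backup)  # the monitor's periodic check

for name in stale:
    # Take the copy and note the journal position it corresponds to.
    snapshot, snapshot_seq = list(production.objects[name]), journal[-1].seq
    # A new transaction lands on production while the copy is in flight.
    production.objects["ORDERS"].append("o3")
    journal.append(JournalEntry(3, "ORDERS", "o3"))
    resync(name, snapshot, snapshot_seq, journal, backup)
```

Only the object that failed the comparison is recopied; `CUSTOMERS` is never touched, which mirrors the requirement that healing one object must not disturb the replication of the others.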
Importantly, monitoring should not be a burden on IT management. Vital as it is, it should not occupy more than a few minutes of management's time each day. If the monitoring system has been set up correctly, this will be the case, because it automatically corrects most problems on the fly.
Changing roles
Moving users from the production system to the backup system is known as a role-swap, because the backup system in essence assumes the role of the production system while the actual production system is in maintenance or under repair. The process is also known as a roll-over or switch-over. When a role-swap is forced by a system failure, it tends to be called a fail-over.
Best practice mandates that once the components of data replication and system monitoring are deployed, the role-swap process should be tested frequently to ensure that it executes smoothly and that the data on the backup system is intact.
The role-swap process should include the following steps:
* Monitor to ensure synchronisation of all objects between the two systems.
* End all user and application jobs on the production system.
* End replication and monitoring jobs on the production system.
* Designate the backup environment as the production environment.
* Start the replication and monitoring jobs on the backup system.
* Start user and application jobs on the backup system.
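The sequence above can be expressed as a small Python sketch. It is a minimal model under assumptions, not a real high-availability API: `Environment`, `role_swap` and the `all_objects_in_sync` callback are hypothetical names, and starting or ending jobs is reduced to flipping a flag. The sketch also marks the old production system as the backup in step 4, which is an assumption the list itself does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    name: str
    role: str                        # "production" or "backup"
    user_jobs: bool = False          # user and application jobs running?
    replication_jobs: bool = False   # replication and monitoring jobs running?


def role_swap(production, backup, all_objects_in_sync):
    """Run the six steps in order and return a log of what was done."""
    log = []

    # Step 1: refuse to start unless every object is synchronised.
    if not all_objects_in_sync():
        raise RuntimeError("objects out of sync; role-swap not started")
    log.append("synchronisation confirmed")

    # Step 2: end user and application jobs on the production system.
    production.user_jobs = False
    log.append(f"user jobs ended on {production.name}")

    # Step 3: end replication and monitoring jobs on the production system.
    production.replication_jobs = False
    log.append(f"replication ended on {production.name}")

    # Step 4: designate the backup environment as production.
    production.role, backup.role = "backup", "production"
    log.append(f"{backup.name} designated as production")

    # Step 5: start replication and monitoring jobs on the former backup.
    backup.replication_jobs = True
    log.append(f"replication started on {backup.name}")

    # Step 6: start user and application jobs on the former backup.
    backup.user_jobs = True
    log.append(f"user jobs started on {backup.name}")
    return log


sys_a = Environment("SYSA", "production", user_jobs=True, replication_jobs=True)
sys_b = Environment("SYSB", "backup")
steps = role_swap(sys_a, sys_b, all_objects_in_sync=lambda: True)
```

Because the synchronisation check comes first and raises before anything is changed, a failed check leaves both systems exactly as they were. Returning to the original system is the same call with the two environments exchanged, `role_swap(sys_b, sys_a, ...)`.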
Once the role-swap is done, and businesses are ready to return to the production system, they need to reverse the process, a phase referred to as a roll-back.
Automatic advantages
The first time a role-swap is done, extra time may be required to resolve unexpected problems with communications, system addressing, and the ending and restarting of user jobs, applications and high-availability components. This is to be expected, as each system is unique and so are its requirements.
Raul Garbini is a director of Edgetec.
A good high-availability system can keep a number of object types replicated in near real-time, rather than just data. It provides the ability to replicate user profiles, device configurations, spool files and other vital objects. A successful role-swap requires that all of these components not only exist on the backup system but are also current.
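The difference between "exists" and "is current" can be made concrete with a short readiness check. This is a hedged sketch only: `REQUIRED_TYPES`, `readiness_gaps`, the five-minute lag threshold and the timestamps are all illustrative assumptions, not values from any product.

```python
from datetime import datetime, timedelta

# Object types named in the text; a real installation would list its own.
REQUIRED_TYPES = ("data", "user profiles", "device configurations", "spool files")


def readiness_gaps(last_replicated, now, max_lag):
    """Map each object type that is missing or stale on the backup to the reason."""
    gaps = {}
    for object_type in REQUIRED_TYPES:
        replicated_at = last_replicated.get(object_type)
        if replicated_at is None:
            gaps[object_type] = "missing"      # does not exist on the backup
        elif now - replicated_at > max_lag:
            gaps[object_type] = "stale"        # exists, but is not current
    return gaps


now = datetime(2024, 1, 1, 12, 0, 0)           # illustrative clock value
status = {
    "data": datetime(2024, 1, 1, 11, 59, 58),
    "user profiles": datetime(2024, 1, 1, 11, 59, 0),
    "device configurations": datetime(2024, 1, 1, 9, 0, 0),
    # "spool files" has never been replicated
}
gaps = readiness_gaps(status, now, max_lag=timedelta(minutes=5))
```

An empty result would mean every required object type is present and recent enough for a role-swap to proceed; here the device configurations are stale and the spool files are missing.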
The role-swap component of a high-availability solution should be automated enough that, during a controlled role-swap or a fail-over, most of the components the backup system needs in order to take over the role of the production system are activated automatically.
If all has gone to plan, and the system has been fine-tuned, it won't be long before users are presented with a sign-on screen. This shows just how important it is to have autonomic functions in the role-swap process.
Finally, I need to emphasise the importance of doing planned role-swaps. These allow companies to test their disaster recovery and high-availability plans regularly, and better prepare them for the inevitable system failure. This is non-negotiable: one of the lessons of disaster recovery is that system outages occur in direct proportion to one's failure to test.
In the next Industry Insight, I will look at how to evaluate high-availability solution vendors.