
MWeb services suffer five-hour fail

A problem at the Internet service provider's Johannesburg data centre caused hosted services to be out of action until 9pm last night.

By Bonnie Tubbs, ITWeb telecoms editor.
Johannesburg, 15 Aug 2013
MWeb admits it took too long to find the source of a five-hour failure yesterday.

An apparent configuration glitch at MWeb's Johannesburg data centre left thousands of customers without certain services for five hours yesterday.

According to the Internet service provider (ISP), the outage - between 4pm and 9pm - affected server hosting, virtual hosting and parts of MWeb's mail platform. ADSL services were not affected.

ITWeb was one of the many companies affected by the outage.

MWeb CEO Derek Hershaw says that, while engineers are still conducting a post-mortem into the protracted outage, initial findings point to a configuration problem between the main access switches in the data centre.

"It wasn't equipment/hardware failure. We have resilience and redundancy at an infrastructure layer to safeguard against that, and even in extreme cases where you have multiple instances of hardware failure, it's usually a 'quick fix' in terms of isolating the affected equipment and replacing it."

Rather, he says, the snag appears to have originated at switch level. "We have two switches (for redundancy purposes) and both were trying to assume the role of the 'primary gatekeeper'."

Hershaw says that, once a hardware failure had been ruled out, the only way to resolve the problem was through a process of elimination, "and that, unfortunately, took far too long".

The magnitude of the problem meant MWeb customers were largely left in the lurch, as attempts to contact the ISP's call centre proved fruitless. The company sent out a series of SMS updates in an attempt to keep customers in the loop.

"Because so many services were affected, our call centre was simply swamped."

Hershaw says the issue is being given the company's full and urgent attention. "We're now making sure that our configuration at an architectural layer doesn't allow for a repeat of it."

* ITWeb received the details above in a response from MWeb; no further information was available at the time of publication. GM of MWeb Business, Andre Joubert, said this morning that a full incident management report would be available by the end of the day.
