The exabyte data flood is coming

Get ready for the exaflood as the sheer amount of data threatens to overwhelm the Internet.
By Mervyn Mooi, Director of Knowledge Integration Dynamics (KID), which represents the ICT services arm of the Thesele Group.
Johannesburg, 29 Apr 2008

Try to imagine an exabyte. It's one level above petabytes, two above terabytes and three above gigabytes. It's the equivalent of 50 000 years of DVD-quality video. If every utterance ever made by mankind were reduced to text, five exabytes would be needed to contain them.

The sheer amount of data being transmitted across the Internet is growing at a terrific rate and threatens to overwhelm it; this flood of data has been termed the exaflood. The Internet's carrying capacity is finite, and in just a few years it will run out of steam.

An exabyte is by today's standards an inconceivable amount of data: to build an exabyte-sized storage device would cost more than $200 million. Most current computers cannot cope with that much data, although in theory 64-bit computers can allocate 16 exabytes of RAM to a single program.
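That 16-exabyte figure is simple arithmetic rather than a property of any particular machine: a 64-bit address can reference 2 to the power of 64 bytes. A minimal sketch of the calculation, for illustration only:

```python
# Arithmetic behind the 64-bit claim: a 64-bit address space spans 2**64 bytes.
addressable_bytes = 2 ** 64

# Binary prefixes give exactly 16 EiB; decimal prefixes give roughly 18.4 EB.
print(addressable_bytes / 2 ** 60)   # 16.0 exbibytes
print(addressable_bytes / 10 ** 18)  # ~18.45 decimal exabytes
```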

The Square Kilometre Array radio telescope facility, which will peer back to the beginning of time and is due to be constructed by 2011, will generate an exabyte of data every four days.

An exabyte is 10 to the power of 18 bytes, or 1 000 to the power of six. It is a million terabytes, and a terabyte is itself a thousand gigabytes. It represents one quintillion bytes.
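For those who like to see the arithmetic spelled out, a quick sketch of the decimal prefixes (illustrative only):

```python
# Decimal storage prefixes: each step up is a factor of 1 000.
GIGABYTE = 10 ** 9
TERABYTE = 10 ** 12
PETABYTE = 10 ** 15
EXABYTE = 10 ** 18

print(EXABYTE // TERABYTE)  # 1 000 000     -> a million terabytes
print(EXABYTE // GIGABYTE)  # 1 000 000 000 -> a billion gigabytes
print(EXABYTE)              # one quintillion bytes
```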

All of this is to show that data volumes are growing at record rates, and this trend is not going to slow. In its study, "The Diverse and Exploding Digital Universe", IDC predicts that by 2011, the total volume of electronic data created and stored will grow to 1 000% of the 180 exabytes that existed in 2006. This equates to a compound annual growth rate of almost 60%.
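The growth-rate claim is easy to verify on the back of an envelope: tenfold growth over the five years from 2006 to 2011 compounds to just under 60% a year. A quick check, using the article's own figures for illustration:

```python
# Back-of-the-envelope check of the quoted IDC growth figures.
start_exabytes = 180   # figure quoted for 2006
end_exabytes = 1800    # 1 000% of the 2006 volume by 2011
years = 5

cagr = (end_exabytes / start_exabytes) ** (1 / years) - 1
print(f"{cagr:.1%}")   # ~58.5%, i.e. "almost 60%"
```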

Data types are becoming more diverse: music, photographs, video, voice over IP, fax over IP, RFID and software applications are seven examples of data types in daily use. The number of electronic information containers (such as files, images, packets and meta-tags, to mention a few) is growing half as fast again as the total number of gigabytes stored. IDC estimates the information created in 2011 will be held in more than 20 quadrillion containers.

These are breathtaking statistics, and they point to a true data flood, a deluge the likes of which we have never seen. They tell us that we need a new approach to data, as existing technology and approaches will not cut it.

For instance, the bad news is that of the data being generated and consumed, only 5% comes from the data centre and 35% from elsewhere in the enterprise.

The other 60% is being generated outside the enterprise. Yet at some point business and IT will have to take responsibility for the vast amounts of data being created, most of which is decentralised today.

Getting this situation under control is a critical business imperative, best managed proactively. There are three approaches:

* IT must drive a new relationship with business, setting a structure where business and IT both accept and embrace responsibility for data. Roles, responsibilities, budgets and all the associated actions need to be developed, rolled out and bought into.
* Set new policies and standards for data and information, including data acquisition, retention and disposal; a clear division between structured and unstructured data, with a common approach for managing the two types; document management across the lifecycle, at the right price point; enterprise search, virtualisation, security, management and more. The task is vast, and it needs to be dealt with before it gets totally out of hand.
* Acquire the appropriate tools to give effect to the new policies and standards.

Data has never been an easy discipline to manage. It needs constant vigilance and care if it is not to slip out of control, as it so frequently does. It needs a new approach before executives can know they have a chance of managing the new environment.

And by the way: for anyone who thinks we won't ever get past exabytes, cast your mind back to the days when a megabyte was a goldmine of storage, and no one could imagine we would ever reach a gigabyte. Today a gigabyte is a mundane commodity.

The next two data landmarks are zettabytes and yottabytes. A zettabyte is a thousand exabytes, and a thousand zettabytes is a yottabyte.

Time is moving on, data volumes are rocketing ahead ... time to get serious about data.

* Mervyn Mooi is director at Knowledge Integration Dynamics.
