Data de-duplication, the method of reducing data storage needs by eliminating redundant data from a device, is gaining attention because it can shrink the backup storage repository by an average ratio of twenty to one.
The ability to store so much more data in less space is also vindicating the growing acceptance of disk-based backup systems. It allows backed-up data to be retained on disk for longer, rather than being transferred to tape, and so improves the performance of restores, as disk is the faster medium.
Market research analysts confirm the technology of backup de-duplication is no longer in its infancy, having progressed well into the 'hype cycle'.
The idea of the hype cycle was conceived in 1995 by a prominent analyst/research house in the US, as a commentary on the common pattern of human response ('hype') to technology. Hype cycles are a graphical way to track multiple technologies within an IT domain or technology portfolio.
A hype cycle characterises the response to the emergence of a technology, from an initial 'over-enthusiasm' through a period of 'disillusionment' to an eventual understanding of the technology's relevance and role in a marketplace - the 'plateau of productivity'. Distinct indicators of market, investment and adoption activities are associated with each phase.
Out of the gutter
Many industry watchers maintain that de-duplication technology has passed through the 'trough of disillusionment' and has entered the final phase.
Like runners halfway through the opening phase of a long-distance race, the players in the de-duplication arena have settled down after an initial spurt and are taking stock of who is ahead of whom. The stakes are high and a 'winner-takes-all' mentality seems to pervade. The model for victory seems a simple one: he with the most features and the best performance wins.
It should be remembered that backup de-duplication was really a spin-off from the D2D (disk-to-disk) backup approach (also called B2D - backup-to-disk).
D2D backup boasted several advantages over tape-based backup, mainly related to overcoming the natural shortcomings of streaming tape media. D2D benefits include the ability to sustain multiple streaming sessions to the same media, and the ability to locate the position of a record very quickly for a restore.
The problem with D2D was that, although disk was becoming cheaper, it was still more expensive than tape cartridges, so the idea of keeping a full weekly, monthly and yearly backup regime on disk was untenable. With the arrival of de-duplication, this became both possible and affordable. Thus de-duplication was initially an enabler for the large-scale uptake of D2D.
The de-duplication techniques applied in this way are referred to as 'target-based' and most offerings available today are of this type. These include 'in-line' de-duplication, in which the data is de-duped as it is ingested, and 'post-processing', in which the de-duplication is applied retrospectively, after the data has been written to disk.
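The difference between the two target-based styles can be shown with a small sketch. This is an illustration only, not any vendor's implementation: the `DedupTarget` class, its method names and the use of SHA-256 digests to recognise identical chunks are all assumptions made for the example.

```python
import hashlib


class DedupTarget:
    """Toy target-side store (hypothetical): each unique chunk is kept once."""

    def __init__(self):
        self.chunks = {}    # digest -> chunk bytes (the de-duplicated pool)
        self.staging = []   # raw chunks written to disk, awaiting post-processing
        self.recipes = []   # ordered digests of every chunk that has been de-duplicated

    def _store(self, chunk):
        digest = hashlib.sha256(chunk).hexdigest()
        self.chunks.setdefault(digest, chunk)  # keep only the first copy
        self.recipes.append(digest)

    def ingest_inline(self, chunk):
        # In-line: de-duplicate each chunk as it is ingested.
        self._store(chunk)

    def ingest_staged(self, chunk):
        # Post-processing: write the raw chunk now, de-duplicate later.
        self.staging.append(chunk)

    def post_process(self):
        # Retrospective pass over everything that was written raw.
        for chunk in self.staging:
            self._store(chunk)
        self.staging.clear()


backup = [b"mon", b"tue", b"mon", b"mon"]

inline = DedupTarget()
for chunk in backup:
    inline.ingest_inline(chunk)

staged = DedupTarget()
for chunk in backup:
    staged.ingest_staged(chunk)
peak_raw_chunks = len(staged.staging)  # raw chunks held before the pass runs
staged.post_process()

print(len(inline.chunks), len(staged.chunks), peak_raw_chunks)  # prints: 2 2 4
```

Both routes end with the same two unique chunks. In this sketch the post-processing route has to hold all four raw chunks until its pass runs, while the in-line route does its work during ingest.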
Chunky idea
John Hope-Bailie is technical director of Demand Data.
Another school of thought applies de-duplication at the source. As a 'chunk' of data is read at the source, a checksum is computed for it and compared with a central set of checksums. (A checksum is a value computed from the data's contents; it is traditionally used to detect whether data has been transmitted successfully, and here it also serves as a fingerprint for recognising identical chunks.)
Chunks that match are not re-sent, and only new unique chunks are transferred to the central pool. Solutions of this type are well suited to low bandwidth configurations such as branch office sites. However, there are few players in this space.
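A minimal sketch of the source-side idea follows, under stated assumptions: fixed-size chunks of four bytes (far smaller than anything practical, chosen for readability), SHA-256 as the checksum, and a plain dictionary standing in for the central pool. The function names are hypothetical, and in a real deployment the lookup would be a query sent to the central site rather than a local dictionary check.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; unrealistically small so the example is easy to follow


def split_chunks(data, size=CHUNK_SIZE):
    """Cut the data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup_from_source(data, central_pool):
    """Send only chunks whose checksum the central pool does not already hold.

    Returns (recipe, bytes_sent), where recipe is the ordered list of
    checksums needed to rebuild the data later.
    """
    recipe = []
    bytes_sent = 0
    for chunk in split_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        recipe.append(digest)
        if digest not in central_pool:      # new, unique chunk: transfer it
            central_pool[digest] = chunk
            bytes_sent += len(chunk)
    return recipe, bytes_sent


def restore(recipe, central_pool):
    """Rebuild the original data from its recipe."""
    return b"".join(central_pool[digest] for digest in recipe)


pool = {}
monday = b"AAAABBBBCCCCDDDD"
tuesday = b"AAAABBBBXXXXDDDD"   # one chunk changed since Monday

recipe_mon, sent_mon = backup_from_source(monday, pool)
recipe_tue, sent_tue = backup_from_source(tuesday, pool)
print(sent_mon, sent_tue)  # prints: 16 4
```

The second backup sends four bytes instead of sixteen, which is why the approach suits low-bandwidth links such as those at branch offices.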
A feature increasingly associated with de-duplication today is 'replication': the transfer of data from one de-duplication device to another over an Ethernet link. It is becoming a critical requirement for backup de-duplication systems.
Replication is enabling data centre managers to go beyond simply moving data from 'A' to 'B', giving them enhanced capabilities to meet the ever-increasing requirements of sophisticated disaster recovery solutions.
For example, corporate multi-site protection with cascaded replication allows for multiple disaster-recovery options. A selected Johannesburg off-site office, for instance, could be the designated disaster recovery location for a company's nationwide branch network. This site could replicate to a Cape Town site for protection from a broader disaster.
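The cascade can be pictured with a short sketch. It assumes, as an illustration rather than a description of any particular appliance, that each site holds a pool of chunks keyed by checksum and that a replication pass sends only the chunks the receiving site lacks. The `replicate` function, the site variables and the shortened checksum labels are all hypothetical.

```python
def replicate(source, target):
    """Copy chunks the target lacks; return how many were sent."""
    missing = {digest: chunk for digest, chunk in source.items()
               if digest not in target}
    target.update(missing)
    return len(missing)


# Hypothetical chunk pools, keyed by checksum (shortened to labels here).
branch = {"d1": b"payroll", "d2": b"orders"}
johannesburg = {"d1": b"payroll"}   # already holds one chunk from an earlier run
cape_town = {}

sent_to_jhb = replicate(branch, johannesburg)      # first hop: branch to DR site
sent_to_cpt = replicate(johannesburg, cape_town)   # second hop: DR site to second site
print(sent_to_jhb, sent_to_cpt)  # prints: 1 2
```

After both hops, all three sites hold the same chunks, so the data survives the loss of any one location.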
Significantly, replication has the potential to eliminate the use of physical tape as a storage medium altogether. The last remaining use for tape was for offsite storage of backup data - keeping a copy of the data far enough away from the source to escape any disaster that might befall the source site.
Efficient and robust replication between de-duplication appliances now makes it possible to dispense with tapes by simply positioning the second device in a secure location.