
Cleaning up dirty data

Dirty data contributes to poor company performance, and in the worst of cases, complete failure.
By Bryn Davies, CEO, InfoBluePrint.
Johannesburg, 22 Jun 2006

About five years ago, when I asked a group of executives about the quality of the data in their organisations, the overall response was one of polite indifference. This was ironic, given that we had just emerged from one of the most visible data quality problems in the history of computing - Y2K - yet few made the connection.

Mention "data quality" today and lively debate ensues, fuelled by the recent realisation that the success of initiatives involving compliance, business intelligence, master data management and single view of customer, all hinges on a critical common foundation: our data, and more importantly, its quality.

The same mistakes have been repeated so often that business has finally been forced to identify a common factor - and it's been there all along, but we missed it! It's now showing up in corporate databases, spreadsheets and documents scattered across the planet - the poor quality of the data representing everything we do, make and sell, and every person and company that we engage with.

Wider domain

One good example of this oversight became evident after the rush to install customer relationship management (CRM) systems, with scant regard for the accuracy of the customer data underpinning them.

Many CRM packages, crammed with promising features intended to nurture our relationships with our customers, did exactly the opposite, frustrating clients who contacted new call centres only to be offended by inaccuracies hiding in the data.

Unsurprisingly, data quality has always primarily been associated with customer contact details, but it is rapidly becoming the domain of all classes of data, including product, supplier, asset and financial information, as organisations seek to streamline the information value chains that for so long have been neglected.
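
To make the idea concrete, the checks involved are often mundane but essential. The following is a minimal sketch in Python of profiling customer contact records for invalid e-mail addresses, missing phone numbers and likely duplicates; the field names, rules and record layout are illustrative assumptions, not taken from any particular product.

    import re

    EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

    def normalise_phone(raw):
        # Keep digits only, so "011-555 1234" and "0115551234" compare equal.
        return re.sub(r"\D", "", raw or "")

    def profile_customers(records):
        # 'records' is assumed to be a list of dicts with 'name', 'email'
        # and 'phone' keys; real rule sets are far richer than this.
        seen_phones = set()
        issues = []
        for rec in records:
            problems = []
            if not EMAIL_RE.match(rec.get("email") or ""):
                problems.append("invalid or missing email")
            phone = normalise_phone(rec.get("phone"))
            if not phone:
                problems.append("missing phone")
            elif phone in seen_phones:
                problems.append("possible duplicate (same phone number)")
            seen_phones.add(phone)
            if problems:
                issues.append((rec.get("name"), problems))
        return issues

    # Example: two records that look different but describe one customer.
    customers = [
        {"name": "A Smith", "email": "a.smith@example.com", "phone": "011-555 1234"},
        {"name": "A. Smith", "email": "smith@example", "phone": "0115551234"},
    ]
    print(profile_customers(customers))

Even a toy profile like this surfaces the kind of inaccuracies that frustrate call centre agents and customers alike.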

(Dis)integration

For too long we have treated data as a by-product of the business processes that create it, and in so doing have built not databases, but datadumps. This has been compounded by the ease with which we have spawned new systems with copies of the same data across our empires.

The fact is that these multiple systems all have to interact as they play their role in the transaction, and an entire industry has grown up to support this need, with enterprise application integration being near the top of the CIO priority list for many years.

We might eventually have got the integration part right, only to realise that we had simply moved the data quality problem up another notch by introducing non-aligned data across these multiple systems. Executives have reached a new level of frustration with the now common (and costly) requirement to validate (or invalidate!) reports because of discrepancies between source systems.
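
Reconciling even a single figure across two systems illustrates the cost of non-aligned data. The sketch below (Python) compares the balances two hypothetical systems hold for the same customer IDs; the system names, keys and tolerance are assumptions made purely for the example.

    def reconcile(system_a, system_b, tolerance=0.01):
        # Each input is assumed to be a dict mapping customer ID to balance.
        # Report IDs missing from either side and IDs whose values disagree.
        only_in_a = sorted(set(system_a) - set(system_b))
        only_in_b = sorted(set(system_b) - set(system_a))
        mismatched = [
            (cid, system_a[cid], system_b[cid])
            for cid in sorted(set(system_a) & set(system_b))
            if abs(system_a[cid] - system_b[cid]) > tolerance
        ]
        return only_in_a, only_in_b, mismatched

    # Example: the billing and CRM systems disagree on one customer
    # and the CRM knows about a customer billing has never seen.
    billing = {"C001": 1500.00, "C002": 320.50}
    crm = {"C001": 1500.00, "C002": 300.00, "C003": 80.00}
    print(reconcile(billing, crm))

Every mismatch this kind of comparison finds translates directly into someone, somewhere, having to validate a report by hand.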

Call for consolidation

While some were building multiple systems and battling with integration issues, others were hoping that by centralising everything in a single ERP package, their integration headaches would disappear.

They probably did, but then the data quality headaches began. In the rush to get the ERP system up and running, some left the migration of data from legacy systems into the ERP databases until late in the project ("that's the easy part," they said), only to discover too late that levels of data quality were hopelessly inadequate to support the new package, and especially any new or improved business processes anticipated by its introduction.
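
The usual remedy is to profile the legacy data early, not late. A minimal sketch of what such a column profile might measure appears below (Python); the column values and "missing" markers are illustrative assumptions.

    from collections import Counter

    def profile_column(values):
        # Summarise completeness and uniformity of one legacy column.
        total = len(values)
        missing = sum(1 for v in values if v in (None, "", "N/A"))
        return {
            "rows": total,
            "missing": missing,
            "missing_pct": round(100.0 * missing / total, 1) if total else 0.0,
            "distinct": len(set(values)),
            "most_common": Counter(values).most_common(3),
        }

    # Example: a legacy "country" field whose inconsistencies would
    # only surface during the ERP load if left unprofiled.
    legacy_country = ["ZA", "South Africa", "", "ZA", "RSA", None, "za"]
    print(profile_column(legacy_country))

A profile like this, run at the start of the project rather than the end, turns "that's the easy part" into an informed estimate.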

Data is forever

Computing has evolved largely as a means of automating business processes, and as a result the industry has become fixated on the "application", while the "data" has received secondary status.

This oversight, compounded by our almost uncontrollable rate of data collection, has resulted in a growing desire to manage data as it justifiably should be managed. Its quality is but one aspect, but it is an aspect that is critical to overall business success. Quality management, particularly in manufacturing, has been around for decades, and it has evolved into a science with well-proven techniques and outcomes.

Fundamentally, the resultant quality of a physical object is determined largely by the processes that go into its creation, so fixing inefficiencies in those processes will positively affect its quality. And so too with data: data is created by business processes, so a close look at the quality of our data must lead to a look at how it got that way in the first place - in other words, which business processes created or changed it, and why.

Implemented correctly, any attempt to sustain improved data quality will therefore ultimately improve the business as a whole!

At last we are seeing the light: in the information age, data is the lifeblood of our organisations, and just as dirty petrol causes poor engine efficiency, dirty data contributes to poor company performance and, in the worst of cases, complete failure. Let's make sure this does not happen to our companies; it's not too late - yet.
