Data quality fundamentals

Johannesburg, 13 Jul 2006

The fact that data is not up to scratch is merely the symptom - the cause will be found in defects in the processes and activities that create the data in the first place.

It's natural to want to tackle dirty data by simply "cleansing" it, in itself a daunting task for some organisations. However, to sustain high levels of data quality, it is also necessary to fix, on an ongoing basis, the problems in the way data is created and modified.

This generally involves not only additional validation checks in computing applications, but also a concerted effort to examine and address inefficiencies in the human, transactional and workflow elements involved in collecting data - in other words, the root causes of the bad data that winds up in databases.

Remember too that poor data quality also stems from poorly designed databases. The structure of a database, not just its content, must therefore be closely examined and improved by properly aligning the data model with the business requirements.
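As a minimal sketch of what aligning the data model with business requirements can mean in practice, the example below declares a few business rules as constraints in the database structure itself, so that defective rows are rejected when they are created. The table names, column names and rules are hypothetical, chosen only for illustration, and SQLite (via Python's standard sqlite3 module) stands in for whatever database an organisation actually uses.

```python
import sqlite3

# Hypothetical schema: each constraint encodes an assumed business rule.
SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE CHECK (email LIKE '%_@_%._%'),
    country     TEXT NOT NULL CHECK (length(country) = 2)
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      NUMERIC NOT NULL CHECK (amount > 0)
);
"""


def open_database():
    """Create an in-memory database with the illustrative schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless it is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn, sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = open_database()
    add_customer = "INSERT INTO customer (customer_id, email, country) VALUES (?, ?, ?)"
    print(try_insert(conn, add_customer, (1, "ann@example.com", "ZA")))  # accepted
    print(try_insert(conn, add_customer, (2, "not-an-email", "ZA")))     # rejected by CHECK
```

Rules held in the structure apply to every application that writes to the database, which is one reason the design deserves as much scrutiny as the content.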

Catalyst

Granted, this significantly broadens the scope of any data quality initiative, and takes it to perhaps unanticipated levels in the company; but it is this very subtlety of a data quality programme that can serve as a catalyst to bring business and IT closer in the organisation's quest to boost overall business efficiencies, cut costs and improve profits.

Indeed, a data quality initiative can help to promote issues such as information ownership and accountability, drive efforts to identify and document agreed business rules and metadata, and ultimately form the backbone of corporate data governance or master data management programmes. Business rules are data quality rules, making data quality neither an IT issue nor a business issue - it is everyone's issue.

Measured approach

There are a number of core activities associated with data quality, but none as important as measurement. Before embarking on an exercise to improve things, it is critical to first find out just how bad (or good!) the data actually is. This usually takes the form of a data profiling exercise, where the deliverables are a view of data quality levels expressed in relative terms and the scope of the effort required. Inevitably, to get ongoing buy-in from senior levels, data quality needs to be expressed in monetary terms that reflect its business impact, and an exercise to prove the potential cost savings and increased revenue needs to be performed.

"Data quality issues cannot be addressed by simply setting up a data-cleansing project."
- Bryn Davies, regional manager for Sybase SA's Cape Town office

This involves a clear understanding of the information value chains within the organisation, knowledge of which will also help improve respect among the producers and suppliers of data for the needs of all downstream consumers of that data, be they systems or people. After all, the ultimate judge of the quality of anything is the customer, and in the case of data quality, data is the product and company employees (among others) are the customers - they are the ones who require good data to perform their roles effectively and timeously, in support of the organisation's business objectives.

Regular measurement also allows data quality levels to be expressed in an easily digestible form, adding substance and visibility to another crucial component of a data quality initiative: company education sessions and improvement programmes, which are key to raising awareness throughout the organisation of the negative effects and behaviour patterns associated with low-quality data.
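A data profiling exercise of the kind described in this section can be sketched very simply. The example below is an illustration under stated assumptions, not a prescribed method: the records, the field names and the three validation patterns are all hypothetical, and a real exercise would take its rules from the agreed business rules. It reports, per field, the percentage of records that are populated (completeness) and the percentage that are populated and pass the rule (validity).

```python
import re

# Hypothetical sample records; the field names are illustrative only.
RECORDS = [
    {"customer_id": "C001", "email": "ann@example.com", "postcode": "8001"},
    {"customer_id": "C002", "email": "", "postcode": "2196"},
    {"customer_id": "C003", "email": "bob[at]example.com", "postcode": "80O1"},
    {"customer_id": "C004", "email": "cat@example.com", "postcode": None},
]

# Assumed business rules: a populated value must match the whole pattern.
RULES = {
    "customer_id": re.compile(r"C\d{3}"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "postcode": re.compile(r"\d{4}"),
}


def profile(records, rules):
    """Return completeness and validity percentages for each field in rules."""
    total = len(records)
    report = {}
    for field, pattern in rules.items():
        values = [record.get(field) for record in records]
        populated = [v for v in values if v not in (None, "")]
        valid = [v for v in populated if pattern.fullmatch(v)]
        report[field] = {
            "completeness_pct": round(100 * len(populated) / total, 1) if total else 0.0,
            "validity_pct": round(100 * len(valid) / total, 1) if total else 0.0,
        }
    return report


if __name__ == "__main__":
    for field, scores in profile(RECORDS, RULES).items():
        print(field, scores)
```

Run at regular intervals, percentages like these give the relative, easily digestible view of data quality levels that the text calls for.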

Code it

Today's organisations run hundreds, even thousands, of discrete databases, with billions of records scattered across multiple systems. It therefore makes sense to prioritise within a data quality management programme, starting with a subset of core data in the applications where defective data has the highest impact.

Measurement as described above therefore needs to take place on statistically relevant samples of data, aligned with the role that that particular data plays for all downstream stakeholders.
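One way to arrive at a statistically relevant sample is sketched below. It is only an illustration: it assumes simple random sampling, a 95% confidence level (z = 1.96) and a 5% margin of error, and it uses the standard sample-size formula for a proportion with a finite population correction. The function names are hypothetical, and data that plays different roles for different stakeholders may well need to be sampled separately for each role.

```python
import math
import random


def sample_size(population, margin=0.05, z=1.96, expected_rate=0.5):
    """Records to sample to estimate a defect rate within +/- margin.

    expected_rate=0.5 is the most conservative assumption about the
    true defect rate, so it gives the largest sample.
    """
    if population <= 0:
        return 0
    n0 = (z ** 2) * expected_rate * (1 - expected_rate) / (margin ** 2)
    n = n0 / (1 + (n0 - 1) / population)  # finite population correction
    return min(population, math.ceil(n))


def draw_sample(records, size, seed=None):
    """Draw a simple random sample without replacement."""
    rng = random.Random(seed)
    return rng.sample(records, min(size, len(records)))


def estimate_defect_rate(sample, is_defective):
    """Fraction of sampled records that the supplied rule flags as defective."""
    if not sample:
        return 0.0
    return sum(1 for record in sample if is_defective(record)) / len(sample)


if __name__ == "__main__":
    print(sample_size(1_000_000))  # 385
    print(sample_size(1_000))      # 278
```

Under these assumptions the required sample grows very slowly with the size of the population, which is what makes measurement feasible even across very large tables.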

Even with sampling, however, manual methods (spreadsheets, SQL, scripting, etc.) of dealing with data quality issues quickly become unwieldy with today's data volumes, and so the use of data quality tools for this and ultimately for the tasks of standardising, matching, de-duplicating and consolidating all classes of data becomes mandatory.
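To make the terms concrete, here is a toy illustration of standardising, matching and de-duplicating company names. It is deliberately naive and is not how any particular data quality tool works: the abbreviation table, the 0.9 similarity threshold and the function names are all assumptions, and the pairwise comparison it relies on is exactly the kind of hand-rolled scripting that stops scaling at real data volumes.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table used during standardisation.
ABBREVIATIONS = {"ltd": "limited", "pty": "proprietary", "co": "company"}


def standardise(name):
    """Lowercase, strip punctuation, collapse spaces, expand abbreviations."""
    text = re.sub(r"[^a-z0-9\s]", " ", name.lower())
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return " ".join(words)


def is_match(a, b, threshold=0.9):
    """Treat two names as the same entity if their standardised forms are similar."""
    ratio = SequenceMatcher(None, standardise(a), standardise(b)).ratio()
    return ratio >= threshold


def deduplicate(names, threshold=0.9):
    """Keep the first name of each group of matching names."""
    survivors = []
    for name in names:
        if not any(is_match(name, kept, threshold) for kept in survivors):
            survivors.append(name)
    return survivors


if __name__ == "__main__":
    names = [
        "Acme Trading (Pty) Ltd",
        "ACME Trading Pty. Ltd.",
        "Acme Tradng Pty Ltd",
        "Baobab Foods CC",
    ]
    print(deduplicate(names))  # ['Acme Trading (Pty) Ltd', 'Baobab Foods CC']
```

Because every incoming name is compared with every surviving name, the work grows roughly with the square of the number of records, which helps explain why the article regards dedicated tools as mandatory.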

Strategic imperative

Data quality issues cannot be addressed by simply setting up a data-cleansing project - this merely automates ongoing remedial maintenance.

While cleansing is a necessary part of it, a true, effective and sustained data quality initiative will involve a combination of tools, methods and processes spanning both business and IT, topped off with a healthy dose of ongoing strategic commitment.

In the end, not only does data have to be cleaned - it has to be kept clean too.


Data quality is about clean data - right? Wrong! That's just a part of it, and, as data quality creeps to the centre of the radar screen, here are some important issues to consider.
