The reason behind BI mistrust and how to defeat it

By Francois Cross, director of IT Business.


Johannesburg, 20 Sep 2012

Many South African businesses still don't trust business intelligence (BI) solutions to give them credible information on which to base their strategic decisions. The standard joke is that the CEO gets his executives together in the boardroom, where they plough through the spreadsheets and graphical reports that the minions spent countless hours creating, before tossing them all aside and going with their gut instincts, says Francois Cross, director of IT Business.

The joke exists, and is amusing to those in the industry, precisely because it is based on some element of truth. Many BI solutions really are unreliable, and in many cases that's because the data they have to work with is itself unreliable.

In the old days, businesses would collect their data, filter it into a database, and then work with that to create a view of what was going on in the business. But the typical number of data stores has grown, and there is a great deal more interaction with other businesses now too. That wouldn't be a problem if everyone used the same type of database, if all databases used the same standards, and if the companies supplying the data ensured that it was all spick and span. Anyone who has spent even a small amount of time in the industry knows that good-quality data is a rarity, that although standards exist, they are seldom adhered to, and that architectures differ even in the exceptional instances where standards are successfully applied.

The incentive to supply good-quality data is also often one-sided. I've recently worked with my team to supply a customer's BI solution with data from 12 third-party providers who run national operations in the telecommunications industry. My customer uses these providers' networks to operate a distributed service for a portion of its subscriber base, which is around 90 000-strong. The financial incentive for the service providers is slim, but for my customer, who needs to understand the subscribers' habits to supply a more robust and focused service, the financial implications are far greater.

At the moment, my customer gets about 60% of its data from the third-party suppliers, typically in flat files such as .csv and .txt, uploaded to an FTP site. The rest of the data originates in my customer's database. And that presents us with a problem. The 60% of data that we get from the 12 different sources arrives in multiple formats and the quality is erratic. This is what we call a lack of referential integrity, and it is what we mean when we say the data is bad: it is unreliable, and that is one of the primary reasons why business people don't trust BI solutions.
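As a rough illustration of what a referential integrity check looks for, the Python sketch below flags supplier records whose subscriber key has no match in the customer's own data. It is not the tooling used on this project; the field names (subscriber_id, minutes) and the sample values are assumptions made up for the example.

```python
def find_orphans(supplier_rows, known_subscriber_ids, key="subscriber_id"):
    """Return supplier rows whose key has no match in the customer's own data."""
    known = set(known_subscriber_ids)
    return [row for row in supplier_rows if row.get(key) not in known]


# Hypothetical sample data: field names and values are illustrative only.
supplier_rows = [
    {"subscriber_id": "A100", "minutes": "12"},
    {"subscriber_id": "A999", "minutes": "7"},  # unknown to the customer's database
    {"subscriber_id": "", "minutes": "3"},      # key left blank by the supplier
]

orphans = find_orphans(supplier_rows, ["A100", "A101"])
for row in orphans:
    print("no matching subscriber:", row)
```

Records flagged this way can be held back for review rather than loaded, so that unmatched data never reaches a report.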

So how do we deal with that? The best solution is to get all of the service providers to provide the right data, in the right format, and ensure that it is always correct before they give it to us. While there is a business project underway to achieve just that, it will take some time to arrange. Even if an agreement is reached, it is unlikely that the quality of the data will be consistently good. In the meantime, we need to provide some business value for our customer.

We have scripts that automatically check the FTP sites for the daily uploaded files. We then automatically pull those files into our customer's system, where we subject them to an extraction, transformation and loading (ETL) process.
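A minimal sketch of such a collection script, using Python's standard ftplib module, might look like the following. It is an assumption about how the step could be automated, not the scripts used on this project; the function names, the suffix list and the connection parameters are hypothetical.

```python
from ftplib import FTP

WANTED_SUFFIXES = (".csv", ".txt")  # the flat-file types mentioned above


def select_new_files(listing, already_fetched):
    """Pick flat files from a directory listing that have not been fetched yet."""
    return sorted(
        name
        for name in listing
        if name.lower().endswith(WANTED_SUFFIXES) and name not in already_fetched
    )


def fetch_new_files(host, user, password, remote_dir, already_fetched):
    """Download each new flat file and return {filename: bytes}.

    Needs network access; host, credentials and directory are placeholders.
    """
    downloaded = {}
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        for name in select_new_files(ftp.nlst(), already_fetched):
            chunks = []
            ftp.retrbinary(f"RETR {name}", chunks.append)
            downloaded[name] = b"".join(chunks)
    return downloaded
```

Keeping the file selection separate from the download makes the selection rule easy to check without a live FTP server, and a scheduler can call fetch_new_files once a day for each supplier.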

The challenge during extraction, which really sets the tone for the remainder of the project, is that data stores use different formats for organising the data. In this particular case, we are better off than we could be because all of the files we receive are flat files, so there is only one format, although there are discrepancies within that format. The transformation component is what we're building when we develop the rules that are applied to the data we receive and that prepare it for operational use by our customer's decision-makers. Basically, that means we interrogate the files we get and check whether they have missing rows or columns, or whether some of the data has been left out. We then start creating rules for the software to check those files in the future. Over the period of a month, we'll be able to build a set of rules robust enough to give us usable data. Those rules help us develop the data quality (DQ) aspect of the project. We also merge the datasets, deduplicate and aggregate the data, and so on. Once we've run the rules, we load the data into the warehouse.
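To make the idea of transformation rules concrete, here is a small Python sketch that checks a flat file for missing columns and empty fields, removes duplicates and aggregates what is left. It only illustrates the kind of rule described above, not the rule set built for this customer; the column layout, the rule wording and the sample data are all assumptions.

```python
import csv
import io

EXPECTED_COLUMNS = ["subscriber_id", "date", "minutes"]  # hypothetical layout


def validate(text):
    """Split a flat file into usable rows and a list of rule violations."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [], [f"missing columns: {', '.join(missing)}"]
    good, problems = [], []
    for row in reader:
        # A short row yields None for absent fields, so treat None like "".
        if any(not (row[c] or "").strip() for c in EXPECTED_COLUMNS):
            problems.append(f"line {reader.line_num}: empty field")
        else:
            good.append(row)
    return good, problems


def deduplicate(rows):
    """Drop repeats of (subscriber_id, date), keeping the first one seen."""
    seen, unique = set(), []
    for row in rows:
        key = (row["subscriber_id"], row["date"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def total_minutes(rows):
    """Aggregate minutes per subscriber; assumes 'minutes' holds whole numbers."""
    totals = {}
    for row in rows:
        sid = row["subscriber_id"]
        totals[sid] = totals.get(sid, 0) + int(row["minutes"])
    return totals


sample = (
    "subscriber_id,date,minutes\n"
    "A100,2012-09-01,12\n"
    "A100,2012-09-01,12\n"  # duplicate record
    "A101,2012-09-01,\n"    # minutes left out
    "A100,2012-09-02,5\n"
)
rows, problems = validate(sample)
totals = total_minutes(deduplicate(rows))
print(problems)
print(totals)
```

Each new kind of discrepancy found in a supplier's files becomes another check in a function like validate, which is how a rule set can grow over the first month of a project.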

The way we run this particular operation is to have some technically skilled consultants operating the systems, which control the largely automated processes. The systems do not require very powerful hardware, so they're quite cost-effective. In this instance, we have 600GB of data. The key to getting the most out of the data and maximising the return on investment (ROI) is in the software tools and the skills to run them. They really make the difference between being able to work efficiently with the data and requiring relatively labour-intensive operations that don't quite meet business requirements.

In this case, we are using a mixture of software tools, some of which are supplied with popular boxed database software, which means the customer already owns them, alongside more specialised tools from a global vendor. The mixture means we can contain the costs as far as possible while still delivering the business benefits that will ultimately yield the best ROI for the customer.

While the issue of multiple data sources and lack of referential integrity is a primary cause of BI mistrust, it is not an insurmountable obstacle. It does, however, require a fair level of education on the customer's part, particularly where there are several sponsors and divisions or departments that must buy into and support the project. The technical employees in South African businesses, from the CIO down, usually understand the process and the results that can be obtained, but they are seldom masters of their own budgets and never write their own requirements.

Editorial contacts

Jeann'e Swart
Thought Bubble
(082) 539 6835
jeanne@thoughtbubble.co.za
Francois Cross
ITBusiness
francois.cross@itbusiness.co.za