
Lashings of delicious data

Companies have a false sense of confidence in large quantities of data.

By Jessie Rudd, technical business analyst at PBT Group
Johannesburg, 03 Mar 2014

With the arrival of big data and our wholehearted embrace of it, the adage 'bigger is better' has truly come into its own. Our ability to analyse terabyte upon terabyte of data, fast, really is terribly impressive. We march off to business and convince them, with true earnestness in our hearts, that we can solve all their problems if we just analyse all the data available to us. Surely bigger and more data will reveal more and better insight? Surely?

No - adding more data is not a panacea. Being thoughtful about what needs to be studied and why, and carefully selecting the data relevant to those objectives, will produce much better results in the end.

Big data has inherent faults. By its very nature it is massive, but also unruly, untidy and biased. The data being presented is often assumed to be factual and absolute, while in reality it lacks organisation and context. When big data first arrived, with its 'massive potential for business' bang, the voices of the sceptical and cautious were drowned out by the clamouring of the all-embracing and ever-eager masses. Over time, however, their cautions and warnings are slowly being proven true.

Predisposition

Dan Ness, principal research analyst at MetaFacts, says: "A lot of big data today is biased and missing context, as it's based on convenience samples or subsets. We're seeing valiant, yet misguided attempts to apply the deep datasets to things that have limited relevance or applicability. They're being stretched to answer the wrong questions." (1)

To an extent, the experts have given business the illusion that plugging data into the algorithm they have come up with will always provide meaningful and accurate results. That holds only if the algorithm is correct, which in turn means assuming that the assumptions that went into the algorithm are correct. And everyone knows what they say about assumptions!
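To make that concrete, here is a deliberately toy Python sketch of an algorithm whose hidden assumption is wrong: a forecast that assumes linear growth, applied to made-up data that actually curves. Every name and number here is illustrative, not any real company's model.

```python
# A toy sketch of a baked-in assumption going wrong. The 'algorithm' assumes
# demand grows linearly; the hypothetical data is actually quadratic.
import numpy as np

months = np.arange(1, 13)
demand = 100 + 2 * months**2          # true (hidden) pattern: quadratic growth

slope, intercept = np.polyfit(months, demand, deg=1)  # assumption: it's a line

forecast_month = 24
linear_forecast = slope * forecast_month + intercept
actual = 100 + 2 * forecast_month**2

print(f"Linear forecast for month 24: {linear_forecast:.0f}")  # ~663
print(f"Actual value for month 24:    {actual:.0f}")           # 1252
# The model runs without error and returns a precise, confident number --
# the flaw lives in the assumption, not in the arithmetic.
```

The code never fails; it simply answers the wrong question, which is exactly what makes flawed assumptions so hard to spot at scale.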

This false sense of confidence that companies have in data grows in proportion to the size of the sets being used. How scary is that thought? As a direct result of dealing with such massive sets, companies are becoming more and more prone to signal error and confirmation bias. Signal error occurs when analysts leave large swathes of the data deliberately unused. Confirmation bias is a phenomenon whereby people search within data to confirm their own pre-existing viewpoints or biases, and thereby completely disregard anything that goes against their previously held position. Basically, they find what they are looking for.
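Both pitfalls are easy to reproduce. The hypothetical Python sketch below invents a population of satisfaction scores, then shows how a convenience subset (signal error) and a belief-driven filter (confirmation bias) each drag the 'finding' further from the truth; all names and figures are illustrative.

```python
# A minimal, hypothetical sketch of signal error and confirmation bias
# using made-up survey data. Assumes only numpy is available.
import numpy as np

rng = np.random.default_rng(42)

# Full "population": 100 000 satisfaction scores (0-10), true mean ~5.
population = rng.normal(loc=5.0, scale=2.0, size=100_000).clip(0, 10)

# Signal error: the analyst only loads a convenient subset -- say, the
# customers who bothered to respond, who skew towards the extremes.
responded = population[(population < 3) | (population > 8)]

# Confirmation bias: the analyst believes customers are unhappy, so they
# filter further to the records that support that belief.
unhappy_only = responded[responded < 3]

print(f"True population mean:     {population.mean():.2f}")
print(f"Convenience-sample mean:  {responded.mean():.2f}")
print(f"'Confirmed' unhappy mean: {unhappy_only.mean():.2f}")
# Each step discards more of the data and drifts further from the truth,
# yet every number came from a 'big' dataset.
```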

Noise pollution

The more data that is thrown at the algorithms, the more noise is generated, obscuring the signal that companies are so desperately trying to detect.
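This is simple to demonstrate. In the illustrative Python sketch below, a target metric is compared against columns of pure random noise; the more columns are added, the stronger the best-looking (and entirely meaningless) correlation becomes.

```python
# A hypothetical illustration of noise swamping signal: as purely random
# columns are added, the strongest correlation found keeps rising, even
# though no real relationship exists anywhere in the data.
import numpy as np

rng = np.random.default_rng(0)
n_rows = 500
target = rng.normal(size=n_rows)  # the metric the business cares about

for n_cols in (10, 100, 1_000, 10_000):
    features = rng.normal(size=(n_rows, n_cols))  # pure noise, no signal
    # Pearson correlation of every column with the target.
    corrs = (features - features.mean(0)).T @ (target - target.mean())
    corrs /= features.std(0) * target.std() * n_rows
    print(f"{n_cols:>6} noise columns -> strongest 'insight': r = {abs(corrs).max():.2f}")
# More columns of noise yield a more impressive-looking, but meaningless,
# top correlation -- the 'signal' is an artefact of searching widely.
```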

That said - narrow-mindedness and eagerness to please are not the only problems being faced. As best described by Marcia Richards Suelzer, senior analyst at Wolters Kluwer: "We can now make catastrophic miscalculations in nanoseconds and broadcast them universally. We have lost the balance in 'lag time'." (2)


In essence, not only are the experts misleading business to a degree, but their capacity to damage business is greatly magnified by enhanced technology, global interconnectivity and huge data volumes.

So, how do companies survive all the pitfalls presented and the speed with which the mistakes can go global?

1. Be vigilant and approach every single dataset with scepticism. The only safe assumption is that the data will have flaws. Much like death or taxes, data always has flaws.
2. Data is a tool we can use to get to meaningful answers. It is not the answer. Never let it do the thinking for you or rob you of your common sense. Be wise.
3. Having tons of data is not a bad thing, as long as the means exist to interpret it accurately and effectively for business use. The better the analysis and the analytical tools, the better the results.

That being said, the best tool of all is the brain. Human logic and training are invaluable in the accurate analysis of big data; without them, no matter the tool, a losing battle is being fought. Work smart, and the payoff promised by big data can be the reward.

(1) http://www.datanami.com/datanami/2012-07-30/pew_points_to_troubles_ahead_for_big_data.html?page=3
(2) http://www.kdnuggets.com/2012/07/10-predictions-about-big-data.html
