Big data calls for data scientists

By Cathleen O'Grady
Johannesburg, 08 Aug 2013

The growth in the volume of data available to companies is creating a need for data scientists, says Chris O'Connell, MD at BITanium.

Regular users cannot pick up the patterns that data scientists see, argues O'Connell. In a small or medium-sized business, an ordinary person can comprehend a certain level of data, he says, "but as soon as you're a retailer or a telco, where you're now talking about millions of records per day, there's no way you can absorb that information. You will be able, as an end-user, to know your few product lines, but even the connections between those become blurred."

He asserts that business intelligence software designed for the end-user can reach a certain point, but no further, because it does not assist the user in asking important questions. "The problem is, users very often ask the questions they think they know the answers to - but what about the question that remains unasked? That's where the data scientist comes in: to find patterns that you don't see. If you don't see it, you're not going to ask for it.

"If you look at big, structured data, it's about uncovering patterns the human mind wouldn't uncover," he continues. "Data science is about applying pure maths to data. I've seen data scientists run models to find interesting patterns, and that's where having the maths, stats and understanding is so important. Actuaries and statisticians find the reason for the outliers, then apply it back into their models to see what's going on. Your regular business user doesn't have the time, background knowledge or inclination to do this."

Data scientists can also implement predictive modelling properly, adds O'Connell. In a rigorous predictive methodology, part of the available data is used to build a model, and the model's predictions are then tested against the remaining data to check their accuracy. The model is refined until it can be used with confidence.
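
As a rough illustration of the build-then-test approach described above, the following Python sketch holds back part of a data set, fits a simple straight-line model on the rest, and measures the prediction error on the held-back records. Nothing here comes from O'Connell or BITanium: the synthetic data, the choice of a least-squares line, the 30% test fraction and the function names fit_line, mean_abs_error and holdout_evaluate are all assumptions made for the example.

```python
import random


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_abs_error(model, xs, ys):
    """Average absolute gap between predicted and actual values."""
    intercept, slope = model
    return sum(abs(intercept + slope * x - y) for x, y in zip(xs, ys)) / len(xs)


def holdout_evaluate(xs, ys, test_fraction=0.3, seed=0):
    """Build the model on one part of the data, test it on the remainder."""
    shuffler = random.Random(seed)
    indices = list(range(len(xs)))
    shuffler.shuffle(indices)
    n_test = max(1, int(len(xs) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    error = mean_abs_error(
        model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]
    )
    return model, error


# Synthetic (made-up) data: a linear trend with random noise added.
noise = random.Random(42)
xs = [float(i) for i in range(100)]
ys = [5.0 + 2.0 * x + noise.gauss(0, 1.0) for x in xs]

model, error = holdout_evaluate(xs, ys)
print("intercept, slope:", model)
print("mean absolute error on held-out data:", error)
```

If the error on the held-out records is much larger than the error on the records used to build the model, the model has probably fitted noise rather than a real pattern, and that is the signal to refine it before relying on its predictions.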

"It's not something that a run-of-the-mill user would do," he emphasises. "It's a specialist field. A lot of the work that companies do now is really just data preparation and data collection. Applying the predictive models requires an understanding of predictive modelling; you need to have the statistical knowledge. Regular BI reporting is going to tell you what happened yesterday - it's not going to give you the patterns going forward, and if you miss the statistical patterns, that's where you're going to miss out."

Automated software can take a person only so far, he concludes; true insight comes from a deeper understanding. "Like automating anything, you still need to understand the mechanics. I can now give you the standard deviation automatically. So? What does it mean? And unless you understand what it means, you're not going to be able to deal with the data."
