
AI: The top trend shaping data analytics in 2024

Data scientists will have to learn to ask the right questions of AI technology, keeping ahead of the latest approaches and strategies.
Janco van Niekerk
By Janco van Niekerk, Data scientist in the KID Group.
Johannesburg, 12 Aug 2024
Janco van Niekerk, data scientist at KID.

Artificial intelligence (AI) – and specifically large language models (LLMs), or generative AI (GenAI) – is the next big disruptor in the world of data science.

While LLMs like ChatGPT may not have been widely adopted in South African data science yet, the technology is already causing ripples, as organisations and their data science teams review its potential and use cases.

There will undoubtedly be disruptions ahead, and the way we work will change drastically. Data scientists will have to learn to master the tools and ask the right questions of the technology, keeping ahead of new approaches and strategies.

While LLM technology is remarkable, it’s not yet good at everything. It is prone to hallucinations and doesn’t excel at coding complex systems with many integrations. For high-risk decision-making and sensitive engagements with customers, it cannot be allowed to operate unsupervised.

This may mean organisations will have to build machine learning models to ‘supervise’ certain responses from LLMs. And before they even consider these moves, they will have to ensure their data will support the LLM and machine learning (ML) tools they deploy.
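The 'supervise' idea above can be sketched in a few lines. This is a hypothetical illustration, not a production design: llm_respond and risk_score stand in for a real LLM call and a trained ML risk model, using canned logic so the pipeline runs offline.

```python
# Hypothetical sketch of an ML model 'supervising' LLM responses
# before they reach a customer.

def llm_respond(prompt: str) -> str:
    # Stand-in for a real LLM call; returns a canned risky reply.
    return "You are guaranteed to double your investment."

def risk_score(response: str) -> float:
    # Stand-in for a trained ML classifier; here a keyword heuristic.
    return 1.0 if "guaranteed" in response.lower() else 0.0

def safe_respond(prompt: str, threshold: float = 0.5) -> str:
    response = llm_respond(prompt)
    if risk_score(response) > threshold:
        # Block high-risk replies and hand over to a person.
        return "Escalating to a human agent."
    return response

print(safe_respond("Should I invest?"))
```

The point of the pattern is that the supervising model, not the LLM, decides what is released in high-risk, customer-facing engagements.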


AI-powered analytics will almost certainly gain ground in the years to come. Code-assist AI tools will make it quicker to construct and execute queries that retrieve relevant information from structured data. This will also democratise data skills, giving non-experts convenient access to processed data through natural language querying.

Adoption of GenAI LLMs could allow a business to conveniently access, process and interpret structured and unstructured data. For example, when an employee asks such a system a question, the LLM can construct and execute a database query to retrieve quantitative information, then combine this with meeting notes to show how certain actions correlate with KPIs.

In another example, the business could use an LLM to query quantitative sales information in a table and combine it with qualitative information, such as notes from sales meetings, to map how weekly sales change as new sales strategies are implemented. This combines quantitative and qualitative information in a user-friendly way that was almost impossible in the past.
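A minimal sketch of the querying flow described above, using Python's built-in sqlite3 module. The generate_sql function is a hypothetical stand-in for a real LLM call that would be prompted with the table schema and the user's question; here it returns a hard-coded query so the pipeline runs end to end.

```python
import sqlite3

def generate_sql(question: str) -> str:
    # Stand-in for an LLM prompted with the schema and the question;
    # a real system would validate the generated SQL before executing it.
    return "SELECT week, SUM(amount) FROM sales GROUP BY week ORDER BY week"

# Toy sales table held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (week INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 100.0), (1, 150.0), (2, 200.0)])

# Natural-language question in, rows out.
query = generate_sql("How did weekly sales change?")
rows = conn.execute(query).fetchall()
for week, total in rows:
    print(week, total)
```

In practice the same LLM could then summarise these rows alongside unstructured notes, which is where the quantitative-plus-qualitative value lies.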

This is a new paradigm in terms of analytics which can provide immense value – although there are many complexities that businesses will still need to figure out.

Use cases

AI/ML can be used in a variety of contexts. One of these is automating tasks that have traditionally been done by humans and could not be automated with a conventional software approach.

Automating these human tasks with AI can happen in any business function, but the technology function is often automated first. While workers in other industries worry that their roles might be automated by AI, developers themselves tend to take the attitude of "we will willingly automate our own roles".

Tech professionals are generally the people most familiar with new technologies and can easily identify opportunities to increase their productivity and effectiveness.

This is evident in AI code-assist tools and even ChatGPT, where there is a clear emphasis on supplying answers with well-formatted code snippets and correct syntax.

GenAI tools can also be used for data labelling without being explicitly trained on that exact labelling task. Labelling has traditionally been done by humans, requiring vast amounts of effort and time.

Data labelling allows businesses to extract more meaningful information from their data, which can then be used to make optimal decisions. Automating this process will mean data labelling becomes much more cost-effective and efficient.
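The labelling idea can be sketched as below. This is a hypothetical illustration: label_with_llm stands in for a prompt to a generative model (for example, "Classify this customer note as 'complaint' or 'praise'"), replaced here by a trivial keyword rule so the example runs offline.

```python
def label_with_llm(text: str) -> str:
    # Stand-in for a zero-shot LLM classification prompt;
    # a real call would send the text and label set to a model API.
    lowered = text.lower()
    return "complaint" if ("not" in lowered or "bad" in lowered) else "praise"

# Unlabelled free-text records, as a business might hold them.
notes = [
    "The service was bad and slow.",
    "Great experience, very helpful staff.",
    "Product did not work as advertised.",
]

labels = [label_with_llm(n) for n in notes]
print(labels)
```

The resulting labels can then feed downstream ML models or dashboards, which is where the cost-effectiveness of automated labelling shows up.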

Levelling the playing field

Research in machine learning has, for a long time, focused on improving algorithms. Some notable researchers, such as Andrew Ng, have advocated for a more ‘data-centric’ approach to machine learning and provided convincing arguments around why better data equals better models.

This means that, instead of focusing on improved algorithms, we should focus on improved data (which can then be used to train these various algorithms).

The downside of this is that companies that are larger, more established and have collected better data have an asymmetric competitive advantage over smaller companies with lower-quality data.

However, with the use of GenAI and data labelling, the available data can be enriched, which means machine learning algorithms can benefit greatly from this additional information.

This levels the playing field for newer and smaller companies, and it means the accuracy of ML models now also depends on asking good questions, labelling data and training models on the result. The approach works by extracting information, in a desired format, that is already 'contained in' pre-trained LLMs.

As more text data is generated, companies will leverage various natural language processing techniques across their businesses. As technology improves, natural language processing techniques are becoming easier to implement and more effective at certain tasks.

Technologies such as chatbots, sentiment analysis and vector databases are simplifying the process of leveraging text data in business processes. Chatbots on company websites are proliferating, and I expect this trend to continue and improve as the underlying technology develops.
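The retrieval step behind a vector database can be sketched in plain Python: documents and a query are turned into vectors and the closest document is found by cosine similarity. The embed function here is a hypothetical stand-in for a real embedding model, counting a few hand-picked terms so the example is self-contained.

```python
import math

def embed(text: str) -> list:
    # Stand-in embedding: counts of a few hand-picked terms.
    # A real system would call an embedding model instead.
    terms = ["refund", "delivery", "price"]
    words = text.lower().split()
    return [float(words.count(t)) for t in terms]

def cosine(a, b):
    # Cosine similarity between two vectors; 0.0 if either is zero.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "refund policy details",
    "delivery times and tracking",
    "price list",
]
query = "when will my delivery arrive"

# Pick the document whose vector is closest to the query vector.
best = max(docs, key=lambda d: cosine(embed(d), embed(query)))
print(best)
```

Swapping the toy embed for a learned embedding model is what turns this sketch into the retrieval layer a chatbot or semantic search feature sits on.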
