
Enter the great authenticator – data’s new superhero

Data authentication in a world of disinformation could pave the way for a new, specialised role – that of the data authenticator.
By Mervyn Mooi, Director of Knowledge Integration Dynamics (KID), which represents the ICT services arm of the Thesele Group.
Johannesburg, 28 Jul 2022

According to IDC, by 2025 around 80% of all global data will be unstructured data such as photos, videos, social media feeds and audio. Buried within these rising volumes lies valuable information that could be used to understand markets, customers, phenomena and trends.

However, the rapid spread of fake news and disinformation online illustrates that not all of the available information can be trusted. In fact, there are financial incentives for spreading disinformation: the Global Disinformation Index says brands unwittingly support disinformation websites to the tune of around $235 million a year.

What does this mean for organisations wanting to harness a wealth of unstructured internal and external data for decision intelligence?

Organisations will have to go to great lengths to ensure the data they depend on is properly authenticated. Data creators and consumers alike rely on authenticated (genuine, validated, qualified, truthful) data or information in whatever they do.

But how can we be sure data is authentic? Organisations will need to emulate researchers in other fields by triangulating data – using multiple reference points to confirm its validity.

Triangulating data

Every piece of data used to create a data product should be checked against at least three source reference points. In truth-verification techniques, this is also called “corroboration” or “substantiation”.

Here are some reference examples:

  • As a main or first reference point, the source of any particular piece of data (or of the data product itself) can be identified from the originating device, machine, system process or service – for example, via an identification number, holding transaction code or data object tag. This assumes a prior agreement between the data/service provider and the user, with its own digital “handshake” code, coupled with a host of consumer-specific configuration and verification parameters set up for the process/service.
  • As a second point of reference, an agreed password or access code enables usage of the data service/product, and is also checked by the process/service upon execution.
  • As a third reference, the service agreement or licensing (where the source device, system or process is used) is also checked to ensure the correct consumer (subscriber of services) is allowed to use the correct data service. (A simplified sketch of these three checks follows this list.)
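
To make the triangulation concrete, here is a minimal sketch in Python of how the three checks above might be strung together. All field names, codes and the agreement registry are hypothetical, invented purely for illustration; they are not drawn from any particular product or standard.

    # Illustrative only: field names, codes and the registry below are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class DataRecord:
        source_device_id: str   # reference 1: originating device/system/process
        handshake_code: str     # agreed digital "handshake" with the provider
        access_code: str        # reference 2: agreed password/access code
        subscriber_id: str      # reference 3: consumer named in the service agreement
        payload: dict

    # Hypothetical registry of provider/consumer agreements.
    AGREEMENTS = {
        "sensor-0042": {
            "handshake": "HS-9313",
            "access_code": "A1B2C3",
            "subscribers": {"acme-analytics"},
        },
    }

    def authenticate(record: DataRecord) -> bool:
        """Triangulate a record against three independent reference points."""
        agreement = AGREEMENTS.get(record.source_device_id)
        if agreement is None:                                     # 1. unknown source
            return False
        if agreement["handshake"] != record.handshake_code:       # 1. handshake mismatch
            return False
        if agreement["access_code"] != record.access_code:        # 2. wrong access code
            return False
        if record.subscriber_id not in agreement["subscribers"]:  # 3. consumer not licensed
            return False
        return True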

From a quality perspective, the data is also subjected to a set of validation and quality-checking rules to ensure it is reliable.
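
By way of illustration, such rules can be expressed as a small, reusable rule set that each record must pass. The rules below are hypothetical examples only; real rule sets would be agreed per data product.

    # Hypothetical quality rules; real rule sets are agreed per data product.
    QUALITY_RULES = [
        ("complete", lambda r: all(r.get(f) not in (None, "") for f in ("id", "timestamp", "value"))),
        ("in_range", lambda r: 0 <= r.get("value", -1) <= 100),
        ("recent", lambda r: str(r.get("timestamp", "")) >= "2022-01-01"),
    ]

    def quality_check(record: dict) -> list:
        """Return the names of the rules a record fails; an empty list means it passed."""
        return [name for name, rule in QUALITY_RULES if not rule(record)]

    # Example: quality_check({"id": "A1", "timestamp": "2022-07-28", "value": 57}) -> []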

In unstructured content, the reference points for authentication would include date/time stamps, device serial numbers, location and pixelation consistency.

On social media, where trolls proliferate and disinformation is rife, those carrying out sentiment analysis should look beyond likes, comments and complaints to checkpoints, references or inputs from other, traditionally tried and tested sources to verify that the content or data is accurate and trustworthy.

For example, should rail commuters complain that their trains are late, authenticators could source inputs such as rail company reports, and photos and videos from concerned citizens, to confirm that delays actually occurred.

The role of the data authenticator

In organisations such as insurance companies that rely on unstructured data – say, a photo of accident damage submitted with a claim – authenticators may have to analyse the metadata of every picture: the source of the information, the creation date and time, the source location and device, and whether it has been edited, checking that pixels are consistent. They may also look at external data, such as weather statistics, to confirm that a picture is genuine.
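
As a rough sketch, many of these reference points can be read from a photo's EXIF metadata. The example below uses the Pillow imaging library; it only scratches the surface of what a forensic check would involve, and the "suspicious" rule is a deliberately naive placeholder.

    # Rough illustration using the Pillow library (pip install Pillow).
    from PIL import Image
    from PIL.ExifTags import TAGS

    def photo_reference_points(path: str) -> dict:
        """Pull basic authentication reference points from a photo's EXIF tags."""
        exif = Image.open(path).getexif()
        tags = {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}
        return {
            "created": tags.get("DateTime"),                  # creation date/time stamp
            "device": (tags.get("Make"), tags.get("Model")),  # source device make/model
            "software": tags.get("Software"),                 # editing software, if recorded
        }

    def looks_suspicious(points: dict) -> bool:
        """Naive red flags only: no timestamp, or traces of editing software."""
        return points["created"] is None or bool(points["software"])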

In organisations depending on large volumes of pictures, blogs, social media posts and news articles, authentication is a huge challenge. This necessitates a new job category – where technical skills converge with forensics, auditing and data analysis to authenticate data at scale.

Demand will increase for data scientists to provide algorithms that triangulate or corroborate checkpoints against other external reference points. As this authentication capability matures, organisations can be assured that the data on which they base important decisions can be trusted.
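
As a loose illustration of what such an algorithm might do, a claim can be scored by the fraction of independent reference sources that confirm it. The sources and the acceptance threshold below are invented for the rail-delay example above, not prescriptive.

    # Hypothetical corroboration scoring; sources and threshold are illustrative only.
    def corroboration_score(evidence: dict) -> float:
        """Fraction of independent reference sources that confirm a claim."""
        if not evidence:
            return 0.0
        return sum(evidence.values()) / len(evidence)

    # Cross-checking the claim "trains were delayed" against three reference points.
    evidence = {
        "rail_company_report": True,   # official delay notice found
        "citizen_photos": True,        # geotagged photos of crowded platforms
        "station_sensor_feed": False,  # departure feed shows normal service
    }
    claim_is_trusted = corroboration_score(evidence) >= 2 / 3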

The same applies to individuals who use data directly in their daily lives and to inform what they believe: authentication applications (especially for mobile and the internet of things) would need to empower the individual consumer with much the same authentication capabilities.
