In the world of data management, a relatively recent theme gaining popularity is data observability.
The discipline of data quality has been with us for decades, yet it remains one of the biggest challenges most organisations face, and its significance will not diminish any time soon.
However, as data management evolves together with new trends toward more modern architectures, data quality alone is no longer sufficient.
Data quality is the more "traditional" approach: it focuses on data's fitness for purpose, or more correctly purposes, across the organisation, by assessing data at rest for compliance against well-defined rules.
Once issues have been identified, typically in a batch-style paradigm, various remedial actions are implemented; ideally, these are tracked and measured, and the end results re-assessed by applying the same rules to the remediated data sets.
Data observability, on the other hand, continuously monitors data in real time within data pipelines and systems, with a view to proactively detecting and addressing potential issues. These are typically surfaced as alerts to be resolved via well-defined workflows and collaboration, ideally with a focus on root cause analysis to prevent future recurrences.
In practice, it automatically picks up anomalies, for example values that exceed configured thresholds, and reports them to the people who can act on them, such as the data ops team. Root cause analysis is then carried out to ensure the issue does not happen again.
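As an illustrative sketch only, not any particular vendor's product, the threshold-based anomaly check and alerting workflow described above might look something like this in Python; the metric names, thresholds and notify_data_ops hook are hypothetical:

```python
# Minimal sketch of threshold-based anomaly detection with alerting.
# Metric names, thresholds and notify_data_ops() are illustrative assumptions;
# real observability tools provide equivalents out of the box.

from dataclasses import dataclass


@dataclass
class Alert:
    metric: str
    observed: float
    threshold: float


def check_thresholds(metrics: dict, thresholds: dict) -> list:
    """Flag any metric whose observed value exceeds its configured threshold."""
    return [
        Alert(name, value, thresholds[name])
        for name, value in metrics.items()
        if name in thresholds and value > thresholds[name]
    ]


def notify_data_ops(alerts: list) -> None:
    """Stand-in for routing alerts into a workflow or incident-management tool."""
    for alert in alerts:
        print(f"ALERT: {alert.metric} = {alert.observed} "
              f"(threshold {alert.threshold}) - assign for root cause analysis")


if __name__ == "__main__":
    todays_metrics = {"null_rate_pct": 7.5, "row_count": 980_000}
    configured_thresholds = {"null_rate_pct": 5.0, "row_count": 2_000_000}
    # Only the breached metric (null_rate_pct) is raised as an alert.
    notify_data_ops(check_thresholds(todays_metrics, configured_thresholds))
```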
Key roles
Data observability is particularly suited to the modern trend of a shift to more decentralised data management, or federated management frameworks, such as data mesh, or to architectures such as data fabric, as it can provide a complete view over the entire data ecosystem.
Data quality typically focuses on specific data sets, or domains across sets, typically in persisted data stores, be they operational or analytical.
At the same time, data observability can play an important role in helping ensure data is fit for purpose prior to hitting data storage, thus building quality into the process and enabling a move closer to quality assurance rather than mere quality control. Over time, this drives continuous improvement along all points of the data ecosystem.
It is important to bear in mind that data quality and data observability are highly complementary approaches, and ideally both should be applied. Data quality techniques are by now well-known and relatively easy to apply, while data observability can be more complex to implement, but in the end, both are needed to holistically contribute to an organisation’s ongoing quest for the continuous improvement of data quality.
Embracing observability
In South Africa, many organisations are aware of data observability and see the potential value of it, but few seem to have adopted it yet. This is largely because it is a relatively new concept, and it does require quite specific technology, such as data quality tools with observability features built in.
Organisations looking to harness AI should focus on both data quality and data observability to improve outputs. It’s the old ‘garbage in, garbage out’ scenario; however, in the AI era, with increasing volumes and complexities, the speed at which data quality can be addressed becomes more important.
On the flip side, AI also plays an important role in supporting data observability, because with increasing volumes and complexities, much of what would otherwise require people to manually map, analyse and remediate needs to be done automatically and much faster.
To improve data quality with observability, organisations should look to acquire technology that enables them to implement data observability across their data ecosystems.
Some of the main features such a tool should have include anomaly detection, schema validation, volume monitoring, freshness monitoring, data quality validation, alerting, remediation and incident management, root cause analysis, data lineage, broad and varied integration capabilities, performance monitoring, and reporting and dashboards.
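To make a couple of these checks concrete, here is a small Python sketch of freshness and volume monitoring; the table metadata, expected values and tolerance are illustrative assumptions rather than a specific tool's interface:

```python
# Sketch of two checks from the feature list above: freshness and volume monitoring.
# Load timestamps, expected row counts and tolerances are hypothetical examples.

from datetime import datetime, timedelta, timezone
from typing import Optional


def check_freshness(last_loaded_at: datetime, max_age: timedelta) -> Optional[str]:
    """Alert if the table has not been refreshed within the expected window."""
    age = datetime.now(timezone.utc) - last_loaded_at
    if age > max_age:
        return f"Freshness breach: last load {age} ago (limit {max_age})"
    return None


def check_volume(row_count: int, expected: int, tolerance: float = 0.2) -> Optional[str]:
    """Alert if the row count deviates from the expected volume by more than `tolerance`."""
    if abs(row_count - expected) > expected * tolerance:
        return f"Volume anomaly: {row_count} rows vs expected ~{expected}"
    return None


if __name__ == "__main__":
    issues = [
        check_freshness(datetime(2024, 1, 1, tzinfo=timezone.utc), timedelta(hours=24)),
        check_volume(row_count=120_000, expected=1_000_000),
    ]
    for issue in filter(None, issues):
        print(issue)  # each finding would normally feed alerting and incident management
```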
Taken together, applying both traditional data quality techniques as well as data observability will ultimately ensure a finely-tuned, well-managed and reliable data environment, leading to more effective business operations and game-changing analytics.