Shared data has become the norm across all sectors, but as more and more sensitive data is shared across ever-larger data lakes, individuals and organisations alike are losing control of who can access that data, and how it is stored and used.
There is a compelling argument for sharing data: it can improve processes, research and analysis, it can enable innovation and productivity, and it can deliver personalisation and convenience for consumers.
But these benefits come at a cost: of necessity, vast volumes of data are shared into central repositories and then shared out again across individuals, departments and organisations.
In recent years, the data lake architecture has become a de facto standard within almost every organisation, with the goal of consolidating data from siloed applications onto a single platform in order to drive data insights. Individuals contribute data to these repositories and share it out again. In this dynamic environment, there is always the risk that control will be lost.
In this connected world, data sharing has become the norm, and a necessity. However, in my experience, few organisations have sufficiently mature processes and technology in place to ensure the data being shared is not inadvertently exposed or misused – by either internal or external parties.
Prime examples of shared data exposed to significant risk are the Experian breach and – most recently – the TransUnion hack.
Despite the layers of protection we assume were in place, access to the data lake appears to have exposed sensitive data relating to tens of millions of people. Had this data been encrypted, masked or obfuscated, it would have remained somewhat protected, even if a server were breached.
Shared data also increases the risk of exposure through error, or deliberate fraud committed by internal parties. Staff with access to customer accounts and cash vouchers could skim funds, for example, and staff with access to identity document numbers and actual copies of IDs could use these to commit identity fraud.
Protection at data level
Multiple layers of cyber security should be in place to ensure protection of sensitive data, across the cloud, network and/or server infrastructure. But should these defences be breached, the next line of defence is at the actual data level.
To protect any data that has been shared, an important first step is to understand what sensitive data resides where. This can be achieved by using a data catalogue, which can categorise data into domains such as PII (personally identifiable information), PCI (payment card data), or non-sensitive.
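As a rough illustration of that categorisation step, the sketch below tags columns by name. The rule patterns, table names and column names are all hypothetical, and a real data catalogue would typically profile the values themselves rather than rely on names alone.

```python
import re

# Hypothetical name-based rules; real catalogues usually also inspect the data.
DOMAIN_RULES = {
    "PII": re.compile(r"(name|email|phone|id_number|address|birth)", re.IGNORECASE),
    "PCI": re.compile(r"(card|cvv|expiry)", re.IGNORECASE),
}


def classify_column(column_name: str) -> str:
    """Return the first sensitive domain whose rule matches, else 'non-sensitive'."""
    for domain, pattern in DOMAIN_RULES.items():
        if pattern.search(column_name):
            return domain
    return "non-sensitive"


def build_catalogue(tables: dict) -> dict:
    """Map each table name to a {column: domain} dictionary."""
    return {
        table: {column: classify_column(column) for column in columns}
        for table, columns in tables.items()
    }


# Example tables (invented for illustration).
catalogue = build_catalogue({
    "customers": ["customer_name", "email", "signup_channel"],
    "payments": ["card_number", "amount"],
})
```

The resulting catalogue gives later steps something concrete to act on: each column carries a domain label that access and masking policies can refer to.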
Once these are identified, the next step is to define who has access to this data and whether they are authorised to access it.
Should the appropriate authorisation not be in place, then the data needs to be protected from exposure to that particular individual or group.
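The authorisation check described here might look something like the following sketch. The role names, the policy table and the choice to show only the last four characters are assumptions made for illustration, not a prescribed design.

```python
# Hypothetical policy: which roles may see each data domain unmasked.
AUTHORISED_ROLES = {
    "PII": {"compliance_officer"},
    "PCI": {"payments_admin"},
    "non-sensitive": None,  # None means every role is authorised
}


def mask_value(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a value."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def read_field(value: str, domain: str, role: str) -> str:
    """Return the value in clear only if the role is authorised for the domain."""
    # Unknown domains fall back to an empty set, so nobody is authorised.
    allowed = AUTHORISED_ROLES.get(domain, set())
    if allowed is None or role in allowed:
        return value
    return mask_value(value)
```

Because the value is masked rather than withheld, an unauthorised user can still work with the record, which keeps the protection from getting in the way of sharing.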
However, data protection measures should be applied in a manner that does not hinder data sharing, since the ability to share is a critical requirement in all organisations. These measures should be deployed unobtrusively at the application layer, where most of the access occurs.
These measures should not be limited to front-end applications, but should also be deployed in back-end databases and storage layers in a uniform and consistent manner, so that all security policies are applied automatically whenever the data is accessed. Encryption, data obfuscation and/or masking are needed to ensure data can be shared to enable collaboration and productivity – without putting sensitive data into the wrong hands.
Data sharing also occurs during the development phase of many applications, and it is equally important to protect data during this process by applying data protection to data at rest. This means that if a production source of data is copied to a development storage layer, the data needs to be protected at the point at which the copy is made; in other words, it must be masked statically.
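A minimal sketch of static masking at copy time follows. It swaps in keyed hashing (HMAC-SHA256) as one possible masking technique, chosen here because the same input always yields the same token, so joins between masked tables still line up. The column names and the function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical: columns flagged as sensitive, for example by a data catalogue.
SENSITIVE_COLUMNS = {"id_number", "email"}


def pseudonymise(value: str, key: bytes) -> str:
    """Replace a value with a keyed one-way token; equal inputs give equal tokens."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def copy_to_dev(production_rows: list, key: bytes) -> list:
    """Build masked copies of the rows; the production rows are not modified."""
    return [
        {
            column: pseudonymise(str(value), key) if column in SENSITIVE_COLUMNS else value
            for column, value in row.items()
        }
        for row in production_rows
    ]
```

The key should be kept out of the development environment, so that developers cannot regenerate tokens from guessed values.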
While most organisations take cyber security and compliance seriously, they do appear to be lagging in terms of solid data protection. Shared data is data at risk, with multiple points in the data lifecycle when it needs enhanced protection.
Organisations need to take adequate measures to protect data throughout the lifecycle: they need to make it a priority to understand what data they hold in their central repositories, how it is shared, and how best it can be protected – even if their infrastructure defences are breached.
With effective data governance and protection throughout its lifecycle, and no matter how the data is shared, organisations will be better positioned to derive real value from their data – benefiting from shared data in analytics, for operational and business improvements, and to enhance customer experience – without the risk.