In my first column, I looked at the challenges that organisations face when it comes to collecting, storing and then using data. Significant as these challenges are, they must be overcome because data offers the opportunity to generate evidence-based insights that in turn support better, more rapid decision-making in a complex and ambiguous business environment.
All the challenges related to data stem from the fact that there is so much of it, and it is accumulating so rapidly. The vital first step here is data discovery: the capability to identify what data the organisation has, and to classify it accurately.
This is an ongoing process because not only is data constantly flowing into the organisation, but its usefulness or otherwise is at least partly dependent on its age and context.
As previously argued, the volume of data means data discovery cannot be undertaken manually. Unsurprisingly, software has been developed to automate the process of data discovery; equally unsurprisingly, not all of these tools are equal.
CIOs and CSOs need to understand the components of an effective data discovery solution in order to make the right choice.
Any prospective tool should address five areas:
Discovery: The solution needs to be accurate, fast and scalable to deal with the significant and growing volumes of data. It needs to be able to identify sensitive data across multiple regions and in multiple languages.
Another key requirement is the ability to analyse the data in place, without the technical issues and expense of moving it into a central repository. It must therefore be able to connect seamlessly with existing data repositories and cope with all file types.
As part of the discovery process, the tool needs to classify the data and remediate it, which includes disposing of duplicate data and of redundant, obsolete or trivial (ROT) data, and then protecting sensitive data via encryption.
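To make the idea concrete, here is a minimal Python sketch of that kind of in-place scan. The patterns, category names and functions are illustrative assumptions, not the workings of any particular product:

```python
import hashlib
import re
from pathlib import Path

# Hypothetical detectors; a production tool ships curated, region- and
# language-aware patterns for every jurisdiction it supports.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def classify_file(path: Path) -> dict:
    """Scan a file in place and record which sensitive categories it contains."""
    text = path.read_text(errors="ignore")
    hits = {name: len(rx.findall(text)) for name, rx in PATTERNS.items()}
    return {
        "path": str(path),
        # A content hash lets us spot byte-identical duplicates (the "R" in ROT).
        "sha256": hashlib.sha256(text.encode()).hexdigest(),
        "categories": {k: v for k, v in hits.items() if v},
    }

def find_duplicates(records: list[dict]) -> list[list[str]]:
    """Group files whose content hashes match, as candidates for disposal."""
    groups: dict[str, list[str]] = {}
    for rec in records:
        groups.setdefault(rec["sha256"], []).append(rec["path"])
    return [paths for paths in groups.values() if len(paths) > 1]
```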
Insight: In parallel, the tool must provide rich insights into the data to enable effective management. Organisations need to be able to create custom categories specific to pertinent regulations and/or business or industry processes. These categories are needed to enable accurate risk scores to be generated.
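As a rough illustration of how custom categories might feed a risk score, consider a simple weighted sum. The categories and weights below are hypothetical; real tools use far richer models:

```python
# Hypothetical weights, tuned to the regulations and processes that matter
# to the organisation; higher means more sensitive.
CATEGORY_WEIGHTS = {"card_number": 10, "national_id": 8, "email": 2}

def risk_score(category_counts: dict[str, int]) -> int:
    """Weighted sum of sensitive-data hits, capped to a 0-100 score."""
    raw = sum(CATEGORY_WEIGHTS.get(cat, 1) * n
              for cat, n in category_counts.items())
    return min(raw, 100)

print(risk_score({"card_number": 4, "email": 12}))  # 64
```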
Once the data has been categorised, sensitive data needs to be tagged quickly so that it can be adequately protected in downstream business processes.
Tagging also assists in the intelligent sampling of large data ecosystems to pinpoint the location and density of risky data.
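One simple reading of intelligent sampling is shown below: estimate the density of tagged, risky objects in a large repository from a random sample rather than a full scan. The function and its inputs are assumptions for illustration:

```python
import random

def estimate_risk_density(tagged: set[str], population: list[str],
                          sample_size: int = 100, seed: int = 1) -> float:
    """Estimate the fraction of objects carrying a sensitive-data tag
    without inspecting every object in the repository."""
    rng = random.Random(seed)
    sample = rng.sample(population, min(sample_size, len(population)))
    return sum(obj in tagged for obj in sample) / len(sample)

stores = [f"s3://bucket/file-{i}.csv" for i in range(10_000)]
tags = set(stores[::40])  # pretend 2.5% were tagged as risky
print(round(estimate_risk_density(tags, stores), 3))
```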
Comprehensive reporting on the security permissions and access rights attached to corporate data is also required, so that security management can take the actions needed to ensure sensitive data is not exposed.
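A toy version of such a report might look like the following, assuming a POSIX file system; the function and its inputs are hypothetical:

```python
import stat
from pathlib import Path

def report_exposed(files: list[Path], sensitive: set[Path]) -> list[str]:
    """Flag sensitive files whose permissions allow any user to read them."""
    findings = []
    for p in files:
        mode = p.stat().st_mode
        if p in sensitive and mode & stat.S_IROTH:
            findings.append(f"{p}: sensitive but world-readable "
                            f"({stat.filemode(mode)})")
    return findings
```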
Protection: Data governance is critical. Intelligent protection must be built into the tool: security is only effective if it is part of the structure and not an add-on. Throughout the discovery process, the solution needs to mask the data, and automate the protection of sensitive data based on risk and context.
A key capability is the anonymisation of critical data sets, and the use of tokenisation and encryption to protect data during analysis.
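As a sketch of what consistent tokenisation can look like, consider the keyed hashing below. The key handling is deliberately naive (a real deployment would use a vault or HSM), and reversible detokenisation would need a separate, access-controlled mapping:

```python
import hashlib
import hmac

SECRET_KEY = b"rotate-me"  # placeholder only; never hard-code a real key

def tokenise(value: str) -> str:
    """Replace a sensitive value with a stable, keyed token.

    The same input always yields the same token, so joins and group-bys
    still work during analysis, while the raw value stays out of the
    analytics environment.
    """
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]

print(tokenise("jane.doe@example.com"))  # stable token per input value
```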
A related capability is the ability to provide data that can be used safely in application development and testing. In the so-called “app economy”, organisations must be able to develop and launch apps rapidly, and there is no room for misfires once the app is launched.
Testing is thus critical and is integral to the iterative Agile development methodology that is increasingly being used.
However, using appropriate data in testing has become problematic because of compliance concerns, and creating synthetic data that accurately mimics “real” data is time-consuming and could impact the testing process.
Choosing a tool that can mask real data to ensure compliance while making the testing process as thorough as possible makes a lot of sense, leading to a shorter and more predictable path between development and production.
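A minimal sketch of that kind of masking follows, with made-up field names; the point is that formats and keys survive while the identifiers do not:

```python
import random

FIRST_NAMES = ["Alex", "Sam", "Jordan", "Thandi", "Priya"]  # illustrative pool

def mask_customer(record: dict, rng: random.Random) -> dict:
    """Copy a customer record, replacing direct identifiers while
    preserving formats and the primary key, so tests stay realistic."""
    name = rng.choice(FIRST_NAMES)
    return {
        **record,
        "name": name,
        "email": f"{name.lower()}.{record['id']}@test.invalid",
        # Keep only the last four digits, as a payment-screen test might need.
        "card": "**** **** **** " + record["card"][-4:],
    }

rng = random.Random(0)
real = {"id": 1017, "name": "Jane Doe",
        "email": "jane.doe@example.com", "card": "4111 1111 1111 1111"}
print(mask_customer(real, rng))
```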
Monitoring: The proposed tool must be able to run data discovery scans continuously as new data enters the enterprise. In addition, the existing policies relating to data life cycle management, access and so on need to be applied to new data.
The tool needs to provide a clear audit trail of all actions taken on data.
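In spirit, continuous scanning plus an audit trail reduces to something like the polling loop below. Real tools use event hooks and repository connectors rather than polling, and the names here are illustrative:

```python
import json
import time
from datetime import datetime, timezone
from pathlib import Path

AUDIT_LOG = Path("audit.jsonl")  # append-only record of actions on data

def record_action(path: Path, action: str) -> None:
    """Append one audit entry per action taken on a data object."""
    entry = {"ts": datetime.now(timezone.utc).isoformat(),
             "path": str(path), "action": action}
    with AUDIT_LOG.open("a") as log:
        log.write(json.dumps(entry) + "\n")

def watch(root: Path, interval: float = 60.0) -> None:
    """Rescan only files that appeared since the previous pass."""
    seen = set(root.rglob("*"))
    while True:
        time.sleep(interval)
        for new_file in set(root.rglob("*")) - seen:
            record_action(new_file, "scanned")  # apply existing policies here
            seen.add(new_file)
```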
Management: Managing the data obviously occurs across all these areas, but it's useful to consider it as a whole. Data management encompasses disposing of ROT or outdated data on defensible grounds, and protecting data at rest and in motion, throughout its life cycle, via encryption and anonymisation.
It’s also important that the tool enables managers to store data intelligently to reduce costs; for example, little-used data can be stored on cheaper media.
In a similar vein, managers must be able to identify data associated with legacy applications for disposal. Overall, data must only be stored for as long as necessary in terms of regulations and consent parameters.
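Pulling those rules together, a retention decision can be expressed as a small policy function; the categories, periods and thresholds below are invented for illustration:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods per data category.
RETENTION = {"marketing_consent": timedelta(days=365),
             "transaction": timedelta(days=7 * 365)}

def lifecycle_action(category: str, created: datetime,
                     last_used: datetime) -> str:
    """Decide, on defensible grounds, what to do with a data object.
    Timestamps are expected to be timezone-aware."""
    now = datetime.now(timezone.utc)
    if now - created > RETENTION.get(category, timedelta(days=90)):
        return "dispose"       # past its retention/consent window
    if now - last_used > timedelta(days=180):
        return "move_to_cold"  # little-used: shift to cheaper storage
    return "keep_hot"
```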
With an effective, automated data discovery process in place, enabled by a carefully chosen software tool, any organisation is all set to make the most of its data.