Microsoft launched its flagship data product, Azure Synapse Analytics, in 2020, and as the product has matured, interest has grown strongly among both large South African corporates and smaller businesses looking for an edge over their competitors. But organisations have been building data warehouses since the 1990s to provide insights on cost reduction, performance improvement and customer behaviour, so why the sudden interest in building new data platforms in the cloud?
The answer lies in the capabilities of these new cloud data platforms. Comparing what is possible now with what was possible just a few years ago shows that many of the issues that have dogged companies with traditional data warehouses can be resolved with Azure Synapse.
BI versus analytics
Traditionally, data warehouses were built to focus on historical data, consolidated from a variety of source systems, structured for high-performance querying, but mostly looking back in time. This was great for providing context and understanding why performance was good or bad. However, if you needed more advanced capabilities, such as statistical analysis or predictive forecasts, you often had to move large sets of data from the data warehouse into another toolset, such as SAS.
Azure Synapse, like other cloud data platforms, changes that paradigm. As well as being able to use traditional BI modelling tools, you can also use modern open source tools like Python to build models directly on the various data sets that Synapse accesses. For example, you can use the Azure Synapse Studio notebook to build a customer churn model in Python that accesses data natively from a Synapse SQL pool, as well as an Azure Data Lake source. No more moving data around from the BI warehouse to other data repositories!
Dealing with big data
Traditional data warehouses were great at storing structured SQL data. But they needed a lot of management and didn’t cope well with the massive storage demands of big data, or with the unexpected changes in format and structure of new data sets. Data lakes sprang up as an attempt to deal with big data demands, and tools such as Hive, Sqoop and other Hadoop add-ons were built to try to help developers keep up with the sudden demands. But learning to work with these different tools was challenging, as was acquiring the skills needed to deploy big data platforms.
Azure Data Lake was part of the movement to make Hadoop and associated tools easier to work with, but Microsoft’s first-generation product was still limited to query tools like Hive. Now, Azure Data Lake Storage Gen2 combined with Azure Synapse allows developers to work easily with both structured SQL data and unknown file structures in the data lake. For example, a single SQL query in Synapse can combine data from any number of similarly structured files in a data lake folder and return the consolidated results. A traditional SQL developer needs to learn only a few extensions to work with data in the lake and can essentially treat files in the data lake like tables in the data warehouse. It’s a massive step forward for productivity.
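As a rough illustration of the single-query pattern described above, the following Python sketch builds the kind of T-SQL statement a Synapse serverless SQL pool accepts for reading every matching file in a folder as one result set. It assumes Parquet files and the OPENROWSET extension with a wildcard path; the helper name build_folder_query and the storage account, container and folder names are hypothetical and not taken from any real environment. It only produces the query text and does not connect to Synapse.

```python
from typing import Optional


def build_folder_query(folder_url: str, pattern: str = "*.parquet",
                       limit: Optional[int] = None) -> str:
    """Return T-SQL that reads all matching files in a lake folder as one rowset.

    Hypothetical helper for illustration; assumes Parquet files queried
    through OPENROWSET in a Synapse serverless SQL pool.
    """
    # Join folder and wildcard, then escape single quotes so the path
    # cannot break out of the SQL string literal.
    path = f"{folder_url.rstrip('/')}/{pattern}".replace("'", "''")
    top = f"TOP {int(limit)} " if limit is not None else ""
    return (
        f"SELECT {top}*\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{path}',\n"
        "    FORMAT = 'PARQUET'\n"
        ") AS [result];"
    )


# Hypothetical storage account, container and folder names.
query = build_folder_query(
    "https://examplelake.dfs.core.windows.net/sales/orders/2021", limit=100
)
print(query)
```

The wildcard in the path is what lets one statement span every file in the folder, so a SQL developer can treat the folder much like a table. The generated text could then be run from Synapse Studio or any SQL client connected to the serverless endpoint.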
Performance versus cost
To get high performance in the traditional on-premises data warehouse environment, you had to buy specialised kit, such as Teradata. Microsoft even had its own data appliance, the Parallel Data Warehouse (PDW), a massively parallel data processing solution that ran on licensed hardware. But if you needed to upgrade the performance again, or add new storage, the costs were often prohibitive. These days, cloud data platforms such as Azure Synapse provide a much more attractive solution: the high performance that end-users demand, with the flexibility to reduce or increase computational power on demand. Of course, this can be an endless balancing act, ensuring that the optimum performance is gained for the right cost. Regardless, the days of buying more hardware than you need today because you might need it in three years’ time are long gone.
Does this mean that Azure Synapse will save you money? Not always. Some organisations might decide that the benefits of providing near-instant results for users are too tempting to ignore, and go for higher Synapse tiers, with the associated higher costs. Ultimately, though, the costs can be controlled, and the performance benefits compared to traditional SQL data warehouses are compelling.
Azure Synapse Analytics is not the only modern cloud data platform in the marketplace, but Microsoft has spent significant amounts of money to ensure it is a strong competitor for top place in the data race. If you already believe that the Microsoft cloud is the platform for your organisation going into the future, then you must look at whether Synapse could also fit into that future.