Do we still need a data warehouse? Are there not better ways of doing this? Can I not achieve a similar result, faster and better, for less?
Many organisations have been asking these questions, and evaluating alternatives to a data warehouse, for some time now. It’s not a bad question to ask, and I do encourage everyone to periodically re-evaluate their approach and methodology, to maintain relevance.
Often, however, people go down a rabbit-hole of technology evaluation and selection. They evaluate many different options in asking this question, such as in-memory databases, enterprise information integration, virtualisation, OLAP-styled BI-as-a-service, and cloud-based platform-as-a-service offerings such as Big Query. “With this technology, we can do everything that a data warehouse can do!” is something that I have heard quite often.
Many vendors encourage this thinking, to sell their products and services. Sadly, I do not believe this is the best advice or approach. Many organisations go down this route, only to discover they still have fundamental problems. They realise the technology on its own did not solve their problems, or give them the capability they were expecting. And with this, many organisations realise too late (and some never do), that the technology does not replace these requirements.
When building a data warehouse (DWH), we consolidate, conform, and standardise data and business rules. We also provide an historical view of data that is traditionally not available in source systems. It is true that not all BI requires a DWH. In many cases, use cases were forced through a DWH, where an operational data store (ODS) could have worked just as well.
Many technology alternatives promise to deliver the capability of a data warehouse, in memory, virtually, without having to move the data from the source. They often promise to connect the user directly to the underlying source data, or to load it into a single big data repository and provide all BI capability “in-flight”. This then creates the illusion that the concept of the data warehouse, in terms of what it does, is replaceable.
Take as an example, a company that has multiple policy administration systems for different product lines. The source system owners won’t allow OLAP directly on the production system. The need for a solution starts at an individual line-of-business (LOB) level, but soon enough, the need expands to show a consolidated set of metrics across all LOBs, to give an enterprise-wide operational view.
The industry even blended the concept of a data warehouse and a data lake to create the concept of a data lake house.
For example, the classic waterfall report showcasing conversion ratios. To do this at an enterprise level, one would need to consolidate the data. The definition of application statuses, product groupings, agent skill levels, etc, then need to be standardised across all source systems. We also need to potentially standardise and conform business rules across different LOBs, for data transformation, reporting logic, etc. Finally, we need to preserve the history, to be able to view the report at a point in time.
We can certainly adopt alternatives to a data warehouse to accomplish this but keep these requirements firmly in mind when doing this. Some alternatives have emerged, that are more relevant than others. An ODS was one of the first alternatives considered. After that, the concept of a data lake, and a data vault also became popular. The industry even blended the concept of a data warehouse and a data lake to create the concept of a data lake house. One can even choose no data warehouse, replacing this with a series of custom solutions that are use case specific in design.
All of these are valid potential alternatives, in some form, to a data warehouse. The thing is though, none of these are technologies, or platforms. They are all distinct methodologies, with a clear design paradigm and implementation approach, and they each do very specific things, and solve very specific problems. And yes, they also each have their own very specific weaknesses.
I encourage everyone to remember that a data warehouse is fundamentally technology-agnostic. Ralph Kimball and Bill Inmon implemented data warehouses on numerous different types of technologies. Their definition of a data warehouse, their approach, their methodologies, were not limited to a specific vendor or type of technology.
Real alternatives to data warehousing speak to a design pattern, a methodology for managing and using data, to deliver on diverse information requirements. When evaluating alternatives to a data warehouse, if the alternative does not represent a clear approach and methodology for this, and is technology-heavy, then take a step back and consider if you are not going down this rabbit-hole.
Keep an eye out for my next article, where I will discuss the real alternatives of a data warehouse in more detail. I will evaluate their strengths and weaknesses, as well as where and how I would advise using them.
Share