It's pretty staggering. I sketched out my first data warehouse architecture almost 25 years ago as an internal IT project for IBM Europe. In those dark ages, I drew it on paper - so, of course, it's now lost forever. But you can still look up the first ever published complete data warehouse architecture. I wrote it up in the IBM Systems Journal (Vol. 27, No. 1) in 1988.
What amazes me is that basically the same pictures are still in use by data warehouse consultants, vendors and implementers today.
For sure, new components have been added, and if you draw the complete picture now it looks a bit spaghetti-like, but the basic workings remain the same: three data layers (operational, enterprise data warehouse and data marts), ETL (extract, transform and load), and a dollop of metadata on the side.
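For readers who prefer code to pictures, the three layers can be sketched in a few lines of Python. This is only an illustrative toy, not the architecture as published: the sample orders, the function names `etl_to_warehouse` and `build_sales_mart`, and the `metadata` dictionary are all assumptions made up for the example.

```python
from collections import defaultdict

# Layer 1, operational: raw records as an application might hold them.
# The data is invented and deliberately inconsistent.
operational_orders = [
    {"order_id": 1, "region": "emea ", "amount": "120.50"},
    {"order_id": 2, "region": "EMEA", "amount": "79.50"},
    {"order_id": 3, "region": "apac", "amount": "200.00"},
]


def etl_to_warehouse(rows):
    """Extract the rows, transform them to one consistent shape, load them into a list."""
    warehouse = []
    for row in rows:
        warehouse.append({
            "order_id": row["order_id"],
            "region": row["region"].strip().upper(),  # reconcile region codes
            "amount": float(row["amount"]),           # text to number
        })
    return warehouse


def build_sales_mart(warehouse):
    """Layer 3, data mart: a subject-specific summary (total sales per region)."""
    totals = defaultdict(float)
    for row in warehouse:
        totals[row["region"]] += row["amount"]
    return dict(totals)


# The "dollop of metadata on the side": a description of where the data came from.
metadata = {
    "source": "operational_orders",
    "transformations": ["strip and upper-case region", "amount to float"],
}

# Layer 2, enterprise data warehouse: the cleaned, reconciled copy.
warehouse = etl_to_warehouse(operational_orders)
sales_mart = build_sales_mart(warehouse)
print(sales_mart)
```

Even at this scale the pattern is visible: data is copied and reshaped at every step, and each layer only knows about the business's own operational applications.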
Considering how business needs have changed and technology has grown since, isn't it a bit strange that we still consider a 25-year-old architecture to be state of the art in IT?
After a couple of years of internal debate, I've recently decided to put my head above the parapet and declare that we need to do something radically different.
We need to expand our view beyond the traditional data and structures we've been stuffing into warehouses for the past 20 years.
Back then, all you wanted to know - or, perhaps, could ever hope to know - about your business came from your own operational applications.
Dr Barry Devlin is founder and principal, 9sight Consulting.
Today, we have data in clouds, information ecosystems with partners and customers, content on the Web and social networking information everywhere. And we must extract real knowledge, sometimes in almost real time, from all of these sources for realistic decision-making.
Information today - internal and external, hard and soft, real-time and historical, personal and public - has a far broader scope than the limited set we considered when we first drew the three layers of the data warehouse architecture. Going forward, it is an absolute necessity to consider the complete “business information resource” in its full variety of forms, its distributed location and ownership, and its range of timeliness. Three layers, especially as we now implement them physically, will no longer suffice.
Nailing the process
I've recently taken to asking at data warehouse conferences if there is a process that describes decision-making. It usually takes quite a bit of probing before the traditional data-oriented attendees can see it, but the reality is that there is a process - albeit flexible, adaptive, unrecorded - behind every decision made in business. This is important! When you recognise this, you see why business intelligence (BI) as it's done today reaches only a small percentage of the business.
And you can see how service-oriented architecture (SOA) comes creeping into BI from the operational world with its user-modifiable workflows and services that can be anything from “place an order” to “analyse today's sales”. We need to recognise that process is pervasive across all the activities that users undertake in the business. We need to respond to the business users' reality that the processes they follow day-in and day-out are inherently interconnected and must be seamlessly integrated by IT. There is no room for conceptual, technical or organisational barriers erected by IT.
Despite the very real conceptual, technical and organisational challenges created, it's time to begin to create a single information resource that underlies an integrated set of processes that serves all the business users equally. It's time to admit that, in the real world, each and every person likely has multiple roles - operational, informational and collaborative - in any business process in which they participate. They need a fully integrated view of all aspects of the information and processes relevant to their daily activities in the business.
Rest in peace?
I am sometimes asked if the data warehouse is dead. Personally, I don't think so - although I may be biased as one of its creators! However, I do think it is unconscious.
Unconscious of the threats from SOA and social network initiatives such as Enterprise (Web) 2.0 that are resetting users' expectations of how information is delivered and activities intermixed. Unconscious of its potentially central role as the repository of the common, core information of the business, the single truth around which all the other valid truths of the business must constellate. And unconscious of the skills that the data warehouse team can contribute to the widespread integration of information and process that will be the hallmark of successful businesses in the future.
The second coming of the data warehouse will be in its contribution to a new architecture I'm calling “Business Integrated Insight”.
I'd be delighted to share my insights on this over the coming months and to hear your views on the potential and practicality of such an approach.