I often wonder if database administrators (DBA) really understand the vital and cardinal differences between online transaction processing (OLTP) databases and data warehouse (DW) databases. The reality is, there are major differences between the two and it takes a completely different kind of approach to successfully administer and manage each one.
The objectives of these two databases differ dramatically. The one must ensure permanent uptime with 100% data integrity and recoverability to the nearest second having a couple hundred gigabytes of data in size. The other must do all of this, as well as process terabytes of data, per day, within a database that delivers a solution from business specific requirements to a specific user base.
A DBA needs to understand these differences and the changes that need to take place to shift from a transactional procession database to a DW database. A challenge here, however, is to balance the database resources required to optimally load hundreds of millions of records per day and at the same time, have users sorting and scanning through massive record sets within predefined response times.
Getting it right
Unfortunately, the lack of understanding around this and the data warehouse concept and methodology is often the main reason why database administrators fail to successfully maintain large DW databases.
A few areas that are often not understood include:
* The amount of data that will be loaded into the data warehouse.
* The concept of bulk loads and massive reads.
* The hardware resources required to be available to do massive parallel processing.
* The importance of storage disc management.
* Suitable hardware and software configurations.
These challenges and misunderstandings can be overcome if the basics are understood. Most databases have some differences in the DW options available, but all of them have the same underlying structures to store and read the data.
The concept of data, index, log and system files are also used in most large databases, but these files must be accessed simultaneously. If the data resides on the same disc then there will be performance issues as it has to read and write at the same time using the same disc controller and the same access path. Addressing this simply means using a previously proven method of having parallel access paths with many smaller disks striped together to increase the amount of controllers.
Additionally, one of the most underestimated concepts in the usage of table and index is partitioning. There are many different parameters that can be used on a database and table level, but the use of partitions makes one of the biggest differences to data loading and retrieval.
It is of utmost importance that an administrator understands the volumes of data, as well as the grain of the data to be able to decide on a partitioning strategy - even before the extract, transform and load processes are developed.
Decoding differences
A main performance enhancer with regards to data is the database parameter settings where specific DW settings are added to enhance the performance, and to automatically change the way the database optimiser will handle large queries. It must be noted, however, that these settings must not be changed without properly investigating the effects such changes will have.
In order for effective management of the different types of data and their respective processes to take place, the real database administrator needs to stand up and step up to the plate.
Ren'e Muiyser is a principal BI consultant at PBT, Cape Town.
Although there are several courses available for database administrators to assist in understanding the differences in the above, most concentrate solely on managing OLTP databases. While courses and information sessions for new database features are included with every release, this information does not include how to use the features based on best practices, highlighting the differences between OLTP and DW concepts, and the corresponding correct parameter usage and storage configuration set-ups.
In order for effective management of the different types of data and their respective processes to take place, the real database administrator needs to stand up and step up to the plate. This will also allow for knowledge sharing to take place among administrators, building a platform for effective learning and ensuring the differences between databases are understood going forward.
It is obvious that there are certain differences between these two databases and the need to understand these differences is crucial for the correct implementation and success of the databases to occur.
It is imperative that data administrators acknowledge these differences and understand the implications for implementation and management, in order for successes in this area to be achieved, especially as data warehousing forms a core part of an organisation looking to use data to achieve organisational success.
* Ren'e Muiyser is a principal BI consultant at PBT, Cape Town.
Share