Do you still need a data warehouse?
More and more organizations are wondering what the use is of a data warehouse, and whether or not it’s worth the investment. They also want to know what alternatives are available. A growing number of IT vendors and some “experts” claim that the end of the data warehouse is nigh. When we say vendors, we’re referring to suppliers of data warehouse appliances, data virtualization tools, and data discovery tools. We have a different opinion, though. The data warehouse still is the beating heart of the Intelligent organization and it serves different vital goals.
1. Improving integration
An organization registers data in various systems which support the various business processes. In order to create an overall picture of business operations, customers, and suppliers – thus creating a single version of the truth – the data must come together in one place and be made compatible. Both external (from the environment) and internal data (from ERP, CRM and financial systems) should merge into the data warehouse and then be grouped.
2. Speeding up response times
The source systems are fully optimized in order to process many small transactions, such as orders, in a short time. Generating information about the performance of the organization only requires a few large ‘transactions’ in which large volumes of data are gathered and aggregated. The structure of a data warehouse is specifically designed to quickly analyze such large volumes of (big) data.
Read also: The 7 biggest Big Data pitfalls.
3. Faster and more flexible reporting
The structure of both data warehouses and data marts enables end users to report in a flexible manner and to quickly perform interactive analysis based on various predefined angles (dimensions). They may, for example, with a single mouse click jump from year level, to quarter, to month level, and quickly switch between the customer dimension and the product dimension, all while the indicator remains fixed. In this way, end users can actually juggle the data and thus quickly gain knowledge about business operations and performance indicators.
4. Recording changes to build history
Source systems don’t usually keep a history of certain data. For example, if a customer relocates or a product moves to a different product group, the (old) values will most likely be overwritten. This means they disappear from the system – or at least they’re very difficult to trace back. That’s a pity, because in order to generate reliable information, we actually need these old values, as users sometimes want to be able to look back in time. In other words: we want to be able to look at the organization’s performance from a historical perspective – in accordance with the organizational structure and product classifications of that time – instead of in the current context. A data warehouse ensures that data changes in the source system are recorded, which enables historical analysis.
5. Increasing data quality
Stakeholders and users frequently overestimate the quality of data in the source systems. Unfortunately, source systems quite often contain data of poor quality. When we use a data warehouse, we can greatly improve the data quality, either through – where possible – correcting the data while loading or by tackling the problem at its source.
6. Unburdening operational systems
By transferring data to a separate computer in order to analyze it, the operational system is unburdened.
Definition of data warehouse
According to Bill Inmon, “a data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process.”
7. Unburdening the IT department
A data warehouse and Business Intelligence tools allow employees within the organization to create reports and perform analyses independently. However, an organization will first have to invest in order to set up the required infrastructure for that data warehouse and those BI tools. The following principle applies: the better the architecture is set up and developed, the more complex reports users can independently create. Obviously, users first need sufficient training and support, where necessary. Yet, what we see in practice is that many of the more complex reports end up being created by the IT department. This is mostly due to users lacking either the time or the knowledge. That’s why data literacy is an important factor. Another reason may be that the organization hasn’t put enough effort into developing the right architecture.
8. Increasing recognizability
Indicators are ‘prepared’ in the data warehouse. This allows users to create complex reports on, for example, returns on customers, or on service levels divided by month, customer group and country, in a few simple steps. In the source system this information only emerges when we manually perform a large number of actions and calculations. Using a data warehouse thus increases the recognizability of the information we require, provided that the data warehouse is set up based on the business.
Redundant data: rocking the boat?
Data should never be stored more than once?
A traditionally minded IT specialist will find this unacceptable: data should – within the company network – never be stored more than once so that when we change data we will not need to do this at several places. The fact that this principle benefits the maintainability of data is beyond dispute. Especially when we need to analyze large volumes of (unstructured) data: Big Data.
Do they have a valid argument?
At first glance, the IT specialists do have a valid argument, however there are many other reasons that actually justify redundancy of data within the corporate network. The main argument is that we actually need a copy if we want to be able to ‘freely’ analyze data – which can be a heavy burden on the computer -, without the operational system putting its cap down.
Many analyses require quite some calculating power from the computer. For example: in order to calculate the revenue of a pharmaceutical wholesaler per account manager, per quarter no less than 25 million rows need to ploughed through.
Operational processes are at risk
And that is not all: the data still need to be added up and grouped per account manager, per quarter to then be presented in a report. When we perform such analyses on the source system – for example, the ERP system itself – the organization’s operational process is very much at risk: the order processing process proceeds much slower or stops altogether.
9. Increasing findability
When we create a data warehouse, we make sure that users can easily access the meaning of data. (In the source system, these meanings are either non-existent or poorly accessible.) With a data warehouse, users can find data more quickly, and thus establish information and knowledge faster.
All the goals of the data warehouse serve the aims of Business Intelligence: making better decisions faster at all levels within the organization and even across organizational boundaries.