Business Intelligence only works well when we regularly retrieve data from the source systems and copy it to a separate computer and database. This means that the data are stored redundantly: once in the source system and once in the data warehouse.
Data should never be stored more than once?
A traditionally minded IT specialist will find this unacceptable: within the company network, data should never be stored more than once, so that a change only needs to be made in one place. That this principle benefits the maintainability of data is beyond dispute – especially when we need to analyze large volumes of (unstructured) data: Big Data.
Do they have a valid argument?
At first glance, the IT specialists do have a valid argument. However, there are many other reasons that actually justify redundancy of data within the corporate network. The main one is that we need a copy if we want to analyze data ‘freely’ – which can place a heavy load on the computer – without bringing the operational system to its knees.
Many analyses demand considerable computing power. For example: to calculate a pharmaceutical wholesaler’s revenue per account manager per quarter, no fewer than 25 million rows need to be ploughed through.
Operational processes are at risk
And that is not all: the data still need to be summed and grouped per account manager and per quarter, and then presented in a report. When we run such analyses on the source system – for example, the ERP system itself – the organization’s operational processes are very much at risk: order processing slows down considerably or may even grind to a halt.
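The sum-and-group step described above can be sketched in plain Python. This is a minimal illustration with a handful of made-up order rows and hypothetical account manager names; in practice the data warehouse would run an equivalent query over millions of rows:

```python
from collections import defaultdict

# Hypothetical order rows: (account_manager, quarter, revenue).
# In reality these would be millions of rows read from the data warehouse.
orders = [
    ("Jansen", "2024-Q1", 1200.0),
    ("Jansen", "2024-Q1", 800.0),
    ("Jansen", "2024-Q2", 500.0),
    ("de Vries", "2024-Q1", 300.0),
    ("de Vries", "2024-Q2", 950.0),
]

# Sum revenue per (account manager, quarter) - the grouping step.
totals = defaultdict(float)
for manager, quarter, revenue in orders:
    totals[(manager, quarter)] += revenue

# Present the result as a simple report, sorted for readability.
for (manager, quarter), revenue in sorted(totals.items()):
    print(f"{manager:10s} {quarter}: {revenue:10.2f}")
```

Running this kind of aggregation against a separate copy of the data is exactly what keeps the load off the operational system.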