Collect, filter, cleanse, combine, transform and aggregate
The aim of the collection process in the Business Intelligence cycle is to collect, filter, cleanse, combine, transform and aggregate data from different internal and external sources in order to increase the likelihood of good management information and useful knowledge during the process of analysis.
Usually supported by data warehousing
This process is usually supported by a data warehouse, which includes a specific architecture that fits in with both the information needs and the technical infrastructure. The data warehouse must ensure that data is transformed into information and knowledge that encourages action.
For example to realize an integrated customer view in order to better serve customers, or to get a better grip on the relation between either marketing and sales processes or sales and product life cycles.
A number of different steps
Due to its complexity, this transformation process does not happen overnight and therefore consists of a number of different steps. These steps – and the required components – are depicted in the overview of components of Businses Intelligence systems.
- Extract, filter and cleanse data in the SA: both internal and external data is transferred from the sources to a temporary storage space, the so-called data staging area (SA). Here the data is filtered and checked on quality. After the data load has been processed in the data warehouse, the SA is emptied again.
- Integrate, transform and load data in the Central Data Warehouse: extracted data from the different sources and tables is aligned and we make sure that the data is converted into a different format that better suits analysis and reporting purposes.
- Aggregate and combine data in data marts (DM) and cubes: in order to respectively increase response times and ease of use, we create much sought-after summaries and compound indicators.
- Copy data to an ODS: in some cases, we require an operational data store (ODS) for up to date analyses. Data in the same format – and checked on quality – is copied from the SA to the ODS. Sometimes the ODS is filled with data directly from the source system.
These steps – and the tasks and functions included – form the basis for the collection process and – as a group – are often referred to as ETL. ETL is an acronym for Extract, Transform and Load. Data from source systems are transformed and integrated in order to increase the chances to actually find information during the analysis process.
Make pragmatic choices
The steps represent the ideal architecture and process sequence of the Intelligent Organization. In practice, organizations will have to make pragmatic choices as to during which step they wish execute and solve things. For example, some organizations do not set up a Central Data Warehouse according to a dimensional structure (yet), but choose to move this bit to data marts instead. This may be necessary if the conversion from a relational structure to a dimensional structure is highly complex and cannot be achieved (or solved) in one step.
However, we need to be aware of the fact that – ultimately – each step must be performed somewhere in the architecture in order to prevent end users from having to tie things together while creating reports or performing analyses.