On the path towards maturity in the field of metadata, an organization can go through four levels of ambition. These are shown in the figure below.
Figure: The four levels of ambition of metadata
- Level 0: an organization is on this level if there is hardly any need for metadata, for example because all applications store their data in one and the same database, from which they also retrieve it. The applications are more or less isolated, which means that hardly any data is exchanged between them. The organization is structured along the same lines: managers focus primarily on their own departments.
- Level 1: organizations that develop and build a data warehouse in order to improve information supply and performance will ultimately reach level 2. During the development stage, however, the need for metadata from the source systems already arises, because such projects require us to define key indicators and dimensions: which entities and attributes do we use in calculations, and what do they mean?
- Level 2: now that we own a data warehouse, we have to deal not only with redundant data but also with redundant metadata. For pragmatic reasons, organizations often choose to build a data warehouse first in order to satisfy the information needs, and to build the so-called metadata repository (a database with metadata) at a later stage. This repository collects all possible ‘data about data’ from as many information systems as possible: data warehouses, Business Intelligence systems, ERP and CRM systems and so on. On this level, employees primarily use the metadata repository to look up the meaning of data and information. Administrators and developers primarily use it to perform impact analysis (which systems use which attributes) so that they can assess the impact of metadata changes.
- Level 3: on this level, changes are actually implemented in the database itself, but only once metadata is exchanged bi-directionally between the metadata repository and the connected systems. We apply the changes only to the bottom layer (the data layer) of the three-layer model. The software, that is, the interface layer and the application layer, does not yet ‘automatically’ adapt to these changes.
- Level 4: on this level, the information systems within the organization are developed, checked and managed by means of a database that contains metadata. This is referred to as a model-driven architecture (MDA). Changes in this database are ‘automatically’ applied to all underlying systems and in all three layers of the model. Application development is then highly metadata-driven, and data integration between the various applications (EAI) takes place on the basis of the metadata repository. Nowadays, data warehouses are increasingly developed on the basis of metadata.
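The idea behind level 4 can be illustrated with a minimal sketch in Python. It assumes a single metadata entry from which all three layers are derived; the entity `CUSTOMER`, its attributes and the three helper functions are hypothetical and serve only to show that one change in the metadata carries through to the data, application and interface layers.

```python
# Hypothetical metadata entry: in a real repository this would be stored
# in the metadata database, not in source code.
CUSTOMER = {
    "table": "customer",
    "attributes": [
        {"name": "customer_id", "type": "INTEGER", "label": "Customer number"},
        {"name": "name", "type": "VARCHAR(100)", "label": "Name"},
    ],
}


def data_layer(entity):
    """Generate a CREATE TABLE statement from the metadata (data layer)."""
    columns = ",\n".join(
        f"    {a['name']} {a['type']}" for a in entity["attributes"]
    )
    return f"CREATE TABLE {entity['table']} (\n{columns}\n);"


def application_layer(entity):
    """Return a check that a record has exactly the described attributes."""
    expected = {a["name"] for a in entity["attributes"]}

    def is_valid(record):
        return set(record) == expected

    return is_valid


def interface_layer(entity):
    """Return the field labels a screen or report would show."""
    return [a["label"] for a in entity["attributes"]]
```

Adding an attribute to `CUSTOMER["attributes"]` changes the output of all three functions at once, which is the behaviour that separates level 4 from level 3, where only the data layer follows the metadata.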
A metadata repository is in fact a metadata warehouse with which we can perform metadata analyses such as impact analysis and data lineage. Impact analysis provides insight into the impact of proposed changes in systems. If we want to remove an attribute from a table, for example, the impact analysis indicates which tables, procedures or applications also need to be adjusted. Data lineage shows the origin of attributes and tables in a target system (for example, a report from the data warehouse), often via a range of derivations and transformations. In this way, we can figure out that the attribute ‘revenue’ is derived from two attributes in the order detail table of the ordering system. Furthermore, a metadata repository prevents the ‘data warehouse’ from becoming ‘housed data’. In other words, the repository helps data warehouse users to find the data and information they require. Finally, a metadata repository often offers ways to analyze how frequently data and information are used: how often is the data used, and by whom?
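Both analyses come down to walking the same set of ‘derived from’ relationships in opposite directions. The following Python sketch illustrates this under stated assumptions: the dictionary `DERIVED_FROM` stands in for the repository, and the element names (including `quantity` and `unit_price` as the two source attributes of ‘revenue’) are hypothetical.

```python
from collections import defaultdict

# Hypothetical repository content: each derived element is mapped to the
# elements it is derived from. All names are illustrative assumptions.
DERIVED_FROM = {
    "dwh.sales_fact.revenue": [
        "orders.order_detail.quantity",
        "orders.order_detail.unit_price",
    ],
    "report.monthly_sales.revenue": ["dwh.sales_fact.revenue"],
}


def _reachable(start, edges):
    """Collect every element reachable from start by following edges."""
    seen = set()
    stack = [start]
    while stack:
        current = stack.pop()
        for neighbour in edges.get(current, []):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def lineage(element, derived_from=DERIVED_FROM):
    """Data lineage: all upstream elements that element is derived from."""
    return _reachable(element, derived_from)


def impact(element, derived_from=DERIVED_FROM):
    """Impact analysis: all downstream elements that depend on element."""
    used_by = defaultdict(list)
    for target, sources in derived_from.items():
        for source in sources:
            used_by[source].append(target)
    return _reachable(element, used_by)
```

In this sketch, `lineage("report.monthly_sales.revenue")` traces the report attribute back through the data warehouse to the two order detail attributes, while `impact("orders.order_detail.unit_price")` lists the data warehouse attribute and the report attribute that would have to be adjusted if the source attribute were removed.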
Organizations go through the levels at their own pace. Some operate in a highly dynamic environment in which change is often a necessity. These organizations are likely to decide to develop a data warehouse (level 1) sooner rather than later. At a later stage, they can be expected to make new information systems metadata-driven. After all, metadata-driven development is faster, and speed is a necessity in a highly dynamic environment that requires a quick response to market developments.
When we place these levels of ambition in the context of the Business Intelligence cycle, we see that level 0 focuses primarily on registering, that levels 1 and 2 focus primarily on processing, and that levels 3 and 4 focus particularly on (quick) response. Ultimately, metadata should support all processes of the major and the minor Business Intelligence cycle so that we can go through them smoothly and quickly.