The pros and cons of a Data Vault
A Data Vault is a modeling technique for the central data warehouse (CDW), designed by Dan Linstedt, that stores all incoming transactions regardless of whether the details are in fact trustworthy and correct: “100% of the data 100% of the time”.
A modeling technique for the central data warehouse
It’s all about transactions
For example: a sales transaction has already taken place, but the corresponding customer does not yet exist in the CRM system. The sales transaction can nonetheless be stored in the Data Vault. When the customer becomes known to the system, the transaction changes from a ‘meaningless’ fact into a useful ‘truth’ because now its context is known.
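The example above can be sketched in a few lines of Python. This is only an illustration of the idea, not how a Data Vault is actually implemented: the names `store_sale`, `register_customer` and `sales_with_context`, and the sample keys, are all hypothetical.

```python
sales = []      # every incoming transaction is kept, no questions asked
customers = {}  # customer details, filled in once the customer is known


def store_sale(sale_id, customer_key, amount):
    # Stored unconditionally, whether or not the customer is known yet.
    sales.append(
        {"sale_id": sale_id, "customer_key": customer_key, "amount": amount}
    )


def register_customer(customer_key, name):
    # The customer becomes known to the system at a later moment.
    customers[customer_key] = {"name": name}


def sales_with_context():
    # Only sales whose customer is known can be reported with context.
    return [
        {**sale, "customer_name": customers[sale["customer_key"]]["name"]}
        for sale in sales
        if sale["customer_key"] in customers
    ]


store_sale("S-1", "C-1001", 250.0)   # customer C-1001 is still unknown
before = sales_with_context()        # the fact is stored, but has no context
register_customer("C-1001", "Acme")  # now the customer is known
after = sales_with_context()         # the same fact can now be reported
```

The transaction is never rejected or lost. Whether it shows up in a report depends only on whether its context has arrived.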
Data Vault keeps track of history
The Data Vault keeps a history for each table field, and an ingenious construction of hubs, links and satellites ensures enormous flexibility in storing data. The CDW can be loaded much faster because different aspects can be processed in parallel. When we use a Data Vault, the CDW does not have a dimensional structure. That stage comes later, namely when we build the data marts or cubes from the Data Vault. Overall, the Data Vault concept provides a different outlook on both modeling and the architecture of Business Intelligence.
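To make the hubs, links and satellites a little more concrete, here is a minimal Python sketch. It assumes the usual reading of the three building blocks (a hub holds a business key, a link relates hubs, a satellite holds descriptive attributes with their history). The class and field names are illustrative assumptions, not a standard, and the sketch assumes versions are loaded in chronological order.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Hub:
    """A business key and nothing else, e.g. a customer number."""
    business_key: str


@dataclass(frozen=True)
class Link:
    """A relationship between hubs, e.g. a sale made to a customer."""
    hub_keys: tuple


@dataclass
class Satellite:
    """Descriptive attributes of one hub or link, kept as dated versions."""
    parent: object
    versions: list = field(default_factory=list)  # (load_time, attributes)

    def load(self, load_time, attributes):
        # Insert-only: a change adds a new version, nothing is overwritten.
        if not self.versions or self.versions[-1][1] != attributes:
            self.versions.append((load_time, dict(attributes)))

    def as_of(self, moment):
        # Return the attributes that were valid at the given moment.
        current = None
        for load_time, attributes in self.versions:
            if load_time <= moment:
                current = attributes
        return current


customer = Hub("C-1001")
sale = Hub("S-1")
sale_to_customer = Link((sale.business_key, customer.business_key))

details = Satellite(customer)
details.load(datetime(2023, 1, 1), {"city": "Utrecht"})
details.load(datetime(2023, 6, 1), {"city": "Amsterdam"})
earlier = details.as_of(datetime(2023, 3, 1))
latest = details.as_of(datetime(2023, 12, 1))
```

Because hubs, links and satellites do not overwrite one another, each can in principle be loaded independently, which is where the parallel loading comes from.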
What are the real benefits of a Data Vault?
The question is: what are the real benefits? Moreover, does the Data Vault have any disadvantages? Its most noticeable feature is that it distinguishes between facts and the truth, which helps to avoid losing transactions and is in fact often necessary from a compliance perspective. However, does it actually make sense to include a transaction in a report (or analysis) if it is not known to be trustworthy and correct?
It requires more time
Creating a Data Vault seems to be complex and probably requires more time, particularly because it remains to be seen whether available ETL software solutions will in fact support the standard Data Vault (see the Data Vault discussion). The same applies to translating hubs and satellites into data marts and cubes. It is simply more difficult.
One version of the truth
Another question: how do we ensure that we do not develop more than one version of the truth, whilst creating the data marts and cubes? After all, it is at this stage that we establish the business definitions in the Data Vault Architecture and it is possible that we may need as many as ten different aggregations for one specific indicator.
Barely manageable data silos
Generating all these aggregations from within the Data Vault may lead to a situation that could easily degenerate into an indistinct, barely manageable jumble of loose data silos, just like old times in the pre-data warehouse era. In short: it is true that a Data Vault offers a flexible repository for all corporate data, but its usefulness and advantages appear to be limited. Besides this, the fact that no enforced data integration takes place is quite a drawback.
4 years later, I’m wondering if your outlook on Data Vault has changed.
What are your thoughts on Data Vault now? Given so many advances recently, Delta Lake for example, is there a need to do up front modelling such as DV?