Passionned Group™

The pros and cons of a Data Vault

A Data Vault is a modeling technique for the central data warehouse (CDW), designed by Dan Linstedt. It stores all incoming transactions, regardless of whether their details are in fact trustworthy and correct: “100% of the data 100% of the time”.

Passionned Group is an expert in the field of data vaults and data warehousing.


A modeling technique for the central data warehouse

It’s all about transactions

For example: a sales transaction has already taken place, but the corresponding customer does not yet exist in the CRM system. The sales transaction can nonetheless be stored in the CDW. When the customer becomes known to the system, the transaction changes from a ‘meaningless’ fact into a useful ‘truth’ because now its context is known.
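
The example above can be sketched in a few lines of Python. This is only a toy, in-memory illustration, not how a real Data Vault is implemented: the `Vault` class, its method names and the sample keys (`C-1001`, `Acme Ltd`) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Vault:
    """Toy in-memory store; every name here is an illustrative assumption."""
    customer_hub: set = field(default_factory=set)        # business keys only
    customer_details: dict = field(default_factory=dict)  # key -> descriptive attributes
    sales: list = field(default_factory=list)             # (customer key, amount)

    def load_sale(self, customer_key: str, amount: float) -> None:
        # The sale is accepted even if nothing is known about the customer yet.
        self.customer_hub.add(customer_key)
        self.sales.append((customer_key, amount))

    def load_customer(self, customer_key: str, name: str) -> None:
        # Context arrives later, e.g. once the CRM system knows the customer.
        self.customer_hub.add(customer_key)
        self.customer_details[customer_key] = {"name": name}

    def sales_with_context(self) -> list:
        # Only sales whose customer has been described can be reported meaningfully.
        return [
            (self.customer_details[key]["name"], amount)
            for key, amount in self.sales
            if key in self.customer_details
        ]


vault = Vault()
vault.load_sale("C-1001", 250.0)
before = vault.sales_with_context()   # the fact is stored, but has no context yet
vault.load_customer("C-1001", "Acme Ltd")
after = vault.sales_with_context()    # the same sale, now with its context
```

The sale is never rejected or reloaded: it is stored once, and it becomes usable in reporting only when the customer details arrive.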

Data Vault keeps track of history

The Data Vault keeps a history for each table field, and an ingenious construction of hubs, links and satellites ensures enormous flexibility in storing data. The CDW also loads much faster because different parts of the model can be processed in parallel. When we use a Data Vault, the CDW does not have a dimensional structure. That stage comes later, when we build the data marts or cubes from the Data Vault. Overall, the Data Vault concept provides a different outlook on both the modeling and the architecture of Business Intelligence.
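
To make the three building blocks concrete, here is a minimal Python sketch. It assumes that hubs hold business keys, links hold relationships between hubs, and satellites hold descriptive attributes as append-only rows. The use of hashed business keys, the function names and the sample data are assumptions made for illustration; they are not taken from this article.

```python
import hashlib


def hash_key(*business_keys: str) -> str:
    # Assumption: keys are derived from business keys alone, so no lookup is needed.
    return hashlib.md5("|".join(business_keys).encode("utf-8")).hexdigest()


hub_customer = {}   # hash key -> business key
hub_product = {}    # hash key -> business key
link_sale = {}      # hash key -> (customer hash key, product hash key)
sat_customer = []   # append-only rows: (customer hash key, load date, attributes)


def load_hub(hub: dict, business_key: str) -> str:
    key = hash_key(business_key)
    hub.setdefault(key, business_key)   # insert only if the key is new
    return key


def load_link(link: dict, *parent_keys: str) -> str:
    key = hash_key(*parent_keys)
    link.setdefault(key, parent_keys)
    return key


def load_satellite(sat: list, parent_key: str, load_date: str, attributes: dict) -> None:
    rows = [row for row in sat if row[0] == parent_key]
    if rows and rows[-1][2] == attributes:
        return                          # nothing changed, so no new history row
    sat.append((parent_key, load_date, dict(attributes)))


customer = load_hub(hub_customer, "C-1001")
product = load_hub(hub_product, "P-7")
load_link(link_sale, customer, product)

load_satellite(sat_customer, customer, "2024-01-01", {"city": "Utrecht"})
load_satellite(sat_customer, customer, "2024-02-01", {"city": "Utrecht"})    # unchanged
load_satellite(sat_customer, customer, "2024-03-01", {"city": "Amsterdam"})  # new version

city_history = [row[2]["city"] for row in sat_customer if row[0] == customer]
```

In this sketch old satellite rows are never overwritten, which is what preserves the history. Because each key is computed from the business key itself, the hub, link and satellite loads do not have to wait for one another, which is one way the parallel loading described above can be achieved.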

What are the real benefits of a Data Vault?

The question is: what are the real benefits? And does the Data Vault have any disadvantages? Most noticeable is that the Data Vault distinguishes between facts and the truth. This can be useful in order not to lose transactions, and is in fact often necessary from a compliance perspective. However, does it actually make sense to include a transaction in a report (or analysis) if it is not yet known to be true?

It requires more time

Creating a Data Vault seems to be complex and probably requires more time, particularly because it remains to be seen whether available ETL software solutions will in fact support the standard Data Vault (see the Data Vault discussion). The same applies to translating hubs and satellites into data marts and cubes. It is simply more difficult.

One version of the truth

Another question: how do we ensure that we do not develop more than one version of the truth while creating the data marts and cubes? After all, it is at this stage that we establish the business definitions in the Data Vault Architecture, and we may need as many as ten different aggregations for one specific indicator.
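
One possible way to limit this risk, sketched here under assumptions and not prescribed by the Data Vault itself, is to code each business definition exactly once and reuse it for every aggregation. The indicator `net_revenue`, its rule (returned sales do not count) and the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows as they might come out of the Data Vault.
sales = [
    {"region": "North", "month": "2024-01", "amount": 100.0, "returned": False},
    {"region": "North", "month": "2024-02", "amount": 50.0, "returned": True},
    {"region": "South", "month": "2024-01", "amount": 200.0, "returned": False},
    {"region": "South", "month": "2024-02", "amount": 75.0, "returned": False},
]


def net_revenue(rows: list) -> float:
    """The single shared business definition: returned sales do not count."""
    return sum(row["amount"] for row in rows if not row["returned"])


def aggregate(rows: list, by: str) -> dict:
    # Group the rows first, then apply the same definition to every group.
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)
    return {key: net_revenue(group) for key, group in groups.items()}


by_region = aggregate(sales, "region")
by_month = aggregate(sales, "month")
```

Because both aggregations call the same `net_revenue` function, their totals agree with each other and with the overall figure. If each data mart coded its own rule for returned sales, that agreement would no longer be guaranteed.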

Barely manageable data silos

Generating all of these from within the Data Vault could easily degenerate into an indistinct, barely manageable jumble of loose data silos, just like in the old pre-data-warehouse era. In short: a Data Vault does offer a flexible repository for all corporate data, but its usefulness and advantages appear to be limited. On top of that, the fact that no data integration is enforced is quite a drawback.

Responses

Joel Wittenmyer wrote on 2017-09-22 - 15:09:

Daan,
4 years later, I’m wondering if your outlook on Data Vault has changed.

Downes Simon wrote on 2021-03-07 - 15:03:

What are your thoughts on Data Vault now? Given so many advances recently, Delta Lake for example, is there a need to do up front modelling such as DV?


DAAN VAN BEEK MSc

Business Partner
