Four years ago, when we published the second edition of this survey, we saw that not many ETL tools had good, reliable functionality for real-time application integration (EAI) projects. Since then, many ETL tools have added tools for real-time extraction, transformation and integration, and there has been an almost complete convergence between ETL and EAI tools into a new market which is being called Data Integration.
Not everyone gets excited about the prospect of discussing information entropy, shadow IT, and technical debt. But for Martijn Evers, it's all in a day's work. We had an animated discussion about holistic data management and the art of taming bulls. Together with Ronald Damhof, Martijn Evers, co-founder of i-Refact, started an online movement dedicated to perhaps the ultimate job of the future: full-scale data architect. It's a job that has to suit you. "Usually, you're born a data architect", the self-appointed data missionary says. In other words: abstract thinking has to be in your genes. That's why organizations usually call on people with real passion to fill this key role.
In practice, Martijn Evers, co-founder of i-Refact, believes there's a desire for data architects with a holistic vision (read part 1 of our interview here): architects who can effortlessly switch between various modes. He jokingly refers to the contrast between a gorilla architect, who is assertive and backed by senior management, and a guerrilla architect, who lacks a wide base of support in the organization because of all kinds of politically sensitive matters and is thus forced to operate under the radar.
If the challenges associated with Big Data are handled well by making things relevant, digestible, and specific, you can go from Big Data to Right Data and then make the Right Decisions. Then, you'll be able to make the four Vs of Big Data work to your advantage: you'll see more and see better as an organization! Volume: see more. Velocity: spot things faster. Variety: see more detail and more nuance.
Do you want to stay ahead of the competition and give your customers the best possible experience? Become a data-driven organization. Instead of making decisions based on opinions, gut feeling, who yells the loudest, or because "that's just the way we've always done it," your organization will take action based on data and facts. A study by the MIT Center for Digital Business shows that organizations that make data-driven decisions have 4% higher productivity while earning 6% more profit. That may not seem like a lot to some, but on $1 million in profits, that already amounts to $60,000. That amount of money can get you a pretty decent BI system. Other studies have shown that successfully using BI can lead to 33% more satisfied customers. And with IoT applications, we can save $63 billion worldwide in healthcare. You can find many advantages of data-driven working in all kinds of areas. That's why you should make data an integral part of your company's DNA now, in just five steps.
In this technology-driven world, we like to believe in progress. It can be easy to forget that “new” isn't always better. A prime example of this is the “data lake” phenomenon: a figurative lake of data, which became a hype in Business Intelligence (BI) and Big Data circles. Many companies rushed to jump on the bandwagon and built their own data lake. But was this such a good idea, or would it have been better to think about what can be accomplished using a data lake first?
During our interviews with the different ETL vendors at the beginning of 2018, we saw a number of general trends and topics emerge. Especially those vendors that operate on the cutting edge are very eloquent on where they see ETL and Data Integration going. Apart from the “hard”, technical developments, we can see that a small number of vendors have a keen eye for the “soft” side of ETL and Data Integration, namely data governance and the role of human intervention in maintaining data quality. This was perhaps the most surprising aspect of our survey. We believe that this is also where vendors can truly distinguish themselves.
The return on investments in CRM, Business Intelligence, Analytics, performance management or data integration hinges on your data quality. How do you increase the data quality of regular data, but also that of Big Data, and what tools, methods and measures work? It is a known fact: garbage in, garbage out. Despite the broad recognition of this idea, it is difficult for most organisations to continuously improve the quality of the data. This not only has to do with inadequate IT. The attitude and behavior of your employees and managers also play an important role in this. The specialists from Passionned Group would love to assist you with the improvement of your data quality.
The data-driven approach either relies on the data that originate from the information systems that support business operations or goes in search of available external sources that can fulfill the information needs. This approach operates as follows: Inventory of resources: the organization makes an inventory of the applications and information systems that register and administer the organization’s data. These may include an ERP system, a tool for financial administration, the CRM system, a complaint system or a combination of those. If relevant, and, not to be forgotten, affordable, the organization might also look for external data such as market information (supplied by a market research agency), demographic data, the Land Registry (Cadastre), the patent register, a registry of brand names, weather forecasts, press agencies such as AP and Reuters, data from the CPB (Economic Policy Analysis) or the Chamber of Commerce, and finally the websites of competitors.
The gap between what ETL suppliers think and what users want seems to grow every year. It starts with the name. Users still talk about ETL (Extract, Transform and Load), while suppliers think the term is passé. They talk about data integration and master data management, something that is wider than the original ETL, but describes the same problem. If we assume that Google statistics give an indication of what people are interested in, we see that globally 240,000 people per month (in the Netherlands: 2,400) search for “ETL”. The search term “data integration” is “only” entered by 74,000 people (in the Netherlands: 720). On the first Google page for “data integration”, we find almost exclusively supplier information. The first “ETL” page contains a lot of (neutral) information on the subject itself and is almost free of suppliers. As this example makes clear, good positioning of the products remains a problem.
Data warehouses typically contain quite a lot of content. Let us assume, for convenience's sake, that much of this content is relevant and reliable. This is often not (yet) the case, but we want to address another issue here: the user cannot find information even though it is available in the right quality. I once examined what the reasons for this could be and compiled a preliminary list:
Recently I was invited to a management team (MT) meeting of a financial services company. They asked me to lay down a clear vision about a specific Business Intelligence issue. In addition to the CEO and several managers, the CFO of the company was also present. He bemoaned the fact that the figures from the data warehouse never corresponded to those in his accounting system. He wanted one version of the truth, which is a praiseworthy thing in itself.
Many companies are looking for ideas to increase employee productivity and reduce the cost of business operations in order to become more competitive. As information in society becomes increasingly digitized, the moment has come to get more of a grip on the huge mountain of data, and to implement smart solutions for managing this data from the cradle to the grave. It is a business problem that sometimes even gets the better of the IT department and needs attention from top management to set the course.
Contrary to many studies that evaluate ETL tools based only on their current market share and functionality, our study (the ETL Tools & Data Integration Survey) also attempts to evaluate ETL tools based on their expected future performance. With the market changing as quickly as it is today, market share is about the past, and says little about the future. Many products, including WordPerfect, Lotus 1-2-3, and Harvard Graphics, once had a huge market share, but proved to have very little future potential. Growth potential, as defined within the confines of this ETL study, looks at how frequently a vendor releases new, valuable features: an important indicator of innovation and of the strategic importance of the product to the vendor.
On the path towards maturity in the field of metadata, an organization can go through four levels of ambition. These are shown in the figure below. (Figure: The four levels of ambition of metadata.) Level 0: an organization is on this level if there is hardly any need for metadata, for example because all applications store data in one and the same database from which they also retrieve the data. The applications are more or less isolated, which means that hardly any data is exchanged between them. The organization is structured accordingly, and managers primarily focus on their own departments.
A Data Vault is a modeling technique for the CDW, designed by Dan Linstedt, which chooses to store all incoming transactions regardless of whether the details are in fact trustworthy and correct: “100% of the data 100% of the time”. For example: a sales transaction has already taken place, but the corresponding customer does not yet exist in the CRM system. The sales transaction can nonetheless be stored in the CDW. When the customer becomes known to the system, the transaction changes from a ‘meaningless’ fact into a useful ‘truth’ because now its context is known.
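The late-arriving customer scenario can be sketched in a few lines of Python. This is a toy illustration only, not Linstedt's reference design: the names `MiniVault`, `hub_customer`, `sat_customer`, and `link_sale` are hypothetical, and a real Data Vault would also carry hash keys, load timestamps, and record sources.

```python
from dataclasses import dataclass, field


@dataclass
class MiniVault:
    """Toy in-memory vault: one hub, one satellite, one link (hypothetical names)."""
    hub_customer: set = field(default_factory=set)     # business keys only
    sat_customer: dict = field(default_factory=dict)   # descriptive attributes per key
    link_sale: list = field(default_factory=list)      # (sale_id, customer_key, amount)

    def load_sale(self, sale_id, customer_key, amount):
        # Store the transaction unconditionally and register the business key,
        # even if no customer details have arrived yet.
        self.hub_customer.add(customer_key)
        self.link_sale.append((sale_id, customer_key, amount))

    def load_customer(self, customer_key, attributes):
        # Customer details arrive later, e.g. once the CRM system knows the customer.
        self.hub_customer.add(customer_key)
        self.sat_customer[customer_key] = attributes

    def sales_with_context(self):
        # Only sales whose customer has descriptive attributes can be reported on.
        return [
            (sale_id, self.sat_customer[key]["name"], amount)
            for sale_id, key, amount in self.link_sale
            if key in self.sat_customer
        ]


vault = MiniVault()
vault.load_sale("S-1", "C-42", 250.0)          # customer C-42 is still unknown
before = vault.sales_with_context()            # stored, but without context
vault.load_customer("C-42", {"name": "Acme"})
after = vault.sales_with_context()             # same stored fact, now with context
```

The sale is never rejected or rewritten; only the later arrival of the customer record changes what can be reported about it.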
For a proper Business Intelligence system we need to have some sort of ETL in place. When extracting and filtering data from the source systems, the following aspects are important: Indicators and other types of management information need data from tables in the source system. If this requires one or more attributes, it is highly recommended to copy ALL attributes (from a table) to the staging area (SA). Why? Well, firstly it is simpler: instead of having to name each attribute explicitly, the table name will do. Secondly, we can produce new indicators or dimensions faster, provided they are based on data in tables that are already in the SA. After all, there is no need to first adjust the extraction. The disadvantage is that more data need to be processed, which may increase the loading time of the SA.
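As a minimal sketch of the "copy ALL attributes" recommendation, the snippet below uses Python's built-in `sqlite3` module with an in-memory database. The table names `src_orders` and `sa_orders`, the columns, and the helper `extract_all_attributes` are assumptions made for illustration; a real extraction would read from the actual source system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical source table with more attributes than the first indicator needs.
conn.execute(
    "CREATE TABLE src_orders (order_id INTEGER, customer_id INTEGER, "
    "amount REAL, channel TEXT)"
)
conn.executemany(
    "INSERT INTO src_orders VALUES (?, ?, ?, ?)",
    [(1, 10, 99.5, "web"), (2, 11, 20.0, "shop")],
)


def extract_all_attributes(conn, source_table, staging_table):
    # Table names are interpolated only because they are fixed, trusted
    # identifiers in this sketch; never do this with user input.
    conn.execute(f"DROP TABLE IF EXISTS {staging_table}")
    # SELECT * copies every attribute, so only the table name has to be given.
    conn.execute(f"CREATE TABLE {staging_table} AS SELECT * FROM {source_table}")


extract_all_attributes(conn, "src_orders", "sa_orders")

# A new indicator (revenue per channel) needs the 'channel' attribute.
# Because the whole table was copied, the extraction does not have to change.
revenue_per_channel = dict(
    conn.execute("SELECT channel, SUM(amount) FROM sa_orders GROUP BY channel")
)
staged_columns = [row[1] for row in conn.execute("PRAGMA table_info(sa_orders)")]
```

The trade-off named in the text is visible here too: every column of `src_orders` is copied on each load, whether or not an indicator currently uses it.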
A pharmaceutical wholesaler wanted to find out more about its market performance compared with that of other wholesalers in the industry. The data required to calculate the market shares were only partially available in the wholesaler’s source system, namely in the order files. To arrange the exchange of sales data (anonymized, of course) between wholesalers in the same line of business, negotiations took place at board level. As a result, the wholesalers established a joint foundation whose aim is to process all sales data once a month, according to a specific format, into one large file containing all revenue data broken down by product and by month.
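A market-share calculation of this kind might look as follows. This is a sketch under stated assumptions: the actual file format agreed by the wholesalers is not described here, so the `(product, month)` keys, the sample figures, and the function name `market_shares` are all hypothetical. Own revenue is assumed to come from the wholesaler's order files and industry revenue from the joint monthly file.

```python
def market_shares(own_revenue, industry_revenue):
    """Return own revenue as a fraction of industry revenue per (product, month)."""
    shares = {}
    for key, total in industry_revenue.items():
        if total > 0:  # skip product/month combinations without industry sales
            shares[key] = own_revenue.get(key, 0.0) / total
    return shares


# Hypothetical figures: own order files versus the joint industry file.
own = {
    ("aspirin", "2024-01"): 400.0,
    ("ibuprofen", "2024-01"): 300.0,
}
industry = {
    ("aspirin", "2024-01"): 1000.0,
    ("ibuprofen", "2024-01"): 400.0,
    ("paracetamol", "2024-01"): 500.0,
}

shares = market_shares(own, industry)
```

Because only industry totals per product and month are needed, the calculation works without revealing any individual competitor's figures.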
Question: Which ETL tools can support Data Vault modeling out-of-the-box? What are the challenges and issues building a data vault with ETL tools? 1. Marcel de Wit - See another discussion on LinkedIn (still active; Dutch) 2. Daan van Beek - Thanks Marcel, I did read the comments of that discussion too, it was the reason I started this discussion actually. After reading it, it was still unclear to me whether ETL tools do support Data Vault out-of-the-box like slowly changing dimensions or not. So, who knows the answer(s)? The vendors?
For the last three months of last year, Passionned Group ran a poll on its ETLtool.com site asking visitors what they thought were the most important requirements when choosing an ETL tool. The ETLtool.com site has been running since 2005. It advises visitors on the various ETL (data integration) tools available, what their strong and weak points are, and how to choose an ETL tool, and it sells a 100-page report in which popular products are analyzed and compared to help readers choose the right product for their individual circumstances.