Four years ago, when we published the second edition of this survey, we saw that few ETL tools offered good, reliable functionality for real-time application integration (EAI) projects. Since then, many ETL tools have added capabilities for real-time extraction, transformation, and integration, and ETL and EAI tools have almost completely converged into a new market known as Data Integration.
Not everyone gets excited about the prospect of discussing information entropy, shadow IT, and technical debt. But for Martijn Evers, it's all in a day's work. We had an animated discussion about holistic data management and the art of taming bulls. Together with Ronald Damhof, Martijn Evers, co-founder of i-Refact, started an online movement dedicated to perhaps the ultimate job of the future: full-scale data architect. It's a job that has to suit you. "Usually, you're born a data architect", the self-appointed data missionary says. In other words: abstract thinking has to be in your genes. That's why organizations usually call on people with real passion to fill this key role.
In practice, Martijn Evers, co-founder of i-Refact, sees a need for data architects with a holistic vision (read part 1 of our interview here): architects who can effortlessly switch between various modes. He jokingly contrasts the gorilla architect, who is assertive and backed by senior management, with the guerrilla architect, who lacks a broad base of support in the organization due to politically sensitive matters and is thus forced to operate under the radar.
If you handle the challenges associated with Big Data well by making things relevant, digestible, and specific, you can go from Big Data to Right Data and then make the Right Decisions. Then you'll be able to make the four Vs of Big Data work to your advantage: you'll see more and see better as an organization! Volume: see more. Velocity: spot things faster. Variety: see more detail and more nuance. Veracity: see more reliably.
Do you want to stay ahead of the competition and give your customers the best possible experience? Become a data-driven organization. Instead of making decisions based on opinions, gut feeling, who yells the loudest, or because "that's just the way we've always done it," your organization will take action based on data and facts. A study by the MIT Center for Digital Business shows that organizations that make data-driven decisions have 4% higher productivity and earn 6% more profit. That may not seem like much, but on $1 million in profit, that's already $60,000: enough for a pretty decent BI system. Other studies have shown that successfully using BI can lead to 33% more satisfied customers, and IoT applications could save $63 billion worldwide in healthcare. The advantages of data-driven working appear in all kinds of areas. That's why you should make data an integral part of your company's DNA now, in just five steps.
The latest edition of the ETL Tools Survey is out: a good reason to chat with Rick van der Linden, Business Intelligence and ETL expert, about the latest trends in ETL. "ETL stands for extract, transform, and load. ETL can be compared to purifying water. First, undrinkable water (data) is extracted from various rivers. This dirty water is purified using a tool. The purified water is then stored in a container that you can drink from," says Rick van der Linden.
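Van der Linden's water metaphor maps directly onto the three ETL steps. As a minimal sketch (a toy illustration, not taken from any specific ETL tool; all function and field names are made up):

```python
# Toy ETL pipeline: extract "dirty" rows, purify them, load the clean result.
# All names here are illustrative, not from a real tool or schema.

def extract(source_rows):
    """Draw raw records from a source system (the 'river')."""
    return list(source_rows)

def transform(rows):
    """Purify: drop incomplete records and normalize the rest."""
    clean = []
    for row in rows:
        if row.get("customer") and row.get("amount") is not None:
            clean.append({"customer": row["customer"].strip().title(),
                          "amount": round(float(row["amount"]), 2)})
    return clean

def load(rows, target):
    """Store the purified records in the target (the 'container')."""
    target.extend(rows)
    return target

warehouse = []
raw = [{"customer": " alice ", "amount": "12.5"},
       {"customer": "", "amount": "3"},   # incomplete record: filtered out
       {"customer": "bob", "amount": 7}]
load(transform(extract(raw)), warehouse)
```

The point of the metaphor survives in code: the "container" (here, `warehouse`) only ever receives water that has passed through the purification step.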
In this technology-driven world, we like to believe in progress, and it can be easy to forget that "new" isn't always better. A prime example is the "data lake" phenomenon: a figurative lake of data that became a hype in Business Intelligence (BI) and Big Data circles. Many companies rushed to jump on the bandwagon and built their own data lake. But was this such a good idea, or would it have been better to first think about what a data lake should actually accomplish?
During our interviews with the different ETL vendors at the beginning of 2018, we saw a number of general trends and topics emerge. Especially the vendors that operate on the cutting edge are very eloquent about where they see ETL and Data Integration going. Apart from the "hard", technical developments, we see that a small number of vendors have a keen eye for the "soft" side of ETL and Data Integration: data governance and the role of human intervention in maintaining data quality. This was perhaps the most surprising finding of our survey. We believe this is also where vendors can truly distinguish themselves.
The return on investments in CRM, Business Intelligence, Analytics, performance management, or data integration hinges on your data quality. How do you increase the quality of regular data, and of Big Data, and which tools, methods, and measures work? It is a known fact: garbage in, garbage out. Despite the broad recognition of this idea, most organizations find it difficult to continuously improve the quality of their data. This is not only due to inadequate IT; the attitude and behavior of your employees and managers also play an important role. The specialists from Passionned Group would love to assist you with improving your data quality.
Controllers like to keep things manageable and help organizations keep their KPIs in order. This is understandable and good from a risk management perspective, but modern controllers must now also know all about Big Data, because established KPIs and manageability have to make way for Big Data and innovation. Organizations that lose sight of this transition risk completely missing the boat.
The data-driven approach either relies on the data originating from the information systems that support business operations, or searches for available external sources that can fulfill the information needs. This approach operates as follows. Inventory of resources: the organization makes an inventory of the applications and information systems that register and administer the organization's data. These may include an ERP system, a tool for financial administration, the CRM system, a complaint system, or a combination of those. If relevant, and affordable, the organization might also look for external data, such as market information (supplied by a market research agency), demographic data, the Land Registry (Cadastre), the patent register, a registry of brand names, weather forecasts, press bureaus such as AP and Reuters, data from the CPB (the Netherlands Bureau for Economic Policy Analysis) or the Chamber of Commerce, and finally the websites of competitors.
The Passionned Group ETL Tools & Data Integration Survey has existed since 2003 and is a 100% supplier-independent market comparison and analysis report. Hundreds of organizations worldwide use the report to quickly make the best choice for an ETL tool or data integration solution. The December 2014 edition was recently published. "In fact, it's not merely an update: all the parts have been completely revised," said Passionned Group editor Rick van der Linden.
The difference between what ETL suppliers think and what users want seems to grow every year. It starts with the name: users still talk about ETL (Extract, Transform, and Load), while suppliers consider the term passé. They talk about data integration and master data management, which is broader than the original ETL but describes the same problem. If we assume that Google statistics indicate what people are interested in, we see that globally 240,000 people per month (in the Netherlands: 2,400) search for "ETL". The search term "data integration" is entered by "only" 74,000 people (in the Netherlands: 720). The first Google page for "data integration" contains almost exclusively supplier information, while the first "ETL" page contains a lot of (neutral) information on the subject itself and is almost free of suppliers. As this example makes clear, good positioning of the products remains a problem.
Data warehouses typically contain quite a lot of content. Let us assume, for convenience's sake, that much of this content is relevant and reliable. This is often not (yet) the case, but fine; we want to address another issue here, namely that users cannot find information even though it is available in the right quality. I once examined what the reason for this could be and compiled a preliminary list:
Recently I was invited to a management team (MT) meeting of a financial services company. They asked me to present a clear vision on a specific Business Intelligence issue. In addition to the CEO and several managers, the company's CFO was also present. He bemoaned the fact that the figures from the data warehouse never corresponded to those in his accounting system. He wanted one version of the truth, which is a praiseworthy thing in itself.
Many companies are looking for ways to increase employee productivity and reduce the cost of business operations in order to increase their competitiveness. As information in society becomes increasingly digitized, the moment has come to get a better grip on the huge mountain of data and to implement smart solutions for managing this data from cradle to grave. It is a business problem that sometimes gets the better of even the IT department, and it needs attention from top management to set the course.
Financial firms struggle with the legacy of their separate, siloed risk systems. A recent survey funded by BI vendor SAS shows that 81% of respondents experience the technical hurdle of data inconsistencies. The difficulties of separate risk systems and inflexible sources were pinpointed by more than half of them. The survey questioned 27 global financial institutions about BCBS 239, the Basel Committee's Principles for Effective Risk Data Aggregation and Risk Reporting, which aim to increase transparency and reduce operational risks. Banks that adhere to these principles can improve future stress testing and anticipate future problems.
SAS, supplier of business analytics software and services and a vendor in the business intelligence market, has updated its Master Data Management and Federation Server software. The software is designed to help organizations better access data, manage data and use data from any source. The ability to organize and manage the data pouring in from an ever-growing list of sources is critical, according to SAS.
Contrary to many studies that evaluate ETL tools based only on their current market share and functionality, our study (the ETL Tools & Data Integration Survey) also attempts to evaluate ETL tools based on their expected future performance. With the market changing as quickly as it is today, market share is about the past and says little about the future. Many products, including WordPerfect, Lotus 1-2-3, and Harvard Graphics, once had a huge market share but proved to have very little future potential. Growth potential, as defined within the confines of this ETL study, looks at how frequently a vendor releases new, valuable features: an important indicator of innovation and of the strategic importance of the product to the vendor.
The ETL tools were evaluated in our ETL Tools & Data Integration Survey based on the following characteristics:
- Functionality
- Clarity and re-usability
- Debugging
- Real-time ETL/EAI/Web services
- ETL functionality
- Data sources/targets
- Architecture and infrastructure
- Ease-of-use
- Growth potential and market stability
- Pricing
- Connectivity
- Platform support
The ease-of-use criterion was the only aspect that involved a measure of subjectivity. We looked at the products from the point of view that they should be usable by a broad range of professionals, not just a few specialized IT professionals. This is clearly not everybody's view of the world; some of the ETL suppliers we talked to were appalled, suggesting that allowing business users to build data warehouses would guarantee failure. Others agreed that their major market was the business user, not the IT professional. We have made every effort to judge the products objectively in terms of ease-of-use, but admit to a bias in the direction of broad use.
Since 2003, we have been closely monitoring the market for ETL and data integration tools. In the past, the focus was on the market leaders, who were often seen as visionaries. Many organizations used to assume that they had automatically made the right choice if they purchased a tool from one of the market leaders. Since the late nineties, however, the market has changed substantially. Practically all the Business Intelligence (BI) vendors have purchased or developed their own ETL tools. Since a centralized data warehouse is one of the cornerstones of a successful BI solution, this has turned out to be a wise choice. Market estimates show that 70-80% of the costs of a successful BI system relate to the creation of reliable ETL processes and data integration.
On the path towards maturity in the field of metadata, an organization can go through four levels of ambition, shown in the figure below. Figure: The four levels of ambition of metadata. Level 0: an organization is at this level if there is hardly any need for metadata, for example because all applications store data in one and the same database, from which they also retrieve it. The applications are more or less isolated, which means that hardly any data is exchanged between them. The organization is structured accordingly, and managers primarily focus on their own departments.
A Data Vault is a modeling technique for the CDW, designed by Dan Linstedt, which stores all incoming transactions regardless of whether the details are in fact trustworthy and correct: "100% of the data 100% of the time". For example: a sales transaction has already taken place, but the corresponding customer does not yet exist in the CRM system. The sales transaction can nonetheless be stored in the Data Vault. When the customer becomes known to the system, the transaction changes from a 'meaningless' fact into a useful 'truth', because its context is now known.
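The example above can be sketched in a few lines. This is a toy simplification of the "100% of the data 100% of the time" idea, not Linstedt's full hub/link/satellite method; all names and keys below are hypothetical:

```python
# Illustrative Data Vault-style sketch: a hub of known business keys and a
# link of relationships. A sale referencing an unknown customer is stored
# anyway; it only becomes a resolved 'truth' once the customer key arrives.

customer_hub = set()   # business keys of customers known so far
sale_link = []         # (sale_id, customer_key) pairs, stored unconditionally

def record_sale(sale_id, customer_key):
    # Never reject an incoming transaction, even if its customer is unknown.
    sale_link.append((sale_id, customer_key))

def register_customer(customer_key):
    customer_hub.add(customer_key)

def resolved_sales():
    """Sales whose customer context is known: 'facts' that became 'truths'."""
    return [(s, c) for s, c in sale_link if c in customer_hub]

record_sale("S-1", "CUST-42")   # customer not yet known: stored anyway
before = resolved_sales()       # still unresolved at this point
register_customer("CUST-42")    # context arrives later
after = resolved_sales()        # the same fact is now a useful 'truth'
```

Note that nothing is ever deleted or rejected; resolution is purely a matter of querying, which is the property the technique relies on.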
For a proper Business Intelligence system we need some sort of ETL in place. When extracting and filtering data from the source systems, the following aspects are important. Indicators and other types of management information need data from tables in the source system. If this requires one or more attributes, it is highly recommended to copy ALL attributes (from a table) to the staging area (SA). Why? Firstly, it is simpler: instead of having to name each attribute explicitly, the table name will do. Secondly, we can produce indicators or dimensions faster, as long as they are based on data in tables that are already in the SA: there is no need to first adjust the extraction. The disadvantage is that more data needs to be processed, which may burden the loading time of the SA.
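The "copy ALL attributes" advice amounts to a full-table extract into the SA, driven only by the table name, so no column list has to be maintained in the ETL job. A minimal sketch using SQLite (the table and column names are hypothetical):

```python
import sqlite3

# Full-table extract into the staging area: the ETL job only needs to know
# the table name; columns are discovered from the source at run time.

def extract_full_table(src, sa, table):
    cur = src.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]   # column names from the source
    rows = cur.fetchall()
    # Recreate the table in the SA with the same columns (types omitted;
    # SQLite allows untyped columns).
    sa.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(cols)})")
    placeholders = ", ".join("?" for _ in cols)
    sa.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)

src = sqlite3.connect(":memory:")   # source system
sa = sqlite3.connect(":memory:")    # staging area
src.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, "Alice", 10.0), (2, "Bob", 20.0)])
extract_full_table(src, sa, "orders")
```

If the source table later gains an attribute, this extraction picks it up without any code change, which is exactly the maintenance advantage the article describes; the price, as noted, is moving more data per load.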
A pharmaceutical wholesaler wanted to find out more about its market performance compared with that of other wholesalers in the industry. The data required to calculate the market shares were only partially available in the wholesaler's source system, namely the order files. To achieve an exchange of sales data between wholesalers in the same line of business, anonymous of course, a negotiation took place at board level. As a result, the wholesalers established a joint foundation with the aim of processing all sales data once a month, in a specific format, into one large file containing all revenue data broken down by product and by month.
More and more organizations are wondering what the use of a data warehouse is, and whether it's worth the investment. They also want to know what alternatives are available. A growing number of IT vendors and some "experts" claim that the end of the data warehouse is nigh. By vendors, we mean suppliers of data warehouse appliances, data virtualization tools, and data discovery tools. We have a different opinion, though: the data warehouse is still the beating heart of the intelligent organization, and it serves several vital goals.
Business Intelligence only works well when we regularly retrieve data from the source systems and copy it to a separate computer and database. This means that the data from the source system is stored redundantly: in the source system and in the data warehouse. A traditionally minded IT specialist will find this unacceptable: within the company network, data should never be stored more than once, so that a change never has to be made in several places. That this principle benefits the maintainability of data is beyond dispute. Yet redundant storage is exactly what BI requires, especially when we need to analyze large volumes of (unstructured) data: Big Data.
There's been a lot of discussion about BI in difficult times: to what extent does it pay to invest in BI, and why do so many projects appear to be unsuccessful? When we talk about Business Intelligence, especially for big companies, we talk about processing data (from databases and applications) into actionable information. This is the traditional Business Intelligence that companies like Cognos and Business Objects have been working on for years, and that Gartner publishes an annual BI Magic Quadrant about. One of the problems with this type of solution is that a lot of the data needed to get proper management information isn't contained in a clearly defined Oracle database with a metadata layer on top of it. Nor is it in SAP R/3, with structure and security that we can, with some difficulty, reach. No, it's in web pages, PDF files, PowerPoint presentations, emails, and Word documents, where there are very few standard definitions and even fewer access rules.
Microsoft Corp. today announced that the latest version of the world’s most widely deployed data platform, Microsoft SQL Server 2012, has released to manufacturing. SQL Server 2012 helps address the challenges of increasing data volumes by rapidly turning data into actionable business insights. Expanding on Microsoft’s commitment to help customers manage any data, regardless of size, both on-premises and in the cloud, the company today also disclosed additional details regarding its plans to release an Apache Hadoop-based service for Windows Azure.
Question: Which ETL tools support Data Vault modeling out-of-the-box? What are the challenges and issues when building a Data Vault with ETL tools? 1. Marcel de Wit - See another discussion on LinkedIn (still active; Dutch) 2. Daan van Beek - Thanks Marcel, I did read the comments in that discussion too; in fact, it was the reason I started this one. After reading it, it was still unclear to me whether ETL tools support Data Vault out-of-the-box, the way they do slowly changing dimensions. So, who knows the answer(s)? The vendors?
After users become familiar with your BI project's benefits, they'll likely want more. Be prepared to provide analysis of unstructured data. We'll show you how to start. You've spent the last five years defining, establishing, and building an analytical environment for your organization. You received accolades for finally providing access to structured information from your company's transactional systems through a business intelligence (BI) tool with underlying data marts, a data warehouse, and a data integration tool. Now -- all of a sudden, it seems -- your colleagues are asking for access to other kinds of content such as email, documents, and audio-visual media through your analytical architecture so they can use this content for predictive analytics in the BI application. Where should you start?
Many companies are seeing very significant increases in data volumes and these are having an impact on their data warehouse programmes, according to the latest survey from PMP Research. The research has been commissioned by the Evaluation Centre. Most of the organisations polled (68%) report that data volumes have increased substantially over the past three years, with a further 25% indicating more modest rises. Only 2% reckon that data volumes have stayed constant over that time period.
Amsterdam - March 3rd, 2008 - With several books on data warehousing to his name, and a business that trains IT staff in the field, Ralph Kimball has been called in some quarters the “father of data warehousing”, though he appears to share that title with another data warehousing pioneer, Bill Inmon, if a Google search of the two is anything to go by. He and Inmon have different approaches to the art of data warehousing, but Kimball, who is teaching in New Zealand this month, says that doesn’t make them enemies.
We have recently been investigating why so many data migration projects (84%) run over time or budget. Over half of the respondents in our survey who had run over budget blamed inadequate scoping (that is, they had not properly or fully profiled their data), and more than two thirds of those that had gone over time put the emphasis in the same place. I mention this because it is symptomatic of all data integration and data movement projects: data quality work needs to start before you begin your project (so that you can properly budget time and resources), continue right through the course of the project, and, where it is not a one-time project like data migration, be maintained on an ongoing basis in production. Further, in order to maintain quality, you need to be able to monitor data quality (via dashboards and the like) on an ongoing basis. This is especially important in the context of data governance.
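A minimal sketch of the kind of up-front profiling the survey respondents skipped: measuring completeness and key duplication before the project starts, and repeating the measurement throughout. The record layout and field names below are hypothetical:

```python
# Toy data profiling: the same metrics could feed a data quality dashboard
# and be recomputed on every load. Records are plain dicts for illustration.

def profile(records, key_field):
    """Return simple data quality metrics for a list of dict records."""
    n = len(records)
    fields = {f for r in records for f in r}
    keys = [r.get(key_field) for r in records]
    return {
        "rows": n,
        # share of empty/missing values per field
        "null_rate": {f: sum(1 for r in records if r.get(f) in (None, "")) / n
                      for f in fields},
        # how many key values occur more than once
        "duplicate_keys": len(keys) - len(set(keys)),
    }

customers = [{"id": 1, "email": "a@example.com"},
             {"id": 2, "email": ""},               # incomplete record
             {"id": 2, "email": "c@example.com"}]  # duplicate key
report = profile(customers, "id")
```

Running such a profile before scoping gives the numbers (row counts, gaps, duplicates) that make a realistic time and budget estimate possible, which is precisely the scoping failure the survey identifies.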
"I am tired of IT people. They'll talk to you for half an hour about what they do, but by the time they're finished, you still don't have any idea what they've told you or why it's important." So complained a friend of mine in a recent conversation. What was the irritant giving rise to my friend's frustration? Ahem … my own ineffective attempt to explain to him what it is that I, a data warehousing/business intelligence (DW/BI) practitioner, do. Sometimes the right metaphor is helpful. It can clarify abstract concepts for the uninitiated and, even for the expert, be a means of synchronizing designs and vocabularies and analyzing problems. To that end, let me propose a metaphor for describing what data warehousing and Business Intelligence is all about and, perhaps more importantly, suggest where the field is broken: the metaphor of an information supply chain.