A decade ago the vast majority of data warehouse systems were hand crafted, but the market for ETL software has steadily grown and the majority of practitioners now use ETL tools in place of hand-coded systems. Does it make sense to hand-code (SQL) a data warehouse today, or is an ETL tool a better choice?
What are the 7 biggest benefits of using an ETL tool?
We now generally recommend using an ETL tool, but a custom-built approach can still make sense, especially when it is model-driven. This publication summarizes the seven biggest benefits of ETL tools and offers guidance on making the right choice for your situation.
The single greatest advantage of an ETL tool is that it provides a visual flow of the system’s logic (if the tool is flow based). Each ETL tool presents these flows differently, but even the least-appealing of these ETL tools compare favorably to custom systems consisting of plain SQL, stored procedures, system scripts, and perhaps a handful of other technologies.
Structured system design
ETL tools are designed for the specific problem of data integration: populating a data warehouse or integrating data from multiple sources, or even just moving the data. With maintainability and extensibility in mind, they provide, in many cases, a metadata-driven structure to the developers. This is a particularly big advantage for teams building their first data warehouse.
Many of the home-grown data warehouses we evaluated are rather fragile: they have many emergent operational problems. ETL tools provide functionality and standards for operating and monitoring the system in production. It’s certainly possible to design and build a well-instrumented, hand-coded ETL application. Nonetheless, it’s easier for a data warehouse / business intelligence team to build on the features of an ETL tool to build a resilient ETL system.
Data-lineage and impact analysis
We would like to be able to right-click on a number in a report and see exactly how it was calculated, where the data was stored in the data warehouse, how it was transformed, when the data was most recently refreshed, and from what source system(s) the numbers were extracted. Impact analysis is the flip side of lineage: we’d like to look at a table or column in the source system and know which ETL procedures, tables, cubes, and user reports might be affected if a structural change is needed. In the absence of ETL standards that hand-coded systems could conform to, we must rely on ETL vendors to supply this functionality — though, unfortunately, just half of them have so far (more results in our survey).
Advanced data profiling and cleansing
Most data warehouses are structurally complex, with many data sources and targets. At the same time, requirements for transformation are often fairly simple, consisting primarily of lookups and substitutions. If you have a complex transformation requirement, for example if you need to de-duplicate your customer list, you should buy on additional module on top of the ETL solution (data profiling / data cleansing). At the very least, ETL tools provide a richer set of cleansing functions than those available in SQL. Download the ETL Tools& Data Integration Survey to see how the ETL tools compare on these aspects.
You might be surprised that performance is listed as one of the last under the advantages of the ETL tools. It’s possible to build a high-performance data warehouse whether you use an ETL tool or not. It’s also possible to build an absolute dog of a data warehouse whether you use an ETL tool or not. We’ve never been able to test whether an excellent hand-coded data warehouse outperforms an excellent tool-based data warehouse; we believe the answer is that it’s situational. But the structure imposed by an ETL platform makes it easier for a (novice) ETL developer to build a high-quality system. Furthermore, many ETL tools provide performance enhancing technologies, such as Massively Parallel Processing, Symmetric Multi-Processing, and Cluster Awareness.
A lot of ETL tools are now capable of combining structured data with unstructured data in one mapping. In addition they can handle very large amounts of data that don’t necessarily have to be stored in data warehouses. Nowadays, Hadoop-connectors, or similar interfaces to big data sources, are provided by almost 40% of the ETL tools. And the support for Big Data is growing continually.
Download the ETL Tools & Data Integration Survey: 22 tools
Get all information to select the best (enterprise) ETL tooling for the best price by downloading our ETL tools & Data Integration Survey 2018. You will get real insight into using ETL tools to build successful ETL applications. You’ll receive the results of comparing all the significant ETL tools / Data Integration solutions across more than 90 criteria.