What are the most important ETL tool requirements?
For the last three months of last year, Passionned Group ran a poll on its ETLtool.com site asking visitors what they thought were the most important requirements when choosing an ETL tool. The ETLtool.com site has been running since 2005. It advises visitors on the various ETL (data integration) tools available, their strong and weak points, and how to choose an ETL tool. It also sells a 100-page report in which popular products are analyzed and compared, to help readers choose the right product for their individual circumstances.
Important characteristics of ETL tools
While the poll was running, approximately 12,000 people visited the site and a little more than 2,000 of them voted on what they thought was the most important characteristic of ETL tools. We don’t know the job roles of the people who voted. We do, however, know where they came from: about 50% from the USA, 30% from Europe and the rest spread around the world, with a significant number from India.
The 5 options they could vote for
We asked people to choose the most important of five options: Excellent performance; Data quality and profiling; Very user friendly; High connectivity; Re-usability and debugging facilities. The results are discussed below.
An amazing 50% voted that performance was the most important characteristic of an ETL tool. Whilst we understand that companies are moving more and more data into warehouses, we have come across very few companies that have performance problems, and even fewer where the performance problems could be attributed to the ETL tool. Clearly there is a perception that ETL tools are not fast enough and that this is a problem. We believe that this is not actually the case, and that the performance problems that do exist are more likely to be caused by the structure of the target database (the data warehouse) than by any inefficiencies in the ETL tool.
Only 19% consider Data quality and profiling the most important characteristic; that’s a little less than one in five. In general, ETL tools are used to load data warehouses, and data warehouses are the source for the Business Intelligence software that we use to improve our organizations. One of the most important reasons that Business Intelligence is not successful is that the information is not considered to be reliable, and that generally comes from a quality problem in the source data. ICT is a world full of acronyms, but one of the most important of them all, something we teach every student in the first week, is GIGO: Garbage In, Garbage Out.
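To put the percentages in perspective, the rounded figures quoted in this article can be turned into approximate head counts. The short Python sketch below is only illustrative: it assumes exactly 12,000 visitors and 2,000 voters (the article says "approximately" and "a little more than"), and the variable names are our own.

```python
# Rounded figures taken from the article; the exact counts are not published.
visitors = 12_000          # "approximately 12,000 people visited the site"
voters = 2_000             # "a little more than 2,000 of them voted"

# Shares of the vote reported for the two options discussed in the text.
shares = {
    "Excellent performance": 0.50,
    "Data quality and profiling": 0.19,
}

# Fraction of visitors who took part in the poll.
response_rate = voters / visitors

# Approximate number of votes behind each reported percentage.
approx_votes = {option: round(voters * share) for option, share in shares.items()}

# Share left over for the three options whose results are not quoted in the text.
remaining_share = 1 - sum(shares.values())

print(f"Response rate: {response_rate:.1%}")
for option, votes in approx_votes.items():
    print(f"{option}: about {votes} votes")
print(f"Other three options combined: {remaining_share:.0%}")
```

Under these assumptions, roughly one visitor in six voted, and the three options not broken out in the text share the remaining 31% of the vote between them.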
Conclusions: the perceptions of the marketplace may differ from those of the consultants and the ETL vendors. Whilst we accept that 2,000 is a very small sample, it gives us food for thought. The customers appear not to have the same priorities as the vendors and the consultants, and that can’t be good.