Who is the real father of data warehousing?
A’dam – March 3rd, 2008 – With several books on data warehousing to his name, and a business that trains IT staff in the field, Ralph Kimball has been named in some quarters the “father of data warehousing”, though he appears to share that title with another data warehousing pioneer, Bill Inmon, if a Google search of the two is anything to go by.
Different approaches to data integration
He and Inmon have different approaches to the art of data warehousing, but Kimball, who is teaching in New Zealand this month, says that doesn’t make them enemies.
“I have a lot of respect for Bill, but he’s not a detail guy,” he says.
“I don’t debate head-to-head with him, because we don’t move on common ground.”
Inmon is “in favor of a very IT-oriented, centralized approach, where the IT department is in total control of data — they get the data, control and process it, and release it to users.”
Kimball’s approach is that users, primarily senior executives, control the data.
“The business world is the owner of the data warehouse.”
Senior management rather than IT
The idea that senior management, rather than the IT department, owns the data led to the coining of the term “business intelligence” (BI), Kimball says.
“On a general level, Business Intelligence has superseded data warehousing in some ways — data warehousing refers to the storage of any data that’s used by the business, whereas Business Intelligence is the effective application of user interfaces and tools involving data [to help] the business make decisions.
“I love the term ‘Business Intelligence’, because it really reminds us of where the initiative is.”
As some have pointed out, it is perhaps more accurate to say that Kimball is the father of Business Intelligence and Inmon the father of data warehousing.
“Business Intelligence is the inheritor of the legacy that’s been built up by data warehousing,” he says.
The development of decision support systems
That legacy began to be built in the 1970s, Kimball says, with the development of decision support systems.
“That term refers to bringing data to technology decision-makers and analysts.
“In the late 1970s and early 1980s, there were a lot of systems built that look like data warehouses today.
“In 1984, I was at Metaphor Computer Systems, selling what would now be called data warehouses to banks, government departments etcetera.
“The term ‘data warehouse’ was coined by Bill Inmon around 1990.”
The basic idea of a DWH
The basic idea of a data warehouse — a stable library of information compiled from transactional systems into a second copy used for analysis and not subject to being over-written, as transactional data is — has remained the same since the precursors of data warehouses were developed, Kimball says.
However, the IT industry has changed a lot since then, and the environment around data warehouses has changed significantly, he says.
“In the early ’80s, a 3-5MB database was thought to be pretty large. Today, 5-10TB is a serious database.”
The increased capacity of databases has been matched by an incredible increase in the amount of data created.
“It’s been growing at an unbelievable rate — I wish it would stop growing so we [data warehousing staff] could do a good job with it, but it won’t.
“It’s growing all the time and in so many different ways.”
As well as the growth in data, other developments in the IT industry, such as the increasing use of service-oriented architecture and Web 2.0 technologies, mean data warehousing will be different in the future, Kimball says.
The key to dealing with new technologies is to look at how they fit into data warehousing and, if they don’t, not to have data warehousing staff implement them, he says.
While SOA and Web 2.0 can add to the capability of data warehousing, imposing projects related to SOA and Web 2.0 on data warehousing teams is the wrong approach, he says.
“If a data warehousing team can use SOA, great,” he says.
“But when people say, ‘the data warehousing team should be the leader of [an SOA project]’, I say ‘no, let someone else do that’”.
Similarly, if a Web 2.0 application has relevance to data warehousing, “then the data warehousing team should say ‘let’s use it’”, but shouldn’t be expected to take a major part in implementing the application, he says.
Prone to taking on too much
“The data warehouse is a huge responsibility, and I’m protective of my students because they’re prone to taking on too much.”
When he talks about his students, Kimball means the many who have taken the data warehousing courses his company, Ralph Kimball and Associates, runs.
While the size of databases, amount of data and other factors are vastly different today compared to when data warehousing began, there are three timeless themes when it comes to building a data warehouse, he says.
Poorly structured data
“First, there’s data quality. People have to understand that if you have poorly structured data, you have a problem.”
Second is data integration.
“At large organizations, there are large numbers of customer-facing processes that are collecting data, and integrating them can be a problem.”
Third is the customer experience. For users, “it boils down to performance and simplicity”.
Knowing your users and what they want is crucial, he says.