OLAP: faster analyzing data
OLAP is an abbreviation for Online Analytical Processing and gives users the ability to analyze information thoroughly from multiple perspectives easily. OLAP is often provided by using information cubes but some Business Intelligence tools have OLAP functionality in their reports.
The Power of OLAP: boost performance
The power of Online Analytical Processing comes from the speed and flexibility compared with normal reporting directly from the database. It is fast because most of the user requests are pre-processed and the (aggregated) answers are stored in an OLAP-cube. The OLAP engine provides the answer almost instantly. It is flexible because users are able to choose easily between different dimensions, a filter or drill-down to lower levels of detail.
Differences: OLAP vs OLTP
OLAP systems are often compared with OLTP systems. OLTP is an abbreviation for Online Transaction Processing. The sole purpose of OLTP is to process transactions as fast as possible. What are the differences between OLAP and OLTP? The following table reveals the most important differences:
|Purpose||transaction processing||management information|
|Level of detail / scope||single records||aggregated datasets|
|Data model||normalized (3NF)||denormalized (star schema)|
|Data volumes per query||very small||large|
|Permissions||create, retrieve, update, delete||retrieve|
The 7 Steps to building an OLAP cube
- Decide which data you need for your analysis and write a SQL-query to retrieve the data. Often you need to combine the data from different tables. Then you have to join these tables.
- Define the measures / indicators you want to display in your report, analysis or dashboard
- Define the dimensions you want to use as perspectives for your measures, for example Time, Product, Region and Employee
- Model one or multiple hierarchies in your dimensions, for example Year, Month and Day, or Product Line, Product Group and Product
- Define the aggregates and how they should be processed, and define the complex measures and analytical functionality
- Process the OLAP cube, in other words retrieve the data and build the aggregates
- Optimize the performance of the cube based on usage. If users ask often for a specific combination of measures and dimensions which are not yet stored in an aggregate, define an extra aggregate for that combination.
Disadvantages of OLAP
This great technology comes not only with benefits. Two major problem areas can be identified:
- Inflexibility: once a cube is designed it is often difficult to add measures or dimensions quickly, especially when there is a big gap between IT and business.
- Sparsity: this happens when the cube contains too many dimensions / members. All the possible combinations are stored, but not all make sense. As a consequence the cube explodes and response times are getting even worse.
Alternatives to OLAP
Some people believe that OLAP technology is dead. We don’t think so because not many organization can afford an alternative. Aside from that, the alternatives have important shortcomings. The alternatives for Online Analytical Processing are:
- In-memory BI: all the detailed data is loaded into the internal memory of the computer. There is no need to retrieve data from (slow) disks anymore. This will definitely boost performance but still the computer has to process a lot of data because there are no aggregates to read the answers directly from. Some innovative OLAP tools now combine in-memory with OLAP technology.
- DWH appliances: these dedicated BI computers are intelligent and they learn from data structures itself and queries answered in the past and build automatically indexes or aggregates. Next time the same query has to be answered the appliance detects that and retrieves the data simply from the aggregate. The problem is that an appliance acts like a black box and no one is able to design aggregates upfront or to fix problems.
- Database Indexes: this is by far the cheapest solution. Just define the right indexes in the database so queries are optimized to access the data in the joined tables faster. Nonetheless, still the data has to be aggregated on the fly which might take some valuable time when it comes to analyzing the data.