We use dashboards, reports and interactive analysis to make general and simple relationships within business operations visible: for example, more customer visits lead to an increase in cross-selling, which leads to better financial results. The smaller, more specific and more complex relationships surface when we use data mining.
“Data mining is the uncovering of hidden (unknown) relationships or segments in large data collections that have a predictive value for a specific part of the business operations”.
Find important previously unknown relationships
When data mining leads to the discovery of important, previously unknown relationships (or patterns), this can have a significant impact on the organization. It may result in a substantial increase in profit, a reduction of expenses and better service, all in a short time span. In other words: ‘winners predict, losers react’. With data mining, we can find nuggets of gold in the ‘mountain’ of data the organization registers.
Data mining is not easy
However, data mining is not easy, not least because it is wrapped in a cloak of secrecy and its results are hard to enforce. Firstly, we need to find an appropriate application. To put it in terms of gold miners: we bought ten acres of land, but where do we start digging? When we start digging randomly, the chances are we will not strike gold quickly. Thus, we will first need to investigate whether a certain type of flower or a specific soil structure on the surface might in fact determine where we can find the gold.
First determine a goal
Before we start data mining, we must determine our goal and assume the existence of certain relationships. The ‘miner’ thus largely controls the data mining process. This means that some hidden patterns will never surface, simply because we never considered that they might exist and did not look for them. A farmer who looks for fertile soil will quite probably be able to find it. But will he ever discover the oil underneath his piece of land that would make him a rich man? Probably not, although if he is lucky he might find it accidentally.
Areas where data mining has been applied successfully
When we use data mining, we should try to remove our blinkers and start thinking freely and creatively. As an example, here are a few areas in which various organizations have successfully applied data mining in the (recent) past:
- fraud detection in order to prevent damage (to customers);
- predicting price developments in order to achieve higher returns;
- identifying risks in order to adjust insurance premiums accordingly;
- customer segmentation for tailored deals;
- analyzing shopping carts to optimize the layout of (online) shops;
- predicting demand patterns to reduce waiting times in the chain.
The field of application differs per organization. What matters is that we focus on an application area in which specific knowledge lets us save costs or increase profits in a relatively short time. The outcome we want to predict or explain in that area is what we refer to as the target variable.
Find ‘suspicious’ orders
A medical wholesaler may for example search for the characteristics of returned orders (or of customers) that are responsible for returned shipments (target variable), so that ‘suspicious’ orders can be double-checked before they leave the warehouse. A publisher of popular science books may want to figure out what features of publications will be decisive for the success (target variable) of a publication. Once we have established the target variable, we must once more be aware of the fact that, with data mining, we will not discover other things. We are exclusively looking for that one specific relationship or pattern. Once again, data mining does not automatically provide us with all sorts of interesting links and patterns. We have to specify our goals first.
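To make the idea of a target variable concrete, the following minimal sketch uses a handful of made-up orders for the medical wholesaler. The field names (`channel`, `rush`, `returned`) and all values are hypothetical and chosen only for illustration. Here `returned` plays the role of the target variable, and the code simply compares the return rate for each value of a candidate characteristic. Real data mining tools use far more sophisticated techniques, so this illustrates the idea rather than a method.

```python
from collections import defaultdict

# Hypothetical order records; every field name and value is invented.
# "returned" is the target variable we want to explain.
orders = [
    {"channel": "phone", "rush": True,  "returned": True},
    {"channel": "phone", "rush": False, "returned": True},
    {"channel": "web",   "rush": False, "returned": False},
    {"channel": "web",   "rush": True,  "returned": False},
    {"channel": "web",   "rush": False, "returned": False},
    {"channel": "phone", "rush": True,  "returned": False},
]


def return_rate_by(feature, records, target="returned"):
    """Return {feature value: share of records where the target is True}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        value = record[feature]
        totals[value] += 1
        if record[target]:
            hits[value] += 1
    return {value: hits[value] / totals[value] for value in totals}


# Compare the return rate across the values of each candidate characteristic.
for feature in ("channel", "rush"):
    print(feature, return_rate_by(feature, orders))
```

In this toy data the return rate differs strongly between the values of `channel` but not between the values of `rush`, so only the former would be a candidate signal for flagging ‘suspicious’ orders. Characteristics that have no relation to the chosen target variable simply do not show up, which is the limitation described above.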
Case: Data mining and knowledge management at a publisher
A large German publisher of management books wishes to increase the return on investment on its publications. The company also wants to be able to assess quickly the potential return on book proposals. They believe that they can achieve important competitive advantages with the use of data mining and knowledge management.
Assessing publications by experience
Through the years, publishers have gained extensive experience when it comes to assessing publications. Most publishers receive a few book proposals a week, from different authors. Due to busy schedules, some manuscripts are not dealt with or assessed.
Start registering as many features as possible
An initial information analysis shows that there are a number of features that may determine whether a publication will be successful, but it is not exactly clear which ones. The publisher does not yet register these characteristics and decides to create a database and to start registering as many features as possible. These include characteristics such as subject, number of illustrations, number of pages, chapters, previous publications by the same author, the author’s network and so on.
400 publications turned out to be successful
Once the database is up and running, the data from each new book proposal is entered. Two years later, the publisher has a database that contains over 15,000 book proposals, of which 10% have actually been published. Four hundred of these publications turned out to be successful.
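The proportions behind these figures can be worked out in a few lines. The sketch below treats ‘over 15,000’ as exactly 15,000 and ‘10%’ as exact, which are simplifying assumptions, so the results are only approximate.

```python
# Round figures taken from the text; "over 15,000" is treated as exactly 15,000.
proposals = 15_000
published_share = 0.10
successful = 400

published = round(proposals * published_share)     # number of published books
success_rate_published = successful / published    # successes among published books
success_rate_proposals = successful / proposals    # successes among all proposals

print(f"published: {published}")
print(f"successful among published: {success_rate_published:.1%}")
print(f"successful among all proposals: {success_rate_proposals:.1%}")
```

Under these assumptions, about 27% of the published books and fewer than 3% of all proposals were successful. Success is therefore a fairly rare outcome in this database, and only the published books have a known result at all.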
Early this century, the publishing house started a data mining project, during which various data were also collected on topics that were ‘popular’ at the time. The project took about four months and produced a model with some interesting correlations that most publishers had not previously thought of, even though, in retrospect, these relationships were actually logical.
The system automatically gives advice
The linkages were then coded and entered into the registration system. Now, when an employee enters a book proposal including its features, the system automatically advises either not to publish the book because of obvious risks, or to publish it, because success is likely or because it is doubtful but still worth the risk. The proposals with positive advice are sent promptly to the publishers. The doubtful cases remain in the system for publishers to look at, or to publish later (for lack of better books to publish).
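The text does not say how the coded relationships produce the advice. As a minimal sketch, assume the model yields a success score between 0 and 1 for each proposal and that two cut-off values separate the three outcomes. The score, the names `Advice` and `advise`, and the thresholds 0.3 and 0.7 are all hypothetical.

```python
from enum import Enum


class Advice(Enum):
    REJECT = "do not publish"
    DOUBTFUL = "doubtful, keep in the system"
    PUBLISH = "publish"


# Hypothetical cut-off values; the publisher's actual rules are not given.
REJECT_BELOW = 0.3
PUBLISH_FROM = 0.7


def advise(success_score: float) -> Advice:
    """Map a predicted success score between 0 and 1 to one of three advices."""
    if not 0.0 <= success_score <= 1.0:
        raise ValueError("success_score must lie between 0 and 1")
    if success_score < REJECT_BELOW:
        return Advice.REJECT
    if success_score >= PUBLISH_FROM:
        return Advice.PUBLISH
    return Advice.DOUBTFUL


for score in (0.1, 0.5, 0.9):
    print(score, advise(score).value)
```

Keeping a separate ‘doubtful’ band, rather than forcing a yes or no, mirrors the case: those proposals stay available for a publisher’s own judgment instead of being decided by the system.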
The return on publications increased by 13%
Thanks to the application of data mining, the publisher not only ensured a better intake of book proposals, but also increased the overall return on publications by 13%. Additionally, the organization has become more aware that sharing this knowledge has great benefits.