Business rules control us
Business rules more or less control our entire lives, a truth we don’t like to face. We are usually fond of the exception to the rule: rules such as speed limits and parking regulations should apply to others, but not to us. Those kinds of ‘business rules’ are agreements that we make as a society to make society work. These rules are usually the consequence of a law that the majority of the population or the government thinks is necessary.
Business rules describe situations
There are also business rules that describe certain phenomena. A simple example of this is that when you pick up a stone and drop it, the stone will fall. These rules follow the principle:
IF situation A THEN action B, ELSE action C
These kinds of business rules contain a certain inevitability. You can’t escape them. Often these are laws of physics, and as the name implies, there is no room for variation. When human behavior enters the equation, it becomes a different story. Business rules cease being laws, and become more like a description of the most likely event.
People who buy bread in the supermarket are likely to buy something to put on their bread, but not everyone does. Maybe they already have enough peanut butter or cold cuts or what-have-you at home. Likewise, someone who buys peanut butter might not buy bread. Maybe that customer doesn’t buy bread at all, and they’re using the peanut butter to make a dipping sauce.
Still, these ‘rules’ allow us to follow and control large numbers of transactions. You could check, for example, to what extent a simple rule about customers buying bread and peanut butter together is correct. By applying this simple business rule to a large number of transactions you can estimate how often it holds, which can be useful knowledge.
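To make this concrete, here is a minimal sketch in Python of how such an estimate could be made. The baskets are invented purely for illustration, and the function name `rule_probability` is an assumption of this sketch, not part of any library.

```python
# Hypothetical baskets, invented for illustration only.
baskets = [
    {"bread", "peanut butter"},
    {"bread", "cold cuts"},
    {"bread", "peanut butter", "milk"},
    {"peanut butter"},
    {"bread"},
    {"milk", "cold cuts"},
]

def rule_probability(baskets, condition, outcome):
    """Share of the baskets containing `condition` that also contain `outcome`."""
    matching = [b for b in baskets if condition in b]
    if not matching:
        return None  # the rule never applies, so nothing can be estimated
    hits = sum(1 for b in matching if outcome in b)
    return hits / len(matching)

# The rule and its mirror image do not have to be equally reliable.
bread_to_pb = rule_probability(baskets, "bread", "peanut butter")
pb_to_bread = rule_probability(baskets, "peanut butter", "bread")
print(f"IF bread THEN peanut butter: {bread_to_pb:.0%}")
print(f"IF peanut butter THEN bread: {pb_to_bread:.0%}")
```

Note that the estimate only says how often the rule held in the recorded transactions; it says nothing about why.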
It becomes trickier when there is a large number of situations (transactions) but no information available about cause and effect. In that case, is there even a business rule? Or maybe there’s a system of rules interacting? Or maybe there’s no business rule governing the events at all, and it’s purely random.
Predicting the future using business rules
This page is about using business rules to predict events in the future. These could be business rules you’ve created in order to test their reliability. They could also be business rules you want to deduce based on a large number of events, the outcome of which is already known. The first step is accurately describing the problem before you start looking at possible solutions.
The problem solved by business rules
Business rules are used to predict the outcome of a certain transaction or event based on easily determined starting values. As discussed earlier, it’s valuable to know what items a customer may purchase when buying bread. There are other situations that this applies to:
- Diagnosing medical conditions: Based on the presence or absence of symptoms, we can determine whether a patient is ill, and what ails them. Obviously a broken leg is easily diagnosed using an X-ray. Symptoms like forgetfulness are a lot harder to place, however. An added complication is that some medication introduces certain symptoms as side effects.
- Language analysis: Certain patterns and letter clusters can be found by comparing the same text written in two different languages (or even two different scripts). From these matching groups, further analysis can be done. This is essentially the method used by French linguist Jean-François Champollion to decipher the hieroglyphs on the Rosetta Stone. However, when the matching language is missing, deciphering the code may become more difficult, or even impossible.
In all situations, it’s important to process new information using the right business rules. Then you can make correct predictions about, for example, purchasing behavior, or make a correct diagnosis, or a correct translation. This means that the system has access to a collection of business rules that allow for a certain outcome. Based on the above examples, there are two ways to arrive at these rules:
- Method 1: Translate existing knowledge rules into computer rules. In this situation you use the available information to determine the reliability of the existing knowledge rules.
- Method 2: Use the available information to draft rules. At the same time you want to determine the reliability of these business rules.
Which method you use depends on the situation. When making a medical diagnosis, you’d use existing medical knowledge. Certain combinations of symptoms are already associated with certain diseases. In this case, these existing (medical) rules should be translated into computer rules. Then the reliability and probability of these rules can be determined.
When researching purchasing behavior it makes sense to look for relations between certain products that you can use to deduce computer rules. You can analyze the correctness and accuracy of these rules, but you won’t discover any unexpected combinations, because you’re not looking for them.
When you want to deduce new business rules and their reliability from a large number of (different) events, it’s a different story. The trick is keeping the business rules general enough to be applicable.
The solutions offered by business rules
Analyzing information using business rules means that both the information and the rules have to adhere to certain conditions, both in their formulation and structure. A business rule structure requires some conditions in order to produce a certain result. In formal notation:
IF (value-1, value-2, … , value-n) THEN action-1
In this rule, values 1 through n form a list of conditions that must all be met in order to produce action-1 as a result. Every business rule, and thus every outcome, has its own list of conditions. The set of business rules should be minimal: it contains only the rules needed to produce the desired actions.
Let’s say that a supermarket uses the following two rules to predict customer purchasing behavior:
IF (bread, butter) THEN milk
IF (bread, butter, cheese) THEN milk
Obviously, the second rule doesn’t produce any new information: every purchase that satisfies it also satisfies the first rule, which leads to the same outcome. Applying it does take more time, however. Time is crucial when processing large volumes of data, so the second rule in the example is removed. The situation changes when using the following two rules:
IF (bread, butter) THEN chocolate sprinkles
IF (bread, butter, cheese) THEN milk
Now there are two separate business rules, and removing the second rule will produce different results.
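The check described here can be sketched in a few lines of Python. This is only an illustration under the assumption that a rule is stored as a set of conditions plus one action; the type alias `Rule` and the function `remove_redundant` are names invented for the example.

```python
from typing import FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], str]  # (conditions, action)

def remove_redundant(rules: List[Rule]) -> List[Rule]:
    """Drop every rule that a more general rule with the same action already covers."""
    kept = []
    for conditions, action in rules:
        covered = any(
            other_action == action and other_conditions < conditions  # proper subset
            for other_conditions, other_action in rules
        )
        if not covered:
            kept.append((conditions, action))
    return kept

first_set = [
    (frozenset({"bread", "butter"}), "milk"),
    (frozenset({"bread", "butter", "cheese"}), "milk"),
]
second_set = [
    (frozenset({"bread", "butter"}), "chocolate sprinkles"),
    (frozenset({"bread", "butter", "cheese"}), "milk"),
]

print(len(remove_redundant(first_set)))   # the longer rule is dropped
print(len(remove_redundant(second_set)))  # both rules are kept
```

A rule is only treated as redundant when its action matches that of the shorter rule, which is exactly the difference between the two rule sets above.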
Business rules adhere to a certain syntax (notation), and the information also has to be read in a certain way. At first glance it seems like you can use the same syntax here as for the business rule itself. The problem is that the number of variables in the information in question is unknown.
Let’s assume we know the purchasing behavior of these three customers:
- Customer 1: Bread, chocolate sprinkles
- Customer 2: Cheese, jam, milk
- Customer 3: Bread, butter, cheese, milk
Without knowing the number of products each customer purchased, we can’t use this information to create or validate any business rule at all. It’s important to know the number of purchases per customer. Many of these problems can be solved by using structures like key-value pairs and techniques like MapReduce.
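As a rough illustration of the key-value idea, the three baskets can be flattened into (customer, product) pairs, so that every record has the same shape no matter how many products a customer bought. The sketch below runs on a single machine in plain Python; it only mimics the map and reduce steps and does not use an actual MapReduce framework. The variable names are assumptions of the example.

```python
from collections import defaultdict

purchases = {
    "Customer 1": ["bread", "chocolate sprinkles"],
    "Customer 2": ["cheese", "jam", "milk"],
    "Customer 3": ["bread", "butter", "cheese", "milk"],
}

# "Map" step: one (key, value) pair per purchased product.
pairs = [
    (customer, product)
    for customer, basket in purchases.items()
    for product in basket
]

# "Reduce" step: group the pairs by key to recover the totals.
items_per_customer = defaultdict(int)
customers_per_product = defaultdict(set)
for customer, product in pairs:
    items_per_customer[customer] += 1
    customers_per_product[product].add(customer)

for customer, total in items_per_customer.items():
    print(customer, total)
```

Because each pair stands on its own, the number of products per customer no longer has to be known in advance; it falls out of the grouping.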
Another problem of working with business rules is that continuous values aren’t directly usable. Take, for example, blood pressure and body temperature, which are measured during many medical examinations. These values can’t be used as they are when working with business rules. They have to be divided over a maximum of five groups, for example for body temperature in degrees Celsius:
lower than 35 Very low | 36 Low | 37-38 Normal | 38-40 High | > 40 Very high
By combining the first and last two groups, you can divide everything into three groups. Purely looking at whether a patient is running a fever or not is also viable. In cases like these it’s important to indicate the chosen type of division. The chosen division also has to be defensible. For example, checking for fever at 37 degrees Celsius is not medically defensible at all.
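A division like this is easy to express in code. The sketch below is one possible reading of the five groups; the exact cut points (35, 37, 38, and 40 degrees Celsius) and the choice to put a value that lies exactly on a cut point in the higher group are assumptions made for the example, and a real application would need medically justified boundaries.

```python
from bisect import bisect_right

CUT_POINTS = [35.0, 37.0, 38.0, 40.0]  # assumed boundaries in degrees Celsius
LABELS = ["very low", "low", "normal", "high", "very high"]

# Merging the first two and the last two groups gives a three-way division.
MERGED = {
    "very low": "low",
    "low": "low",
    "normal": "normal",
    "high": "high",
    "very high": "high",
}

def temperature_group(celsius: float) -> str:
    """Map a measured temperature to one of the five groups."""
    return LABELS[bisect_right(CUT_POINTS, celsius)]

for value in (34.2, 36.0, 37.4, 39.1, 41.0):
    group = temperature_group(value)
    print(value, group, "->", MERGED[group])
```

Keeping the cut points in one list makes the chosen division explicit, which helps when it has to be documented and defended.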
Existing business rules
Working with existing business rules means working with knowledge acquired earlier. This can be applied for three reasons:
- Testing the viability of existing rules against a large number of measurements with both the conditions and the actions.
- Predicting actions based on known conditions.
- A combination of the first two reasons.
Which reason applies depends greatly on the application. When it comes to predicting purchasing behavior, the second reason is the most important, because in these situations the business rules have already been determined. For a medical diagnosis, the combination of both reasons is important: not just the diagnosis (the action) matters, but also the reliability of that diagnosis. The biggest challenge when working with existing rules is translating existing knowledge into an unambiguous set of rules.
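The combination of testing and predicting could look roughly like this. The sketch reuses the rule IF (bread, butter) THEN milk from the supermarket example; the measured baskets are hypothetical, and the function names `reliability` and `predict` are invented for the illustration.

```python
# IF (bread, butter) THEN milk, the rule from the supermarket example.
RULE_CONDITIONS = frozenset({"bread", "butter"})
RULE_ACTION = "milk"

# Hypothetical measurements in which both conditions and outcome are known.
measured_baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "cheese", "milk"},
    {"bread", "butter", "jam"},
    {"bread", "jam", "milk"},
]

def reliability(conditions, action, baskets):
    """Test the rule: how often did the action follow when the conditions were met?"""
    applicable = [b for b in baskets if conditions <= b]
    if not applicable:
        return None
    return sum(1 for b in applicable if action in b) / len(applicable)

def predict(conditions, action, known_items):
    """Apply the rule: return the action if all conditions are met, else None."""
    return action if conditions <= known_items else None

score = reliability(RULE_CONDITIONS, RULE_ACTION, measured_baskets)
prediction = predict(RULE_CONDITIONS, RULE_ACTION, {"bread", "butter", "water"})
print(prediction, f"(rule held in {score:.0%} of the measured cases)")
```

Reporting the prediction together with its measured reliability is what separates the combined use from simply applying the rule.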
Hay fever versus allergic asthma
A series of rules has to be drafted in order to separate hay fever from allergic asthma. This set of rules will only answer the question of whether the patient has hay fever or not. It will not determine whether the patient has asthma, because this can occur separately as well as in combination with hay fever. Also, one of the conditions (the nose) has two separate values here, a relatively common situation. For determining hay fever, this collection of experience rules produces the following search rules:
IF (((nose = running) OR (nose = stuffed)) AND (sneezing = yes))
THEN (hay fever = yes)
IF (((nose = running) OR (nose = stuffed)) AND (itch = yes))
THEN (hay fever = yes)
IF ((itch = yes) AND (sneezing = yes))
THEN (hay fever = yes)
By formulating the experience rules differently, you can reduce nine experience rules to three business rules. These three search rules can be combined into one single business rule:
IF ((((nose = running) OR (nose = stuffed))
AND ((sneezing = yes) OR (itch = yes)))
OR ((sneezing = yes) AND (itch = yes)))
THEN (hay fever = yes)
By combining the conditions like this, the information can be processed much faster, which becomes more important as the quantity of data increases.
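As a quick check that the combined rule says the same thing as the three separate search rules, both versions can be written as small functions and compared over every combination of inputs. This is a minimal sketch in Python; the function names and the third nose value "normal" are assumptions made for the example.

```python
from itertools import product

def three_rules(nose, sneezing, itch):
    """The three separate search rules, joined with OR."""
    nose_affected = nose in ("running", "stuffed")
    return (
        (nose_affected and sneezing)
        or (nose_affected and itch)
        or (itch and sneezing)
    )

def combined_rule(nose, sneezing, itch):
    """The single combined business rule."""
    nose_affected = nose in ("running", "stuffed")
    return (nose_affected and (sneezing or itch)) or (sneezing and itch)

NOSE_VALUES = ["running", "stuffed", "normal"]  # "normal" is an assumed third value
cases = list(product(NOSE_VALUES, [True, False], [True, False]))

# Compare both formulations on every possible combination of inputs.
mismatches = [c for c in cases if three_rules(*c) != combined_rule(*c)]
print(len(cases), "cases checked,", len(mismatches), "mismatches")
```

With only three conditions an exhaustive comparison is cheap; for larger rule sets this is where the reduction algorithms come in.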
In practice, the number of experience rules can increase drastically; more than twenty is not rare. In these cases, reduction becomes important, for which you can use algorithms like LEM1, LEM2, and AQ. Reduction also lets you test the internal consistency of the experience rules better. As the number of rules increases, so do the odds of useless or mutually exclusive rules appearing.
New business rules
When there are no good experience rules or you have doubts about the validity of the rules, then all you can do is let the information speak for itself. In that case you should look for patterns in all transactions (stores) or events (other systems). There are various algorithms that can handle this type of research. You can use APRIORI to analyze transactions, while WINEPI can analyze events. In the example below we’ll illustrate the way APRIORI works. Despite some differences, the other algorithms work in basically the same way.
Let’s say a store records the transactions of all of its customers. The data analyst can’t say anything about purchasing behavior until they have analyzed these transactions. Over a certain period, the purchases of these twelve customers have been recorded:
1. Bread, cheese, butter, milk, dish soap
2. Bread, butter, jam, milk, chocolate sprinkles
3. Bread, cheese, butter, milk
4. Bread, jam, water, milk, cake
5. Cake, detergent
6. Butter, milk, cake
7. Bread, butter, jam
8. Bread, milk, chocolate sprinkles
9. Bread, cheese, milk, dish soap
10. Bread, cheese, butter, chocolate sprinkles
11. Bread, peanut butter, butter, jam, cheese
12. Bread, cheese, butter, detergent
To deduce usable business rules based on these twelve customers, first a threshold has to be implemented for the minimum number of occurrences of a product. Let’s say the threshold is set to a minimum of five appearances. The next step is counting each product to see how often it was purchased. Then this data has to be listed in a table in descending order.
- Bread 10
- Butter 8
- Milk 7
- Cheese 6
- Jam 4
- Cake 3
- Chocolate sprinkles 3
- Dish soap 2
- Detergent 2
- Peanut butter 1
- Water 1
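The counting step can be reproduced with a short script. The sketch below tallies each product straight from the twelve recorded baskets and applies the threshold of five. It assumes that the "chocolate" bought by customer 10 means chocolate sprinkles; the variable names are invented for the example.

```python
from collections import Counter

# The twelve recorded transactions. Assumption: customer 10's "chocolate"
# is read as chocolate sprinkles.
transactions = [
    {"bread", "cheese", "butter", "milk", "dish soap"},
    {"bread", "butter", "jam", "milk", "chocolate sprinkles"},
    {"bread", "cheese", "butter", "milk"},
    {"bread", "jam", "water", "milk", "cake"},
    {"cake", "detergent"},
    {"butter", "milk", "cake"},
    {"bread", "butter", "jam"},
    {"bread", "milk", "chocolate sprinkles"},
    {"bread", "cheese", "milk", "dish soap"},
    {"bread", "cheese", "butter", "chocolate sprinkles"},
    {"bread", "peanut butter", "butter", "jam", "cheese"},
    {"bread", "cheese", "butter", "detergent"},
]
MIN_COUNT = 5

# Count in how many baskets each product appears.
item_counts = Counter(item for basket in transactions for item in basket)

# List the products in descending order and mark those under the threshold.
for item, count in item_counts.most_common():
    marker = "" if count >= MIN_COUNT else "  (below threshold)"
    print(f"{item:20} {count}{marker}")

frequent_items = {item for item, count in item_counts.items() if count >= MIN_COUNT}
```

With this threshold, `frequent_items` contains bread, butter, cheese, and milk, the four products that the rest of the analysis uses.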
The products under the threshold are left out of the analysis. Why? Because otherwise individual, possibly unique transactions would have too much influence on the results. Another disadvantage of a large number of products is that the entire collection has to be searched for each product, which can take a long time for large collections. The next step is looking for all possible combinations of two of the products above the threshold. That produces the following results:
- Bread, butter 7
- Bread, cheese 6
- Bread, milk 6
- Butter, cheese 5
- Butter, milk 4
- Milk, cheese 3
Here, too, the same threshold of five occurrences is in effect. If a product occurs fewer than five times, every combination featuring that product also stays below the threshold. In the same way we can test the occurrence of groups of three.
- Bread, butter, cheese 5
- Bread, butter, milk 3
- Bread, milk, cheese 3
- Butter, cheese, milk 2
Again, and for the same reason, we’re only considering the four most common products. The combination of all four products (bread, butter, cheese, and milk) occurs only twice in the original list of purchases, so it doesn’t meet the threshold. From the above example you can see that deducing rules based on a large number of transactions is a laborious task. Regardless of the algorithm used, the entire data set has to be read multiple times. With APRIORI, the data is read once for every group size, and in the worst case each extra product doubles the number of combinations that have to be counted.
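The whole procedure, from single products to larger groups, can be sketched as a level-wise search. The code below is a simplified illustration of the idea behind APRIORI, not a reference implementation of the algorithm; the function names are invented, and customer 10's "chocolate" is again assumed to mean chocolate sprinkles.

```python
from itertools import combinations

TRANSACTIONS = [
    {"bread", "cheese", "butter", "milk", "dish soap"},
    {"bread", "butter", "jam", "milk", "chocolate sprinkles"},
    {"bread", "cheese", "butter", "milk"},
    {"bread", "jam", "water", "milk", "cake"},
    {"cake", "detergent"},
    {"butter", "milk", "cake"},
    {"bread", "butter", "jam"},
    {"bread", "milk", "chocolate sprinkles"},
    {"bread", "cheese", "milk", "dish soap"},
    {"bread", "cheese", "butter", "chocolate sprinkles"},  # assumed: sprinkles
    {"bread", "peanut butter", "butter", "jam", "cheese"},
    {"bread", "cheese", "butter", "detergent"},
]

def count(itemset, transactions):
    """Number of baskets that contain every product in `itemset`."""
    return sum(1 for basket in transactions if itemset <= basket)

def frequent_itemsets(transactions, min_count):
    """Level-wise search: single products first, then pairs, then triples, and so on."""
    products = {p for basket in transactions for p in basket}
    candidates = {frozenset([p]) for p in products}
    frequent = {}
    size = 1
    while candidates:
        # One pass over the data per group size.
        level = {c: count(c, transactions) for c in candidates}
        level = {c: n for c, n in level.items() if n >= min_count}
        frequent.update(level)
        size += 1
        # Build larger candidates only from groups that met the threshold, and
        # drop any candidate containing a smaller group that did not.
        candidates = {
            a | b
            for a in level
            for b in level
            if len(a | b) == size
            and all(frozenset(s) in level for s in combinations(a | b, size - 1))
        }
    return frequent

result = frequent_itemsets(TRANSACTIONS, min_count=5)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), -kv[1], sorted(kv[0]))):
    print(", ".join(sorted(itemset)), n)
```

The pruning in the candidate step is what keeps the search manageable: a group is only counted if every smaller group inside it already met the threshold.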
The limitations of business rules
Using derived rules to search large quantities of data costs a lot of time. No matter how advanced the algorithm is, the data always has to be searched multiple times. That aside, even when using known experience rules, the process is easily affected by the starting values given to the algorithm. When using experience rules (also called supervised rule induction), there’s always a chance that the business rules are inconsistent or incomplete. Business Analytics software can’t always catch this, so it’s difficult to check a large number of experience rules for errors. Incorrect search rules return wrong results that aren’t immediately detectable. The takeaway is that checking whether the rules are correct, complete, and consistent is the most important aspect of this technique.
There are objections
Given the possible problems using experience rules, the option of deducing rules directly from the information seems like a more reliable choice. But doing so also comes with some objections. Even in this case, the results can be influenced by setting the threshold:
- Threshold too low: Causes too many search terms to be factored into the analysis. The result is a large number of search parameters. In the earlier example, a threshold of 3 would cause the products jam, cake, and chocolate sprinkles to be factored in, which would make the search take roughly eight times as long, because every extra product doubles the number of possible combinations. On top of that this would produce such a detailed answer that the probability of useful solutions would become very small.
- Threshold too high: This means you can use very few search terms. The system will only generate a few business rules. This means you can only find the simplest, most obvious solutions. These are usually the solutions you could arrive at by using common sense. In the earlier example, a threshold of 8 would only show the terms bread and butter. From this, you could deduce that someone who buys bread is also likely to buy butter. You can come up with a rule like this without using a Big Data analysis.
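The effect of a low threshold on the size of the search can be made concrete with a little arithmetic. Assuming the worst case, in which every non-empty combination of the remaining products has to be counted, n products give 2^n - 1 combinations. Going from the four products above a threshold of 5 to the seven products above a threshold of 3 then gives:

```latex
\[
  2^{4} - 1 = 15
  \qquad\text{versus}\qquad
  2^{7} - 1 = 127,
  \qquad
  \frac{127}{15} \approx 8.5
\]
```

In practice, pruning of infrequent combinations keeps the real number lower, so this ratio is an upper bound rather than a measured slowdown.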
There are also methods to test the reliability and validity of rules. These tests are a necessity for any professional application of rule induction. You can test if you chose the correct thresholds, among other things.
Decision tree versus rule deduction
There are similarities between using a decision tree and deducing rules. Both methods employ a series of choices. Yet there is an important difference between both methods. That’s why they can coexist:
- Decision tree: Divides a set of data in two per selection moment. This process is then repeated with every set until the decision tree has reached its end. Also, the tree structure is entirely dependent on how the (human) programmer builds it.
- Business rules: Creates a combination of different choices that you can use simultaneously, possibly in combination. This makes it possible to base a decision on different factors in one rule. Another difference is that the final rules are determined by software, regardless of the way they were created.
In general, a decision tree is used when the division across multiple alternatives is set in stone. Business rules are used when applying existing knowledge to a data set, or when it’s necessary to discover (hidden) patterns in data. This last aspect makes rule induction a real data mining technique. Why? Because the data speaks for itself.