Big Data seems to be becoming Big Business. Why? The amount of data is growing exponentially over time, and the percentage of unstructured data is getting bigger every year. What is Big Data, and how can companies benefit from it? There is an adage, ‘Turn your data into dollars’, which points out that an organization must do something with its data: in other words, make a profit from it. If this holds for regular data, does it make sense for Big Data too?
What is Big Data?
Let us first explain what Big Data is. Big Data is often characterized by four V’s: volume, velocity, variety and variability.
- Volume is about the amount of data.
- Velocity is about the speed at which data arrives and then disappears from view again (streaming data like Twitter messages).
- Variety has to do with the different formats the data is in (range of data types).
- And variability is about the different meanings a data element can have (misspellings, synonyms).
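To make variability concrete, here is a minimal Python sketch that maps misspellings and synonyms to one canonical term before counting, so that ‘FB’ and ‘Facebok’ are both counted as ‘facebook’. The lookup table `CANONICAL` and the functions `normalize` and `count_terms` are hypothetical names chosen for illustration; a real system would need a far larger dictionary or fuzzy matching.

```python
import re
from collections import Counter

# Hypothetical lookup table: known misspellings and synonyms -> canonical term.
CANONICAL = {
    "fb": "facebook",
    "facebok": "facebook",
    "twiter": "twitter",
    "twittr": "twitter",
}


def normalize(token: str) -> str:
    """Lower-case a token and map known variants to their canonical form."""
    token = token.lower()
    return CANONICAL.get(token, token)


def count_terms(messages: list[str]) -> Counter:
    """Count canonical terms across messages, so variants are counted together."""
    counts: Counter = Counter()
    for message in messages:
        # Split on anything that is not a letter.
        for token in re.findall(r"[a-z]+", message.lower()):
            counts[normalize(token)] += 1
    return counts


messages = ["Loved it on FB", "Facebok post shared", "new post on Twiter and twittr"]
counts = count_terms(messages)
print(counts["facebook"])  # 2
print(counts["twitter"])   # 2
```

Without the normalization step, the same brand would be split over several differently spelled terms and each count would look smaller than it really is.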
As you can see, the phrase ‘Big Data’ can be a little confusing, because it also refers to non-volume issues. Even if the data volume is relatively small, it is called Big Data when there are issues regarding velocity, variety or variability.
Is it new?
From our point of view, yes, definitely. In the past, large databases were set up to store a lot of structured (customer) data. But today the amount of data is so huge and its diversity so immense that some data can’t be stored in a regular relational database anymore. And we have learned that if data can’t be stored in a database, we cannot analyze it in a convenient manner. Therefore, we need special methods, Business Intelligence tools, software and sometimes special hardware appliances to be able to analyze Big Data sources in real time or near real time.
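As an illustration of near real-time analysis, the Python sketch below keeps a sliding window over incoming event timestamps (for example, the arrival times of Twitter messages) and reports how many events arrived in the last N seconds. The class `SlidingWindowCounter` is hypothetical and assumes events arrive in timestamp order; real streaming platforms also deal with out-of-order events, persistence and scale, which this sketch does not.

```python
from collections import deque


class SlidingWindowCounter:
    """Count events seen within the last `window_seconds` (a simple near real-time metric)."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self.timestamps: deque = deque()

    def add(self, timestamp: float) -> None:
        # Assumption: events arrive in timestamp order.
        self.timestamps.append(timestamp)

    def count(self, now: float) -> int:
        # Evict events that have fallen out of the window, then report what is left.
        while self.timestamps and self.timestamps[0] <= now - self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps)


counter = SlidingWindowCounter(window_seconds=60)
for t in [0, 10, 30, 65, 70]:
    counter.add(t)
print(counter.count(now=70))  # 3 (events at 30, 65 and 70)
```

Because old events are evicted as the window moves, memory use depends on the event rate inside the window rather than on the total volume of the stream.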
How can you benefit from it?
The promise of Big Data is that the data is so vast that it hides very valuable information about your clients’ behavior and what they are thinking. It can reveal important moves by your competitors that have not yet been made public, and it may contain significant trends in the market you operate in. If you are able to analyze huge amounts of (streaming) data better and faster, you will gain insights you didn’t have yesterday, which can put you ahead of your competitors.
What are typical Big Data sources?
Big Data can be extracted from the following data sources (some examples):
- social media like Twitter and Facebook;
- sensors in the human body or other organisms;
- sensors beneath the surface of the earth measuring, for example, seismic activity;
- sensors in space or sensors measuring events in space from earth;
- RFID tag sensors measuring product movements;
- logs containing the surfing behavior of your website visitors;
- sensors in machines, clothes or devices measuring, for example, the condition of the device or the temperature.
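As an example of working with one of these sources, the Python sketch below parses web server log lines and counts which pages each visitor requested. It assumes the logs use the Common Log Format written by many web servers; the regular expression and the function `pages_per_visitor` are illustrative and would need adjusting for other log layouts.

```python
import re
from collections import Counter, defaultdict

# Assumed layout (Common Log Format):
# host ident user [time] "METHOD path protocol" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def pages_per_visitor(lines):
    """Return {visitor_ip: Counter(path -> hits)} for successful GET requests."""
    visits = defaultdict(Counter)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip malformed lines instead of failing the whole batch
        if match["method"] == "GET" and match["status"] == "200":
            visits[match["ip"]][match["path"]] += 1
    return visits


log_lines = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /products HTTP/1.1" 200 512',
    '203.0.113.5 - - [10/Oct/2023:13:56:02 +0000] "GET /checkout HTTP/1.1" 200 128',
    '198.51.100.7 - - [10/Oct/2023:13:57:10 +0000] "GET /products HTTP/1.1" 404 0',
    'not a log line',
]
print(dict(pages_per_visitor(log_lines)["203.0.113.5"]))  # {'/products': 1, '/checkout': 1}
```

Skipping malformed lines rather than raising an error is a deliberate choice here: with large log volumes, a few corrupt lines should not stop the whole analysis.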
Often you need special adapters (APIs) to extract the data and do a decent data integration job.
What Big Data ‘solutions’ are available?
This is the most difficult question to answer, because it depends on a variety of things. You can’t really speak about solutions when it comes to Big Data, because the specific challenge can differ for each industry or company, and may depend on the type of application you want to build and the data source(s) you need to extract from. However, there are dozens of technologies that can help you exploit Big Data. These technologies can be classified as follows:
- Software for data extraction, like the Twitter API. Such adapters are often available as add-ons for ETL software;
- Software for data storage like the Hadoop Distributed File System (HDFS), but there are many alternatives;
- Techniques for data analysis & classification like machine learning, natural language processing, neural networks, pattern recognition, predictive modeling;
- Software for data analysis like MapReduce (part of Hadoop), but many alternatives are available;
- Software for data visualization like Tableau Software and IBM OpenDX;
- Hardware (appliances) for parallel processing or Cloud computing platforms.
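To show the idea behind MapReduce, the classic word-count example can be simulated in plain Python, with the map, shuffle and reduce phases written out as separate functions. This is only a single-process sketch of the programming model, not Hadoop's Java API; in Hadoop, the framework runs many mappers and reducers in parallel across a cluster and performs the shuffle step itself.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple]) -> dict:
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list) -> tuple:
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


def word_count(documents: list) -> dict:
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["Big Data big business", "turn data into dollars"]))
# {'big': 2, 'data': 2, 'business': 1, 'turn': 1, 'into': 1, 'dollars': 1}
```

Because each mapper only looks at its own record and each reducer only at one key, both phases can be spread over many machines, which is what makes the model suitable for very large data sets.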
Depending on your type of problem, you may need a specific mix of the above technologies.
A few Business Intelligence software solutions can read directly from social media APIs and HDFS, and some use specialized data analysis software under the hood. Download the Business Intelligence Tools Survey for more information.