How to approach the Data Science question
Data Science is a rapidly emerging discipline. It is about finding patterns in large data streams, then analyzing and validating them, in order to extract valuable information from the vast amount of data we all generate every day. New methods and techniques are continually being developed to help organizations remain competitive.
- How do you tackle the Data Science question?
- What concrete examples can serve as inspiration?
- Are there already reliable tools on the market that you can use?
- What skills (training & education) are required?
- How do you make future-proof Data Science decisions?
Also curious about this new field? Passionned Group’s Data Scientists are happy to assist you in successfully stepping into this new world!
Important Data Science skills
Hal Varian from Google put it this way:
The ability to take data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that’s going to be a hugely important skill in the next decades.
“Excellent. The Passionned Group provided me with an understanding & best practices and approaches that help me feel that a BI strategy and a professional data science team are within reach.”
“The Usual Suspects” as an example
Notable examples include companies like Google, Facebook, LinkedIn, and Netflix, which know better than anyone how to create new, more valuable information from internal and external data. They manufacture various kinds of information products from the data that users (freely) entrust to them.
Amazon also stores the buyer information from each purchase, knows what is being searched for, and combines all these interests into a data profile of you. A customer is a “data product” that continuously generates new data. And it is not just us humans: objects generate data too. Cars are recorded and measured by cameras and sensors, mail packages are tracked worldwide, and cell phones are tracked continuously. The amount of data that objects generate is now many times larger than what humans produce. The big challenge for Data Science is to extract value from this data while staying within laws and regulations.
Data Science tools
What tools are at your disposal in order to obtain new and sometimes unexpected insights? How are quantitative methods like statistics, machine learning, and data mining currently supported?
There are many developments going on in this area in the open source world. Considerable effort has been invested in tools like R, Python, Hadoop, and RapidMiner.
- R provides a wide variety of statistical techniques (linear and nonlinear modeling, classical statistical tests, time series analysis, classification, clustering, etc.) and graphical techniques, and is fairly easy to extend because of its object-oriented design.
- Python is also an extensible, object-oriented programming language with powerful libraries for data manipulation and analysis.
- Both R and Python are often used in combination with Hadoop (and its MapReduce routines).
- RapidMiner is a ‘pure-play’ platform of which only the core is open source. It provides an integrated environment for machine learning, text and data mining, and predictive analytics.
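To give a feel for the MapReduce pattern mentioned above, here is a minimal sketch in plain Python. It does not use Hadoop or any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample sentences are assumptions made purely for illustration. On a real cluster the map and reduce steps would run in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input, chosen only to show the mechanics.
    sample = ["data creates value", "value from data"]
    print(word_count(sample))
```

The appeal of the pattern is that each line can be mapped independently and each key can be reduced independently, which is what makes it suitable for very large data sets.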
The major BI players
The major Business Intelligence players are also trying to distinguish themselves in this new world. For example, SAS, IBM, Oracle, Microsoft, and SAP have all put statistical and data mining solutions and extensions on the market. Traditionally, SAS, with Enterprise Miner as an extension to its statistical package, and IBM, with data mining capabilities in SPSS and Watson Analytics, are the players to keep an eye on.
“What we do is almost science fiction.”
But Data Science also delivers breakthrough results closer to home. “Elie van Strien is the commander of the Amsterdam-Amstelland Fire Department. Two years ago, his fire department was chosen as the Smartest Organization in the Netherlands, thanks to a revolutionary BI innovation: Fire Department Intelligence. This year Van Strien has a seat on the jury for the election of the Smartest Organization in the Netherlands. Commander Van Strien feels like a little boy in the BI toy store.”
Creating value with Data Science
That growing stream of data has value. A lot of value. The smarter you combine the different streams, the more new and valuable data products or data services you can create. Whether it’s purchases, sensor data, keywords, smart meter readings, or sound recordings of telephone calls: customers and machines have become part of a data generation process that we can barely contain. ‘Blending’ the various internal and external data sources can result in new and unexpected insights.
What makes (or breaks) a good Data Scientist?
First, you naturally expect the “hard skills”: quantitative skills such as statistics, machine learning, and data mining.
1. Hard skills
- Understanding of “the law of truly large numbers”: with a sufficiently large number of cases, strange coincidences are likely to occur.
- Technical insight, with an excellent understanding of databases and their design.
- The attitude of a good programmer and hacker: tenacious, with attention to detail.
- The ability to write code.
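The point about coincidences becoming likely once there are enough examples can be checked with a little arithmetic. If an event has probability p in a single independent trial, the chance that it happens at least once in n trials is 1 - (1 - p)^n. The sketch below computes this; the function name `prob_at_least_once` and the one-in-a-million figure are assumptions chosen for illustration, not values from the article.

```python
import math


def prob_at_least_once(p, n):
    """Probability that an event with per-trial probability p occurs
    at least once in n independent trials: 1 - (1 - p)**n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    if p == 1.0:
        return 1.0 if n > 0 else 0.0
    # log1p and expm1 keep the result accurate when p is very small.
    return -math.expm1(n * math.log1p(-p))


if __name__ == "__main__":
    # Hypothetical "one in a million" coincidence.
    p = 1e-6
    for n in (1_000, 1_000_000, 10_000_000):
        print(f"{n:>10} trials: {prob_at_least_once(p, n):.4f}")
```

With a one-in-a-million event and a million trials, the probability of seeing it at least once is already about 63%. For a data scientist this is a reminder that a striking pattern in a very large data set may be nothing more than chance.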
2. Soft skills
A sound understanding of the business and the challenges it faces is also necessary. When selecting the optimal quantitative technique, the data scientist must take into account the specific aspects of the business problem.
3. Selecting analytical models
Typical requirements for analytical models are:
- Actionability: to what extent can the analytical model resolve the business problem?
- Performance: how well does the analytical model perform statistically?
- Interpretability: can the analytical model be easily explained to decision makers?
- Operational efficiency: how much effort is needed to set up, build, evaluate, and monitor the analytical model?
- Regulatory compliance: does the model comply with the regulations?
- Economic costs: what are the costs of establishing, building, and maintaining the model?
Based on a combination of these requirements, the data scientist must be capable of selecting the best analytical technique in order to resolve the business problem.
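One simple way to combine the six requirements listed above is a weighted score per candidate model. The sketch below is only an illustration of that idea, not a method prescribed by the article: the weights, the 1-to-5 scores, and the names `model_a` and `model_b` are all hypothetical and would have to come from the business context.

```python
# Hypothetical weights per requirement (they sum to 1.0).
CRITERIA_WEIGHTS = {
    "action": 0.25,          # can the model resolve the business problem?
    "performance": 0.25,     # statistical performance
    "interpretation": 0.20,  # ease of explaining to decision makers
    "efficiency": 0.10,      # effort to set up, build, evaluate, monitor
    "compliance": 0.10,      # fit with regulations
    "cost": 0.10,            # higher score = lower cost
}

# Hypothetical scores on a 1 (poor) to 5 (excellent) scale.
CANDIDATES = {
    "model_a": {"action": 4, "performance": 3, "interpretation": 5,
                "efficiency": 4, "compliance": 5, "cost": 4},
    "model_b": {"action": 4, "performance": 5, "interpretation": 2,
                "efficiency": 2, "compliance": 3, "cost": 2},
}


def weighted_score(scores, weights=CRITERIA_WEIGHTS):
    """Return the weighted sum of one model's scores over all criteria."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


def rank_models(candidates, weights=CRITERIA_WEIGHTS):
    """Return model names sorted from highest to lowest weighted score."""
    return sorted(
        candidates,
        key=lambda name: weighted_score(candidates[name], weights),
        reverse=True,
    )


if __name__ == "__main__":
    for name in rank_models(CANDIDATES):
        print(name, round(weighted_score(CANDIDATES[name]), 2))
```

In practice some requirements, regulatory compliance in particular, may work better as hard pass/fail filters than as weighted terms. The score is a starting point for discussion with the business, not a replacement for judgment.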
4. Communication skills
A good data scientist also has versatile communication skills, is an absolute team player, and has an innate curiosity to explore and experiment with data.
5. Translating complex matters for the layperson
Good data scientists are able to translate difficult, complex matters for laypeople, bridging the gap between the analytical models and the business end user and using visualizations where appropriate. They also tend to be skeptical and tenacious people who ask a lot of questions about the viability of a particular solution and whether it will really work. Curiosity and creativity ensure that the data scientist is not one-dimensional and help to build this rapidly evolving field of knowledge.
First, consider an application for Data Science
As with Big Data, the most important step is to consider which decisions can be made better or faster once the questions have been answered and the insights gained. Too often the focus is on the data and not on what it has to deliver. The aforementioned soft skills in the Data Science team help to avoid this pitfall.