“A significant constraint on realizing value from Big Data will be a shortage of talent, particularly of people with deep expertise in statistics and machine learning, and the managers and analysts who know how to operate companies by using insights from Big Data. We project a need for 1.5 million additional managers and analysts in the United States who can ask the right questions and consume the results of the analysis of Big Data effectively.” (McKinsey Report)
The data scientist job title has been rising alongside big data technology. It is an undeniable fact that data scientists and related roles, such as data analysts, data engineers and statisticians, are among the most sought-after careers today. Attracted by generous compensation, a growing number of job opportunities and visibility to business leaders, many professionals are heading down the data scientist career path without much knowledge of the business and technical skills, attitude and day-to-day responsibilities the role requires.
Data science is an extremely fun and challenging field to be in: data scientists are doing some pretty amazing things with organizations’ data to draw business insights. It is no surprise that many people are looking for an answer to the question “How to program your way into data science?”
At DeZyre, we have always believed that the best way to learn how to get into an industry is from the professionals already working in it. We organized a DeZyre InSync session to answer exactly this question: “How to program your way into data science?”
We had the pleasure of inviting Eeshan Chatterjee, Data Scientist at MEDIA iQ Digital Ltd. MEDIA iQ Digital is an analytics technology company that unlocks insights to help businesses drive growth. Its analytics technology drives prediction at scale so that businesses can improve their buying outputs across various campaigns. Eeshan is an integral part of MEDIA iQ’s analytics research in the display advertising and digital marketing space.
What is data in the business world?
If you can observe something, record it, store it and measure it, and doing so helps drive business growth, then it can be termed ‘data in the business world’, which is important to any organization.
What data does my business generate?
“Each and every department right from CEO’s office to the janitorial division collects data.”
Every department of an organization, whether Sales & Marketing, Production, Operations, Finance, HR, Supply Chain or any other division, generates data. The idea behind storing data collected from all divisions is to get a holistic picture of the business. This lets organizations look at the business from several perspectives at once, and that is what is popularly known as “data science”.
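To make the idea of a combined view concrete, here is a minimal Python sketch. It assumes each department exports its records keyed by a shared field (a month, in this hypothetical example); the department names, metric names and figures are all made up for illustration, and a real pipeline would more likely do this join in a database or a data-frame library.

```python
from collections import defaultdict


def combine_departments(sources):
    """Merge per-department records that share a key (here, a month) into one view.

    `sources` maps a department name to {key: {metric: value}}.
    """
    combined = defaultdict(dict)
    for department, records in sources.items():
        for key, metrics in records.items():
            for metric, value in metrics.items():
                # Prefix each metric with its department so names cannot collide.
                combined[key][f"{department}.{metric}"] = value
    return dict(combined)


# Hypothetical, made-up figures purely for illustration.
sources = {
    "sales": {"2016-01": {"revenue": 120_000}, "2016-02": {"revenue": 95_000}},
    "hr": {"2016-01": {"headcount": 40}, "2016-02": {"headcount": 38}},
}

view = combine_departments(sources)
for month, metrics in sorted(view.items()):
    # A cross-department metric that neither division could compute on its own.
    revenue_per_head = metrics["sales.revenue"] / metrics["hr.headcount"]
    print(month, round(revenue_per_head, 2))
```

Revenue per employee is a simple example of a question that only becomes answerable once Sales and HR data sit side by side.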
What is Data Science?
Data science helps businesses examine things that were difficult or nearly impossible to look at earlier, and analyse various aspects such as:
Data science is a progression of several interdisciplinary subjects, such as business analytics, and draws on modelling, mathematics, computer programming, statistics and data analytics. At its core, data science uses automated methods to analyse huge amounts of data and extract knowledge and meaningful insights from it.
Twitter’s Sid Patil aptly compares data science to the popular game of blackjack. He says:
“Blackjack is the only game in the casino that has a memory. What happened in the past is indicative of what will happen in the future, and this is very much like the world of analytics. We’re constantly trying to understand what has happened in order to understand the probability of what might happen next, with varying amounts of certainty.”
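The “memory” in Patil’s analogy is conditional probability: the odds for the next card depend on the cards already dealt. The short Python sketch below assumes a single 52-card deck with no reshuffling (a simplification, since casinos often deal from multi-deck shoes) and computes the exact probability that the next card is worth ten.

```python
from fractions import Fraction

# A single 52-card deck: 4 suits x 13 ranks. 10, J, Q and K all count as ten.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
TEN_VALUED = {"10", "J", "Q", "K"}


def p_next_is_ten(seen):
    """Probability that the next card is ten-valued, given the cards already dealt."""
    deck = [rank for rank in RANKS for _ in range(4)]
    for card in seen:
        deck.remove(card)  # the deck "remembers": dealt cards are gone for good
    tens = sum(1 for rank in deck if rank in TEN_VALUED)
    return Fraction(tens, len(deck))


print(p_next_is_ten([]))  # fresh deck
print(p_next_is_ten(["K", "Q", "10", "J", "K", "2", "5", "A"]))
```

With a fresh deck the probability is 16/52, i.e. 4/13. After the eight cards in the second call, five of which are worth ten, only 11 ten-valued cards remain among 44, so the probability drops to 1/4: past observations change the forecast.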
The Hype Cycle and Data Science
Image Credit: gartner.com
Every new thing in the technology world goes through the widely accepted hype cycle, and data science is no exception. As Gartner’s Hype Cycle shows, data science began climbing toward the peak of inflated expectations in 2014 and, at the time of writing, sits right at the top. In the future, data scientists are expected to do great things with data that were not possible earlier, such as:
“In tomorrow’s business, big data can tell you more about your operations than your people alone,” says Emma Byrne.
The Basics: How did we arrive at Data Science?
Businesses have always collected and analysed data to make better decisions; however, business analytics has evolved dramatically since the 1970s.
The only thing that has not changed since the 1970s is the motive: helping businesses make better decisions. We are now in an age where businesses consult their data constantly to make those decisions.
The most important question is: “Can a statistician call himself a data scientist?” The answer is ‘not exactly’, because the data scientist role is diverse and demands a wider skill set. So which skills need to be mastered? The most popular Venn diagram in the field of data science shows the combination of skills that professionals who want to enter the field must possess:
But is that realistic? Can we actually find professionals who possess all these skills? Very few people meet every criterion, and if such brilliant people do exist, why would they work for someone else?
The data scientist’s job is to bring all these roles together, that is, to strike a balance among the above skills while applying good design thinking.
The popular Data Science Cheatsheet
Image Credit: datasciencecentral.com
The next most important question that bothers professionals eager to enter the field of data science is: “Do math and business acumen also require programming knowledge?” The answer is definitely ‘yes’. Looking at the cheatsheet above, one cannot tick off even 15% of the checklist without programming. Once a person has learned the fundamentals of statistics, the cheatsheet opens up an entirely new field of programming that data scientists must learn to proceed further. Having learned programming, the next step is to master machine learning, which involves mathematical problems that would be difficult to solve by hand, so the practical solution is to write a program that solves the mathematical or statistical problem. Visualization is also an integral part of data science and requires coding with various data visualization tools. In fact, every section of the cheatsheet requires some kind of programming. The “Toolbox” section of the cheatsheet summarizes the skills a data scientist must master to program their way into data science.
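As a small illustration of writing a program to solve the mathematical problem, the sketch below fits a straight line to a handful of points by gradient descent on mean squared error, a building block of many machine learning methods. The data points, learning rate and step count are arbitrary assumptions chosen for the example. A one-feature line fit also has a closed-form least squares solution, included here only as a check on the iterative answer.

```python
def fit_line(xs, ys, lr=0.01, steps=5000):
    """Fit y ~ w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def closed_form(xs, ys):
    """Ordinary least squares solution for a single feature, used as a check."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    w = sxy / sxx
    return w, my - w * mx


# Hypothetical toy data, roughly following y = 2x + 1 with some noise.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.1]

w, b = fit_line(xs, ys)
w_exact, b_exact = closed_form(xs, ys)
print(round(w, 3), round(b, 3), round(w_exact, 3), round(b_exact, 3))
```

Gradient descent is used here because it is the same loop that scales to models with no closed-form solution; for a simple line fit, the closed form is the faster choice.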
Programming for Math: The Algo Whiz Codebook
R or Python: The debate settled
Because R and Python have similar capabilities, it can be difficult for data scientists to decide which is the better scripting language.
Data scientists should choose R when:
Data scientists should choose Python when:
Programming for Tech
For a data scientist, it is important to know the basics of C++ and Java, as they form the backbone of many at-scale data systems.
When it comes to data in data science, programming for technology spans four general platforms:
1) Creating data platforms that ingest or manage data. These form the backbone of data services.
2) Scaling out, that is, distributing data across multiple systems using Hadoop, YARN, Scala and JADE (for running multiple analyses in parallel).
3) Processing the distributed data efficiently using low-level subroutines written in C++.
4) Using GPUs for processing, either through routines written in C or C++ or for machine learning algorithms that run over millions of data points.
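The scale-out step in point 2 usually follows a partition, map and reduce pattern. The Python sketch below shows only the shape of that pattern on a toy word count; it is not Hadoop’s or YARN’s API, the log lines are hypothetical, and a thread pool stands in for the separate machines a real cluster would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Split records into n_parts roughly equal chunks (round-robin)."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_count(chunk):
    """Map step: count words within one chunk, independently of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into a single result."""
    total = Counter()
    for part in partials:
        total += part
    return total


def word_count(records, n_workers=4):
    chunks = partition(records, n_workers)
    # Threads stand in for cluster nodes here; on a real cluster each map task
    # would run on a separate machine, close to its slice of the data.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_count, chunks))
    return reduce_counts(partials)


# Hypothetical ad-event log lines.
logs = ["click ad", "view ad", "click buy", "view page", "click ad"]
print(word_count(logs))
```

Because each map step only sees its own chunk and the reduce step only merges counts, the result is the same whether the work runs on one worker or many, which is what makes the pattern easy to distribute.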
Programming for Business
This is the most interesting part of data science: the results of the analysis must be presented in a way that end users can easily understand.
Visualization libraries such as d3.js allow data scientists to build interactive charts that answer business questions intuitively.
Design Thinking and Programming
A scientist takes a ‘problem-focused’ approach, whereas an architect takes a ‘solution-based’ approach. A scientist breaks the problem down and works toward the best solution, whereas an architect does not dwell on the intricacies of the problem but iterates several times to arrive at a good solution.
Design thinking is an elegant approach that combines the two. First, break the problem down and analyse its sub-parts. Identify the important ones and categorize them into wants and needs: “wants” are the nice-to-have parts and “needs” are the parts you cannot do without. Then synthesize the best solution from the many possible ones. Design thinking centres on a future goal rather than on a solution. It is the ability to define a better goal irrespective of the immediate problem, which means looking at the BIG PICTURE. When you can define a state in which the problem has been solved and the solution can be reused in several other ways, you have defined the desired future state. There are several roadblocks on the way to that state, and data scientists need to explore all the possible solutions by prototyping them. The last step is to develop an at-scale solution from the prototype.
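The wants-and-needs step can be expressed as a simple filter-then-rank rule. The sketch below is one hypothetical way to encode it in Python, not a formal design thinking method: every candidate prototype must cover all “needs”, and the survivors are ranked by how many “wants” they also cover. All requirement and prototype names are invented for illustration.

```python
def best_solution(candidates, needs, wants):
    """Keep candidates that cover every need, then rank them by wants covered."""
    viable = {
        name: features
        for name, features in candidates.items()
        if needs <= features  # a "need" is non-negotiable
    }
    if not viable:
        return None  # no prototype meets the needs yet: go back and iterate
    # Among viable prototypes, prefer the one satisfying the most "wants".
    return max(viable, key=lambda name: len(wants & viable[name]))


# Hypothetical requirements for a campaign-reporting tool.
needs = {"daily refresh", "campaign breakdown"}
wants = {"interactive chart", "mobile view", "export to csv"}
candidates = {
    "static report": {"daily refresh", "campaign breakdown", "export to csv"},
    "dashboard": {"daily refresh", "campaign breakdown", "interactive chart", "mobile view"},
    "live feed": {"campaign breakdown", "interactive chart", "mobile view", "export to csv"},
}

print(best_solution(candidates, needs, wants))
```

Here “live feed” covers the most wants but misses a need, so it is ruled out before ranking; a `None` result signals that another round of prototyping is required.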
If there’s any question you think would be helpful in programming your way into data science, feel free to ask in the comments below!