Delivering error-free figures every single day across hundreds of millions of records is no easy endeavour. A single serious error in a data set can cause a prospect or customer to lose confidence in the organization's ability to handle their business intelligence. Have you ever wondered “What is it like to be a data scientist?” or “What's a typical day like for a data scientist?” Data scientists bring a critical set of skills that organizations need to win with big data.
The search term “Data Scientist” on LinkedIn returns more than 35,000 results. The first page of results lists notable people at Silicon Valley startups. If you sit in any coffee shop in Silicon Valley or the Bay Area, you are likely to overhear someone talking about data science.
We have already discussed “Big Data being the next Big Thing” at length in our earlier articles. But have you ever wondered who is analysing the 2.5 million pieces of content shared by Facebook users, or the 40,000 search queries made on Google every second, or who is managing the 2.5 petabytes of data that Walmart generates from 1 million transactions every hour? Who is crunching these huge numbers? Solving these big data problems is the job of a data scientist, who finds patterns in big data and connects them to business decisions in real time.
“A data scientist is somebody who is inquisitive, who can stare at data and spot trends. It’s almost like a Renaissance individual who really wants to learn and bring change to an organization.” — Anjul Bhambhri, Vice President of Big Data Products, IBM
A thorough knowledge of big data and a basic understanding of data science are not enough to get you into a data science career. Data science is a highly complex and advanced discipline. If being a data scientist were easy, there would not be the shortage of data scientists projected in the McKinsey report. Finding trends in huge amounts of data might sound simple and straightforward, but it is difficult in execution, which becomes clear only after understanding what a data scientist does and what a typical day as a data scientist looks like.
Data scientists love to write code, solve data problems, create graphs, build charts and work 24x7 alongside peers who love the same things. That is not to say a data scientist should live to work, but work-life balance comes easily to a data scientist who enjoys what he/she does.
A number of universities are introducing two-year data science programs, but in reality data science requires considerably more training. Data scientists typically have advanced degrees and training in subjects such as computer science, statistics and mathematics. A data scientist also has experience in data visualization, information management, data mining, infrastructure design, cloud computing and data warehousing.
The biggest misconception about the role of a data scientist is that it involves only complex algorithms. In reality, data scientists begin their work by translating a business use case into an analytics plan, and most of their time is spent understanding data, exploring patterns in it, developing hypotheses and measuring impact rather than choosing algorithms.
Data scientists use powerful statistical modelling tools and techniques to make discoveries about business processes, platforms and related business problems. Their responsibilities involve operationalizing big data in specific ways: providing better insight into the pricing trends of various products, closely monitoring customer behaviour across communication channels, and providing more personalized offers to a business's prospects and existing customers. To do these things efficiently, effectively and precisely at large scale, data scientists continuously seek the leading edge in performance and continuously rethink what is possible with big data.
The job profile of a data scientist goes far beyond mathematics and code: data scientists are the people who make data a thread that runs through the fabric of a business organization.
The first task of a data scientist is to acquire all the data required to complete a specific analysis. The challenge in this step is that the data is usually distributed across several databases, and most business organizations lack search capabilities and documentation for them, so data scientists have to depend on the DBAs for help in acquiring the data.
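As a rough illustration of the acquisition step, the sketch below pulls rows from two separate databases and combines them in Python. Everything in it is hypothetical: the two in-memory SQLite databases, the `customers` and `orders` tables, and the `fetch_rows` helper stand in for whatever systems an organization actually runs.

```python
import sqlite3


def fetch_rows(conn, query):
    """Run a query and return each row as a dict keyed by column name."""
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


# Two in-memory databases stand in for systems owned by different teams.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "west"), (2, "east")])

sales = sqlite3.connect(":memory:")
sales.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
sales.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 40.0), (12, 3, 15.0)],
)

# Pull each side separately, then combine on the shared key in Python.
customers = {
    row["customer_id"]: row
    for row in fetch_rows(crm, "SELECT customer_id, region FROM customers")
}
orders = fetch_rows(
    sales, "SELECT order_id, customer_id, amount FROM orders ORDER BY order_id"
)

combined = []   # orders enriched with the customer's region
unmatched = []  # order ids whose customer is missing from the other system
for order in orders:
    customer = customers.get(order["customer_id"])
    if customer is None:
        unmatched.append(order["order_id"])
    else:
        combined.append({**order, "region": customer["region"]})
```

Keeping a list of unmatched keys makes gaps between the systems visible early, which is often the point at which a DBA's help is needed.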
Transformation is the most time-consuming part of the analysis process. Data scientists reformat and validate the acquired data so that it is suitable for data visualization tools and databases. They manipulate the data they have gathered, diagnose and measure its quality, and decide what assumptions can be made. Making those assumptions is the challenging part of the transformation phase, because for data sets with erroneous, extreme or missing values, the assumptions can be misleading or incorrect.
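To make the idea of diagnosing data quality concrete, here is a minimal sketch that counts missing values, flags values that break a validity rule, and flags extreme values in a single numeric column. The function name `diagnose_column`, the rule that amounts must be non-negative, the sample values and the 3.5 cut-off are all assumptions made for illustration, and the median-based score is only one of several reasonable ways to spot outliers.

```python
import statistics


def diagnose_column(values, threshold=3.5):
    """Report missing, invalid and extreme entries in a numeric column.

    Assumptions for this sketch: None marks a missing value, negative
    numbers are invalid, and "extreme" means a robust z-score (based on
    the median absolute deviation) above `threshold`.
    """
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    invalid = [v for v in present if v < 0]
    valid = [v for v in present if v >= 0]

    extreme = []
    if len(valid) >= 2:
        median = statistics.median(valid)
        # The median absolute deviation is distorted far less by outliers
        # than the standard deviation would be.
        mad = statistics.median([abs(v - median) for v in valid])
        if mad > 0:
            extreme = [
                v for v in valid
                if 0.6745 * abs(v - median) / mad > threshold
            ]
    return {"missing": missing, "invalid": invalid, "extreme": extreme}


# Hypothetical transaction amounts with two gaps, one negative value
# and one implausibly large value.
amounts = [120.0, 135.5, None, 128.0, 131.0, 9999.0, 125.0, None, -1.0]
report = diagnose_column(amounts)
```

A report like this does not decide what to do with the flagged values. Whether to drop, cap or impute them is exactly the kind of assumption that has to be made deliberately and recorded.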
The next step is to build a model from the assembled data. The biggest challenge in creating a model is understanding how relevant each data set is to the specific analysis. At this stage, data scientists can find out whether the data has been transformed completely. If it has not, they go back to the wrangling stage to identify relationships and patterns in the data. During this phase, data scientists also often discover that most existing tools, algorithms and analytics packages cannot handle the sheer size of their data sets.
Because data science is a cross-functional discipline, data wrangling plays a vital role in executing it. Data scientists spend a lot of time aligning the analytics, business and technology teams of the organization. Particularly in organizations with several competing agendas, getting all these groups to speak the same language and align their priorities is a significant part of a data scientist's job.
The final step is to share the meaningful insights derived from the data. The challenge for data scientists here is the difficulty of distributing and consuming reports, which can affect how the results are interpreted. Without knowledge of how the input data was transformed, the reports do not support sensitivity analysis or interactive verification.
The tasks listed above are ones that any data scientist performs, but a typical day for a data scientist varies from one organization to another. Here are some top answers from Quora that give prospective data scientists a detailed picture of “What’s a typical day like for a Data Scientist” at top companies such as Microsoft, Pinterest and LinkedIn.
In conclusion, the data scientist role is still in its early days, and its outlook is promising as our lives and data become ever more deeply rooted in the big data revolution. If you are a data scientist, it would be great if you could share your typical day “as a data scientist” with us in the comments below. If you aspire to be a great enterprise data scientist, please drop an email to our lead counsellor at firstname.lastname@example.org for guidance on starting your career in the prosperous field of data science.