Latest Update made on August 8, 2019
Data Science has become an integral part of making crucial business decisions in today’s competitive market. This is one of the reasons companies are on a rampage to hire Data Scientists and qualified ones at that. The data science job interviews at companies like Facebook, Google, LinkedIn, AirBnB, Insight, Twitter, Mu Sigma have one thing in common – these interviews are tough. But we have a list of helpful data science interview questions from these companies that will help while someone is preparing to apply for the post of a Data Scientist.
CLICK HERE to get "ready-to-use" Data Science code snippets created by industry experts
If you would like more information about Data Science careers, please click the orange "Request Info" button on top of this page.
At a recent Big Data panel, organised by the Silicon Valley Bank in Boston, almost every speaker unanimously agreed that the time to hire Data Scientists was yesterday. Theoretically, Data Scientists should be able to look at a company’s data and figure out to make the data profitable for the business. It is not academics. Playing around with data and experimenting on it with different algorithms is not going to help in the long run if the business needs are not met. It is getting very difficult for companies to find qualified data scientists who understand that their projects will ultimately need to make money for the company. Most of the time the data based models that the data scientists work on, cannot be turned into profitable relevant applications. This is the reason the interview process for Data Scientists at any company is rigorous and complicated. Finding Data Scientists who not only have the necessary technical skill but also have the knowledge of the industry and the acumen to understand business needs.
Companies need to see value at the end of the day. Hiring a Data Scientist – no matter how cool it may be in theory, if they do not bring value to the company – is a loss. To avoid such scenarios, it is imperative that a company first understand – what kind of data they have, how much data they have and what kind of possible projects a Data Scientist can work on based on the data. Below we have listed some of the questions asked in Data Science Interviews, in the companies that have figured out why they need Data Scientists and what output they need from data science projects.
These questions listed here are after a thorough research of the companies’ sites and high quality discussion forums. This is not a guarantee that these very questions will be asked in data science interviews, but this is just to give the readers an idea of what can be expected when they apply for the position of Data Scientists in these tech companies.
Learn Data Science in Python to Land a Top Gig as a Data Scientist at Top Tech Companies!
1) A building has 100 floors. Given 2 identical eggs, how can you use them to find the threshold floor? The egg will break from any particular floor above floor N, including floor N itself.
2) In a given day, how many birthday posts occur on Facebook?
3) You are at a Casino. You have two dices to play with. You win $10 every time you roll a 5. If you play till you win and then stop, what is the expected pay-out?
4) How many big Macs does McDonald sell every year in US?
5) You are about to get on a plane to Seattle, you want to know whether you have to bring an umbrella or not. You call three of your random friends and as each one of them if it’s raining. The probability that your friend is telling the truth is 2/3 and the probability that they are playing a prank on you by lying is 1/3. If all 3 of them tell that it is raining, then what is the probability that it is actually raining in Seattle.
6) You can roll a dice three times. You will be given $X where X is the highest roll you get. You can choose to stop rolling at any time (example, if you roll a 6 on the first roll, you can stop). What is your expected pay-out?
7) How can bogus Facebook accounts be detected?
8) You have been given the data on Facebook user’s friending or defriending each other. How will you determine whether a given pair of Facebook users are friends or not?
9) How many dentists are there in US?
10) You have 2 dices. What is the probability of getting at least one 4? Also find out the probability of getting at least one 4 if you have n dices.
11) Pick up a coin C1 given C1+C2 with probability of trials p (h1) =.7, p (h2) =.6 and doing 10 trials. And what is the probability that the given coin you picked is C1 given you have 7 heads and 3 tails?
12) You are given two tables- friend_request and request_accepted. Friend_request contains requester_id, time and sent_to_id and request_accepted table contains time, acceptor_id and requestor_id. How will you determine the overall acceptance rate of requests?
14) What would you add to Facebook and how would you pitch it and measure its success?
15) How will you test that there is increased probability of a user to stay active after 6 months given that a user has more friends now?
16) You have two tables-the first table has data about the users and their friends, the second table has data about the users and the pages they have liked. Write an SQL query to make recommendations using pages that your friends liked. The query result should not recommend the pages that have already been liked by a user.
17) What is the probability of pulling a different shape or a different colour card from a deck of 52 cards?
18) Which technique will you use to compare the performance of two back-end engines that generate automatic friend recommendations on Facebook?
19) Implement a sorting algorithm for a numerical dataset in Python.
20) How many people are using Facebook in California at 1.30 PM on Monday?
21) You are given 50 cards with five different colors- 10 Green cards, 10 Red Cards, 10 Orange Cards, 10 Blue cards, and 10 Yellow cards. The cards of each colors are numbered from one to ten. Two cards are picked at random. Find out the probability that the cards picked are not of same number and same color.
1) Which companies participating in Insight would you be interested in working for?
2) Create a program in a language of your choice to read a text file with various tweets. The output should be 2 text files-one that contains the list of all unique words among all tweets along with the count for repeated words and the second file should contain the medium number of unique words for all tweets.
3) What motivates you to transition from academia to data science?
1) How can you measure engagement with given Twitter data?
2) Give a large dataset, find the median.
3) What is the good measure of influence of a Twitter user?
1) Do you have some knowledge of R - analyse a given dataset in R?
2) What will you do if removing missing values from a dataset cause bias?
3) How can you reduce bias in a given data set?
4) How will you impute missing information in a dataset?
1) Explain about string parsing in R language
2) A disc is spinning on a spindle and you don’t know the direction in which way the disc is spinning. You are provided with a set of pins.How will you use the pins to describe in which way the disc is spinning?
3) Describe the data analysis process.
4) How will you cut a circular cake into 8 equal pieces?
1) Find out K most frequent numbers from a given stream of numbers on the fly.
2) Given 2 vectors, how will you generate a sorted vector?
3) Implementing pow function
4) What kind of product you want to build at LinkedIn?
5) How will you design a recommendation engine for jobs?
6) Write a program to segment a long string into a group of valid words using Dictionary. The result should return false if the string cannot be segmented. Also explain about the complexity of the devised solution.
7) Define an algorithm to discover when a person is starting to search for new job.
8) What are the factors used to produce “People You May Know” data product on LinkedIn?
9) How will you find the second largest element in a Binary Search tree ? (Asked for a Data Scientist Intern job role)
1) Explain the difference between Supervised and Unsupervised Learning through examples.
2) How would you add value to the company through your projects?
3) Case Study based questions – Cars are implanted with speed tracker so that the insurance companies can track about our driving state. Based on this new scheme what kind of business questions can be answered?
4) Define standard deviation, mean, mode and median.
5) What is a joke that people say about you and how would you rate the joke on a scale of 1 to 10?
6) You own a clothing enterprise and want to improve your place in the market. How will you do it from the ground level ?
7) How will you customize the menu for Cafe Coffee Day ?
1) Estimate the probability of a disease in a particular city given that the probability of the disease on a national level is low.
2) How will inspect missing data and when are they important for your analysis?
3) How will you decide whether a customer will buy a product today or not given the income of the customer, location where the customer lives, profession and gender? Define a machine learning algorithm for this.
4) From a long sorted list and a short 4 element sorted list, which algorithm will you use to search the long sorted list for 4 elements.
5) How can you compare a neural network that has one layer, one input and output to a logistic regression model?
6) How do you treat colinearity?
7) How will you deal with unbalanced data where the ratio of negative and positive is huge?
8) What is the difference between -
i) Stack and Queue
ii) Linkedin and Array
1) Will Uber cause city congestion?
2) What are the metrics you will use to track if Uber’s paid advertising strategies to acquire customers work? How will you figure out the acceptable cost of customer acquisition?
3) Explain principal components analysis with equations.
4) Explain about the various time series forecasting technqiues.
5) Which machine learning algorithm will you use to solve a Uber driver accepting request?
6)How will you compare the results of various machine learning algorithms?
7) How to solve multi-collinearity?
8) How will you design the heatmap for Uber drivers to provide recommendation on where to wait for passengers? How would you approach this?
9) If we added one rider to the current SF market, how would that affect the existing riders and drivers?
10) What are the different performance metrics for evaluating Uber services?
11) How will you decide which version (Version 1 or Version 2) of the Surge Pricing Algorithms is working better for Uber ?
12) How will you explain JOIN function in SQL to a 10 year old ?
1) How can you build and test a metric to compare ranked list of TV shows or Movies for two Netflix users?
2) How can you decide if one algorithm is better than the other?
1) Write a function to check whether a particular word is a palindrome or not.
2) How can you compute an inverse matrix faster by playing with some computation tricks?
3) You have a bag with 6 marbles. One marble is white. You reach the bag 100 times. After taking out a marble, it is placed back in the bag. What is the probability of drawing a white marble at least once?
1) How do you take millions of users with 100's of transactions each, amongst 10000's of products and group the users together in a meaningful segments?
1) Check whether a given integer is a palindrome or not without converting it to a string.
2) What is the degree of freedom for lasso?
3) You have two sorted array of integers, write a program to find a number from each array such that the sum of the two numbers is closest to an integer i.
1) Suppose that American Express has 1 million card members along with their transaction details. They also have 10,000 restaurants and 1000 food coupons. Suggest a method which can be used to pass the food coupons to users given that some users have already received the food coupons so far.
2) You are given a training dataset of users that contain their demographic details, the pages on Facebook they have liked so far and results of psychology test based on their personality i.e. their openness to like FB pages or not. How will you predict the age, gender and other demographics of unseen data?
1) How will you test a machine learning model for accuracy?
2) Print the elements of a matrix in zig-zag manner.
3) How will you overcome overfitting in predictive models?
4) Develop an algorithm to sort two lists of sorted integers into a single list.
1) Count the total number of trees in United States.
2) Estimate the number of square feet pizza’s eaten in US each year.
3) A box has 12 red cards and 12 black cards. Another box has 24 red cards and 24 black cards. You want to draw two cards at random from one of the two boxes, which box has a higher probability of getting cards of same colour and why?
4) How will you prove that the square root of 2 is irrational?
5) What is the probability of getting a HTT combination before getting a TTH combination?
6) There are 8 identical balls and only one of the ball is slightly heavier than the others. You are given a balance scale to find the heavier ball. What is the least number of times you have to use the balance scale to find the heavier ball?
1) Write the code to reverse a Linked list.
2) What assumptions does linear regression machine learning algorithm make?
3) A stranger uses a search engine to find something and you do not know anything about the person. How will you design an algorithm to determine what the stranger is looking for just after he/she types few characters in the search box?
4) How will you fix multi-colinearity in a regression model?
5) What data structures are available in the Pandas package in Python programming language?
6) State some use cases where Hadoop MapReduce works well and where it does not.
7) What is the difference between an iterator, generator and list comprehension in Python?
8) What is the difference between a bagged model and a boosted model?
9) What do you understand by parametric and non-parametric methods? Explain with examples.
10) Have you used sampling? What are the various types of sampling have you worked with?
11) Explain about cross entropy ?
12) What are the assuptions you make for linear regression ?
13) Differentiate between gradient boosting and random forest.
14) What is the signigicance of log odds ?
1) How will you handle missing data ?
1) A dice is rolled twice, what is the probability that on the second chance it will be a 6?
2) What are Type 1 and Type 2 errors ?
3) Burn two ropes, one needs 60 minutes of time to burn and the other needs 30 minutes of time. How will you achieve this in 45 minutes of time ?
Here are some solved data science code snippets that you can use in your interviews or projects. Click on these links below to download the code for these problems. Complete list of ready-to-use solved use-cases is available here.
How to Flatten a Matrix?
How to Calculate Determinant of a Matrix or ndArray?
How to calculate Diagonal of a Matrix?
How to Calculate Trace of a Matrix?
How to invert a matrix or nArray in Python?
How to convert a dictionary to a matrix or nArray in Python?
How to reshape a Numpy array in Python?
How to select elements from Numpy array in Python?
How to create a sparse Matrix in Python?
How to Create a Vector or Matrix in Python?
How to run a basic RNN model using Pytorch?
How to save and reload a deep learning model in Pytorch?
How to use auto encoder for unsupervised learning models?
How to create RANDOM Numbers in Python?
How to define WHILE Loop in Python?
How to define FOR Loop in Python?
How to find MIN, MAX in a Dictionary?
How to deal with Dictionary Basics in Python?
How to deal with Date & Time Basics in Python?
How to Create and Delete a file in Python?
How to convert STRING to DateTime in Python?
How to use CONTINUE and BREAK statement within a loop in Python?
How to do numerical operations in Python using Numpy?
1) R programming language cannot handle large amounts of data. What are the other ways of handling it without using Hadoop infrastructure? (Asked at Pyro Networks)
2) Explain the working of a Random Forest Machine Learning Algorithm (Asked at Cyient)
3) Describe K-Means Clustering.(Asked at Symphony Teleca)
4) What is the difference between logistic and linear regression? (Asked at Symphony Teleca)
5) What kind of distribution does logistic regression follow? (Asked at Symphony Teleca)
6) How do you parallelize machine learning algorithms? (Asked at Vodafone) (get interview problems + solution code)
7) When required data is not available for analysis, how do you go about collecting it? (Asked at Vodafone)
8) What do you understand by heteroscadisticity (Asked at Vodafone)
9) What do you understand by confidence interval? (Asked at Vodafone)
10) Difference between adjusted r and r square. (Asked at Vodafone)
11) How Facebook recommends items to newsfeed? (Asked at Finomena)
12) What do you understand by ROC curve and how is it used? (Asked at MachinePulse)
13) How will you identify the top K queries from a file? (Asked at BloomReach)
14) Given a set of webpages and changes on the website, how will you test the new website feature to determine if the change works positively? (Asked at BloomReach)
15) There are N pieces of rope in a bucket. You put your hand into the bucket, take one end piece of the rope .Again you put your hand into the bucket and take another end piece of a rope. You tie both the end pieces together. What is the expected value of the number of loops within the bucket? (Asked at Natera)
16) How will you test if a chosen credit scoring model works or not? What data will you look at? (Asked at Square)
17) There are 10 bottles where each contains coins of 1 gram each. There is one bottle of that contains 1.1 gram coins. How will you identify that bottle after only one measurement? (Data Science Puzzle asked at Latent View Analytics)
18) How will you measure a cylindrical glass filled with water whether it is exactly half filled or not? You cannot measure the water, you cannot measure the height of the glass nor can you dip anything into the glass. (Data Science Puzzle asked at Latent View Analytics)
19) What would you do if you were a traffic sign? (Data Science Interview Question asked at Latent View Analytics)
20) If you could get the dataset on any topic of interest, irespective of the collection methods or resources then how would the dataset look like and what will you do with it. (Data Scientist Interview Question asked at CKM Advisors)
21) Given n samples from a uniform distribution [0,d], how will you estimate the value of d? (Data Scientist Interview Question asked at Spotify)
22) How will you tune a Random Forest? (Data Science Interview Question asked at Instacart). (get interview problems + solution code)
23) Tell us about a project where you have extracted useful information from a large dataset. Which machine learning algorithm did you use for this and why? (Data Scientist Interview Question asked at Greenplum)
24) What is the difference between Z test and T test ? (Data Scientist Interview Questions asked at Antuit)
25) What are the different models you have used for analysis and what were your inferences? (Data Scientist Interview Questions asked at Cognizant)
26) Given the title of a product, identify the category and sub-category of the product. (Data Scientist interview question asked at Delhivery)
27) What is the difference between machine learning and deep learning? ( Data Scientist Interview Question asked at InfoObjects)
28) What are the different parameters in ARIMA models ? (Data Science Interview Question asked at Morgan Stanley)
29) What are the optimisations you would consider when computing the similarity matrix for a large dataset? (Data Science Interview questions asked at MakeMyTrip)
30) Use Python programming language to implement a toolbox with specific image processing tasks.(Data Science Interview Question asked at Intuitive Surgical)
31) Why do you use Random Forest instead of a simple classifier for one of the classification problems ? (Data Science Interview Question asked at Audi)
32) What is an n-gram? (Data Science Interview Question asked at Yelp)
33) What are the problems related to Overfitting and Underfitting and how will you deal with these ? (Data Science Interview Question asked at Tiger Analytics)
34) Given a MxN dimension matrix with each cell containing an alphabet, find if a string is contained in it or not.(Data Science Interview Question asked at Tiger Analytics)
35) How do you "Group By" in R programming language without making use of any package ? (Data Scientist Interview Question asked at OLX)
36) List 15 features that you will make use of to build a classifier for OLX website.(Data Scientist Interview Question asked at OLX)
37) How will you build a caching system using an advanced data structure like hashmap ? (Data Scientist Interview Question asked at OLX)
38) How to reverse strings that have changing positions ? (Data Scientist Interview Question asked at Tiger Analytics)
39) How do you select a cricket team ? (Data Scientist Interview Question asked at Quantiphi)
40) What is the difference between trees and random forest ? (Data Scientist Interview Question asked at Salesforce)
If you are asked questions like what is your favourite leisure activity? Or something like what is that you like to do for fun? Most of the people often tend to answer that they like to read programming books or do coding thinking that this is what they are supposed to say in a technical interview. Is this something you really do it for fun? A key point to bear in mind that the interviewer is also a person and interact with them as a person naturally. This will help the interviewer see you as an all-rounder who can visualize the company’s whole vision and not just view business problems from an academic viewpoint.
You might also be interested to read :
100+ Solved code snippets from industry experts
We request the data science community to help the prospective data scientists prepare for data science interviews.Share with us in comments below about the various kinds of data science interview questions asked at various top tech companies which are not listed above.