100 Data Science in Python Interview Questions and Answers for 2024

Data Science in Python Interview Questions & Answers for 2024, focused on Python programming that will be asked in data science job interviews.

 |  BY ProjectPro

Python’s growing adoption in data science has made it a competitor to the R programming language. As its libraries have matured to cover most data science needs, many people are moving from R to Python. That shift may seem logical, but R remains a popular choice among data scientists.


People are shifting towards Python but not as many as to disregard R altogether. We have highlighted the pros and cons of both these languages used in Data Science in our Python vs R article. It can be seen that many data scientists learn both languages Python and R to counter the limitations of either language. Being prepared with both languages will help in data science job interviews.


Python is the “friendly” programming language that plays well with everyone and runs on everything, so it is hardly surprising that it offers quite a few libraries for working with data efficiently. Python has been widely used for data science only in recent years, but now that it has firmly established itself as an important language for data science, it is not going anywhere. Python is mostly chosen for data analysis when the results need to be integrated into web apps or when mathematical or statistical code needs to run in production.



In our previous posts 100 Data Science Interview Questions and Answers (General) and 100 Data Science in R Interview Questions and Answers, we listed all the questions that can be asked in data science job interviews. This article in the series lists questions that are related to Python programming and will probably be asked in data science interviews.

Data Science in Python Interview Questions and Answers


Data Science interview coding questions + solution code

Here are some solved data cleansing code snippets that you can use in your interviews or projects. Click on these links below to download the python code for these problems. A complete list of ready-to-use solved use-cases is available here. 
How to Flatten a Matrix?
How to Calculate Determinant of a Matrix or narray?
How to calculate the Diagonal of a Matrix?
How to Calculate Trace of a Matrix?
How to invert a matrix or nArray in Python?
How to convert a dictionary to a matrix or nArray in Python?
How to reshape a Numpy array in Python?
How to select elements from Numpy array in Python?
How to create a sparse Matrix in Python?
How to Create a Vector or Matrix in Python?
How to run a basic RNN model using Pytorch?
How to save and reload a deep learning model in Pytorch?
How to use auto encoder for unsupervised learning models?​

Data Science Python Interview Questions and Answers

The questions below are based on the course that is taught at ProjectPro – Data Science in Python. This is not a guarantee that these questions will be asked in Data Science Interviews. The purpose of these questions is to make the reader aware of the kind of knowledge that an applicant for a Data Scientist position needs to possess.   

Data Science Interview Questions in Python are generally scenario based or problem based questions where candidates are provided with a data set and asked to do data munging, data exploration, data visualization, modelling, machine learning, etc. Most of the data science interview questions are subjective and the answers to these questions vary, based on the given data problem. The main aim of the interviewer is to see how you code, what are the visualizations you can draw from the data, the conclusions you can make from the data set, etc.


1) How can you build a simple logistic regression model in Python? 

2) How can you train and interpret a linear regression model in SciKit learn?

3) Name a few libraries in Python used for Data Analysis and Scientific computations.

NumPy, SciPy, Pandas, SciKit, Matplotlib, Seaborn

4) Which library would you prefer for plotting in Python language: Seaborn or Matplotlib?

Matplotlib is the core Python plotting library, but it needs a lot of fine-tuning to make plots look polished. Seaborn helps data scientists create statistically meaningful and aesthetically appealing plots with less effort. The answer to this question depends on the requirements for plotting the data.

5)  What is the main difference between a Pandas series and a single-column DataFrame in Python?

6) Write code to sort a DataFrame in Python in descending order.

7) How can you handle duplicate values in a dataset for a variable in Python? 

8) Which Random Forest parameters can be tuned to enhance the predictive power of the model?

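9) Which method in pandas.tools.plotting is used to create a scatter plot matrix?

    scatter_matrix. Note that pandas.tools.plotting is the legacy location; in current pandas releases the function is pandas.plotting.scatter_matrix.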

10) How can you check if a data set or time series is Random?

To check whether a dataset is random or not use the lag plot. If the lag plot for the given dataset does not show any structure then it is random.

 11) Can we create a DataFrame with multiple data types in Python? If yes, how can you do it?

 12) Is it possible to plot histogram in Pandas without calling Matplotlib? If yes, then write the code to plot the histogram?

 13) What are the possible ways to load an array from a text data file in Python? How can the efficiency of the code to load data file be improved?

   numpy.loadtxt()

14) Which is the standard data missing marker used in Pandas?

NaN

15) Why should you use NumPy arrays instead of nested Python lists?

16)  What is the preferred method to check for an empty array in NumPy? 

17) List down some evaluation metrics for regression problems.

18) Which Python library would you prefer to use for Data Munging?

Pandas


19) Write the code to sort an array in NumPy by the nth column?

This can be achieved with the argsort() function. For an array x whose rows should be sorted by the nth column (counting from 1), the code is x[x[:, n-1].argsort()]

20) How are NumPy and SciPy related?

21) Which python library is built on top of matplotlib and Pandas to ease data plotting?

Seaborn

22) Which plot will you use to assess the uncertainty of a statistic?

Bootstrap plot

23) What are some features of Pandas that you like or dislike?

24) Which scientific libraries in SciPy have you worked with in your project?

25) What is pylab?

A package that combines NumPy, SciPy and Matplotlib into a single namespace.

26) Which python library is used for Machine Learning?

SciKit-Learn


Basic Python Programming  Interview Questions 

27) How can you copy objects in Python?

The functions used to copy objects in Python are:

1)  copy.copy() for a shallow copy

2)  copy.deepcopy() for a deep copy

However, not every object can be copied with these functions (modules and open file objects, for example, cannot). Some built-in types also offer their own shortcuts: dictionaries have a copy() method, and sequences can be shallow-copied by slicing, for example lst[:].
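The difference between the two kinds of copy shows up only with nested objects. The following is a minimal sketch; the dictionary and its keys are made up for illustration.

```python
import copy

# Hypothetical nested data used only for illustration.
original = {"scores": [1, 2, 3], "name": "run-1"}

shallow = copy.copy(original)    # new outer dict, inner list is shared
deep = copy.deepcopy(original)   # new outer dict and new inner list

# Mutate the inner list through the original object.
original["scores"].append(4)

print(shallow["scores"])  # the shallow copy sees the change
print(deep["scores"])     # the deep copy does not
```

A shallow copy is usually enough for flat containers; a deep copy is the safer choice when nested mutable objects must not be shared.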

28) What is the difference between tuples and lists in Python?

Lists are mutable, whereas tuples are immutable: they cannot be changed after creation. Because of this, a tuple whose elements are all hashable can itself be hashed and used as a dictionary key, while a list cannot. Tuples suit fixed collections whose contents should not change, for example a set of actions that must be executed in a given sequence, geographic locations, or the points on a specific route.
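As a small illustration of the hashing point, the sketch below uses coordinate tuples as dictionary keys and shows that a list in the same role is rejected. The coordinates and city names are sample values chosen for the example.

```python
# Hypothetical lookup table keyed by (latitude, longitude) tuples.
visits = {
    (40.7128, -74.0060): "New York",
    (51.5074, -0.1278): "London",
}
print(visits[(51.5074, -0.1278)])

# A list cannot be used as a key because it is mutable and unhashable.
error_message = None
try:
    bad = {[40.7128, -74.0060]: "New York"}
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```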

29) What is PEP8?

PEP8 consists of coding guidelines for Python language so that programmers can write readable code making it easy to use for any other person, later on.

30) Is all the memory freed when Python exits?

No it is not, because the objects that are referenced from global namespaces of Python modules are not always de-allocated when Python exits.

31) What does __init__.py do?

__init__.py is a file, often empty, that marks a directory as a Python package so that the modules inside it can be imported. It also provides an easy way to organize files. If there is a module maindir/subdir/module.py, __init__.py is placed in each of the directories so that the module can be imported using the following command:

import maindir.subdir.module

32) What is the difference between the range() and xrange() functions in Python?

In Python 2, range() returns a list, whereas xrange() returns an object that generates numbers on demand. In Python 3, xrange() no longer exists and range() itself returns a lazy sequence object.

33) How can you randomize the items of a list in place in Python?

random.shuffle(lst), from the random module, randomizes the items of a list in place.

34) What is a pass in Python?

Pass in Python signifies a no operation statement indicating that nothing is to be done.

35) If you are given the first and last names of employees, which data type in Python will you use to store them?

You can use a list that has first name and last name included in an element or use Dictionary.

36) What happens when you execute the statement mango=banana in Python?

If banana has not been defined, a NameError is raised when this statement is executed. Otherwise, mango is simply bound to the same object as banana.

37) Write a sorting algorithm for a numerical dataset in Python. 

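38) Optimize the below Python code:

word = 'word'

print(word.__len__())

Answer: print(len('word')). The intermediate variable is unnecessary, and the built-in len() is preferred over calling __len__() directly.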

39) What is monkey patching in Python?

Monkey patching is a technique that lets the programmer modify or extend other code at runtime. Monkey patching comes in handy in testing, but it is not good practice in a production environment because it can make the code difficult to debug.
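A minimal sketch of the idea: a function in an existing module is replaced at runtime and then restored. The replacement function fake_sqrt is a made-up stand-in for this example; in real test code, a helper such as unittest.mock.patch usually handles the restore step.

```python
import math


def fake_sqrt(value):
    """Hypothetical stand-in that ignores its input."""
    return 42


original_sqrt = math.sqrt
try:
    math.sqrt = fake_sqrt            # patch the module attribute at runtime
    patched_result = math.sqrt(16)   # now calls fake_sqrt
finally:
    math.sqrt = original_sqrt        # always restore the real function

restored_result = math.sqrt(16)
print(patched_result, restored_result)
```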


40) Which tool in Python will you use to find bugs if any?

Pylint and PyChecker. Pylint checks whether a module satisfies the coding standards. PyChecker is a static analysis tool that helps find bugs in the source code.

 41) How are arguments passed in Python- by reference or by value?

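Strictly speaking, neither term fits, because passing semantics in Python differ from both. Python passes arguments by value, but every value is a reference to an object. A function can therefore mutate a mutable argument in place, whereas rebinding the parameter name inside the function has no effect on the caller.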
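The behaviour can be seen by comparing a function that mutates its argument with one that rebinds it. This is a minimal sketch; the function names are chosen for illustration.

```python
def mutate(items):
    # Changes the object that the caller's name also refers to.
    items.append(99)


def rebind(items):
    # Points the local name at a new list; the caller is unaffected.
    items = [0]


data = [1, 2]
mutate(data)
after_mutate = list(data)

rebind(data)
after_rebind = list(data)

print(after_mutate, after_rebind)
```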

42) You are given a list of N numbers. Write a single list comprehension that creates a new list containing only the even values found at even indices. For instance, if list[4] holds an even value it has to be included in the output because its index is even, but if list[5] holds an even value it should not be included because its index is odd.

 [x for x in list[::2] if x % 2 == 0]

The above code takes all the numbers at even indices (0, 2, 4, ...) and then discards the odd values.
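A quick check on sample values: a slice with step 2 that starts at index 0 selects the even indices, and an enumerate-based version gives the same result. The input list is made up for the example.

```python
numbers = [4, 8, 7, 10, 6, 2, 3, 12]  # sample input

# Even indices via slicing, then keep even values.
by_slice = [x for x in numbers[::2] if x % 2 == 0]

# Equivalent form that makes the index test explicit.
by_enumerate = [x for i, x in enumerate(numbers) if i % 2 == 0 and x % 2 == 0]

print(by_slice, by_enumerate)
```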

43) Explain the usage of decorators.

Decorators in Python are used to modify or inject code in functions or classes. Using decorators, you can wrap a class or function method call so that a piece of code can be executed before or after the execution of the original code. Decorators can be used to check for permissions, modify or track the arguments passed to a method, logging the calls to a specific method, etc.
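As one possible illustration of the logging use case, the sketch below wraps a function so that every call is recorded before the original code runs. The decorator name log_calls and the calls attribute are assumptions made for this example.

```python
import functools


def log_calls(func):
    """Hypothetical decorator that records the arguments of each call."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))  # runs before the original code
        return func(*args, **kwargs)

    wrapper.calls = []
    return wrapper


@log_calls
def add(a, b):
    return a + b


result = add(2, 3)
print(result, add.calls)
```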

44) How can you check whether a pandas data frame is empty or not?

The attribute df.empty is used to check whether a data frame is empty or not.

45) What will be the output of the below Python code?

def multipliers():
    return [lambda x: i * x for i in range(4)]

print([m(2) for m in multipliers()])

The output for the above code will be [6, 6, 6, 6]. Because of late binding, the value of the variable i is looked up only when one of the functions returned by multipliers() is called, and by then the loop has finished and i is 3.
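One common way to avoid the late-binding surprise is to capture the current value of i as a default argument, which is evaluated when each lambda is created. A minimal sketch, with function names chosen for illustration:

```python
def multipliers_late():
    # Every lambda looks up i when called, after the loop has ended.
    return [lambda x: i * x for i in range(4)]


def multipliers_bound():
    # The default argument i=i freezes the value at creation time.
    return [lambda x, i=i: i * x for i in range(4)]


late = [m(2) for m in multipliers_late()]
bound = [m(2) for m in multipliers_bound()]
print(late, bound)
```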

46) What do you mean by list comprehension?

A list comprehension is a concise syntax for building a new list by applying an expression to each item of an iterable, optionally filtering the items with a condition.

Example:

import string

[ord(j) for j in string.ascii_uppercase]

     [65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90]


47) What will be the output of the below code?

word = 'aeioubcdfg'

print(word[:3] + word[3:])

The output for the above code will be: 'aeioubcdfg'.

The slice word[:3] ends exactly where word[3:] begins, so concatenating the two with the "+" operator reproduces the original string.


48) What will be the output of the below code?

list = ['a', 'e', 'i', 'o', 'u']

print(list[8:])

The output for the above code will be an empty list []. Many people expect an IndexError because index 8 exceeds the number of members in the list. However, the code takes a slice rather than accessing a single element, and slicing with a start index beyond the end of the list simply returns an empty list.

49) What will be the output of the below code:

def foo (i= []):

    i.append (1)

    return i

>>> foo ()

>>> foo ()

The output for the above code will be-

[1]

[1, 1]

The default argument of the function foo is evaluated only once, when the function is defined. Since it is a list, every call that relies on the default modifies that same list by appending a 1 to it.
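The usual way to get a fresh list on every call is to use None as the default and create the list inside the function. A minimal sketch comparing the two behaviours, with function names chosen for illustration:

```python
def foo_shared(i=[]):
    # The same default list object is reused across calls.
    i.append(1)
    return i


def foo_fresh(i=None):
    # A new list is created whenever no argument is supplied.
    if i is None:
        i = []
    i.append(1)
    return i


shared_first = list(foo_shared())
shared_second = list(foo_shared())
fresh_first = foo_fresh()
fresh_second = foo_fresh()
print(shared_first, shared_second, fresh_first, fresh_second)
```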

50) Can the lambda forms in Python contain statements?

No, as their syntax is restricted to single expressions and they are used for creating function objects which are returned at runtime.

This list of questions for Python interview questions and answers is not an exhaustive one and will continue to be a work in progress. Let us know in the comments below if we missed out on any important question that needs to be up here.

51) What will be the data type of x for the following code?

x = input("Enter a number")

String.

In Python 2, a function with the same name evaluated the input as a Python expression, so the resulting type depended on what the user typed. In Python 3, input() always returns a string.

52) What do you mean by pickling and unpickling in Python?

Python has a module called pickle which accepts almost any Python object as input and transforms it into a byte stream, which the dump function writes to a file. This process is called pickling. The reverse process of reconstructing Python objects from a pickled file is called unpickling.
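A minimal round trip can be sketched in memory with pickle.dumps and pickle.loads, the byte-oriented counterparts of dump and load. The record below is sample data made up for the example.

```python
import pickle

# Hypothetical object to serialize.
record = {"model": "ridge", "alpha": 0.5, "coef": [0.1, -0.2]}

payload = pickle.dumps(record)    # pickling: object -> bytes
restored = pickle.loads(payload)  # unpickling: bytes -> new object

print(type(payload).__name__, restored == record)
```

Unpickling can execute arbitrary code, so pickled data should only be loaded from sources you trust.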

53) What will be the output of the following code:

>>> Welcome = "Welcome to ProjectPro!"

>>> Welcome[1:7:2]

'ecm'

54) What is wrong with the following code:

>>> print("I love browsing through "ProjectPro" content.")

It will give you a syntax error, because the inner double quotes end the string early. To print double quotes, either delimit the string with single quotes or escape the inner quotes with a backslash. So, the correct code would be:

>>> print("I love browsing through 'ProjectPro' content.")

Or

>>> print('I love browsing through "ProjectPro" content.')

55) How can you iterate over a few files in Python?

>>> import os
>>> directory = r'C:\Users\admin directory'
>>> for filename in os.listdir(directory):
...     if filename.endswith('.csv'):
...         print(os.path.join(directory, filename))

This code prints the full path of every CSV file in the directory, which helps in automating tasks that process many files.


Advanced Python Data Science Interview Questions and Answers

Go through the following python interview questions for data science that are slightly advanced. These python data science interview questions might be difficult for you to answer but it is important that you prepare for these python interview questions as well before going for your interview.

1) How will you use Pandas library to import a CSV file from a URL?

import pandas as pd

data = pd.read_csv('sample_url')

2) How will you transpose a NumPy array?

nparr.T

3) What are universal functions for n-dimensional arrays?

Universal functions are the functions that perform mathematical operations on each element of an n-dimensional array.
Example: np.sqrt() and np.exp() evaluate square root and exponential of each element of an array respectively.
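A short sketch of the two universal functions named above, applied to small sample arrays. It assumes NumPy is installed.

```python
import numpy as np

values = np.array([1.0, 4.0, 9.0])

roots = np.sqrt(values)                 # element-wise square root
exps = np.exp(np.array([0.0, 1.0]))     # element-wise exponential

print(roots, exps)
```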

4) List a few statistical methods available for a NumPy array.

np.mean(), np.cumsum(), np.sum()

5) What are boolean arrays? Write a code to create a boolean array using the NumPy library.

A boolean array is an array whose elements are of the boolean data type. A vital point to remember is that the Python keywords and and or do not work element-wise on boolean arrays; use the operators & and | instead.

 Barr = np.array([True, True, False, True, False, True, False], dtype=bool)
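The sketch below, which assumes NumPy is installed, combines two sample boolean arrays with the element-wise operators and shows that the and keyword raises an error instead.

```python
import numpy as np

a = np.array([True, True, False, False])
b = np.array([True, False, True, False])

both = a & b      # element-wise AND
either = a | b    # element-wise OR
negated = ~a      # element-wise NOT

# The keyword form asks for a single truth value of the whole array.
keyword_failed = False
try:
    a and b
except ValueError:
    keyword_failed = True

print(both, either, negated, keyword_failed)
```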

6) What is Fancy Indexing?

In NumPy, one can use a list of integers to index an array. For example, for a 4x4 array named Array, Array[[2, 1, 0, 3]] returns the rows in the order specified by the list.
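A minimal sketch of the row-reordering example on a sample 4x4 array, assuming NumPy is installed. It also shows that fancy indexing returns a copy rather than a view.

```python
import numpy as np

grid = np.arange(16).reshape(4, 4)   # rows start with 0, 4, 8, 12

reordered = grid[[2, 1, 0, 3]]       # rows in the order given by the list
first_column = reordered[:, 0].tolist()

# Writing to the result leaves the source array untouched.
reordered[0, 0] = -1
source_value = int(grid[2, 0])

print(first_column, source_value)
```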

7) What is NaT in Python’s Pandas library?

NaT stands for Not a Time. It is the NA value for timestamp data

8) What is Broadcasting for NumPy arrays?

Broadcasting is a technique that specifies how arithmetic calculations are performed between arrays of different dimensions. 

[Figure: Broadcasting for NumPy arrays]

9) What is the necessary condition for broadcasting two arrays?

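Two arrays can be broadcast together if, comparing their shapes dimension by dimension starting from the last one, each pair of axis lengths satisfies either of the following conditions:

  1. The axis lengths are equal.

  2. One of the axis lengths is 1.

If one array has fewer dimensions than the other, its shape is treated as if it were padded with 1s on the left.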
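To make the shape comparison concrete, the sketch below (assuming NumPy is installed) adds arrays of compatible shapes and then tries an incompatible pair. NumPy compares shapes from the trailing dimension and accepts a pair of axis lengths when they are equal or when one of them is 1. The array names are chosen for illustration.

```python
import numpy as np

matrix = np.ones((3, 4))

row = np.arange(4)                    # shape (4,): stretched across the 3 rows
column = np.arange(3).reshape(3, 1)   # shape (3, 1): stretched across the 4 columns

sum_row = matrix + row
sum_col = matrix + column

# Shapes (3, 4) and (3,) disagree on the trailing axis (4 vs 3).
incompatible = False
try:
    matrix + np.arange(3)
except ValueError:
    incompatible = True

print(sum_row.shape, sum_col.shape, incompatible)
```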

10) What is PEP for Python?

PEP stands for Python Enhancement Proposal. It is a document that provides information related to new features of Python, its processes or environments.

11) What do you mean by overfitting a dataset?

Overfitting a dataset means our model is fitting the training dataset so well that it performs poorly on the test dataset. One of the key reasons for overfitting could be that the model has learned the noise in the dataset.

12) What do you mean by underfitting a dataset?

Underfitting a dataset means our model fits even the training dataset poorly. It usually occurs when the model is too simple to capture the underlying pattern or when its parameters have not been tuned properly.

13) What is the difference between a test set and a validation set?

In supervised learning, we use a validation set for selecting a model based on the estimated prediction error. On the other hand, we use a test set to assess the accuracy of the finally chosen model.

14) What is F1-score for a binary classifier? Which library in Python contains this metric?

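The F1-score combines precision and recall as the harmonic mean of the two quantities. It is given by the formula

$$F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

In Python, this metric is available as f1_score in the sklearn.metrics module.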

15) Write a function for f1_score that takes True Positive, False Positive, True Negative, and False Negative as input and outputs f1_score.

def f1_score(tp, fp, fn, tn):
  p =  tp / (tp + fp) 
  r = tp / (tp + fn)
  return 2 * p * r / (p + r)
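The precision and recall form divides by zero when there are no positive predictions or no positive labels. A possible variant, sketched below, uses the algebraically equivalent count form 2·TP / (2·TP + FP + FN) and returns 0.0 in the undefined case; that fallback value and the function names are assumptions made for this example.

```python
def f1_from_counts(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when undefined (assumed convention)."""
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        return 0.0
    return 2 * tp / denominator


def f1_from_precision_recall(tp, fp, fn):
    """Same metric via precision and recall, for comparison."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return 2 * p * r / (p + r)


# Sample counts: 8 true positives, 2 false positives, 2 false negatives.
score_counts = f1_from_counts(8, 2, 2)
score_pr = f1_from_precision_recall(8, 2, 2)
score_empty = f1_from_counts(0, 0, 0)
print(score_counts, score_pr, score_empty)
```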

16) Using sklearn library, how will you implement ridge regression?

>>> from sklearn import linear_model
>>> reg = linear_model.Ridge(alpha=0.5)
>>> reg.fit(X, y)  # X: feature matrix, y: target values from the sample dataset

17) Using sklearn library, how will you implement lasso regression?

>>> from sklearn import linear_model
>>> reg = linear_model.Lasso(alpha=0.4)
>>> reg.fit(X, y)  # X: feature matrix, y: target values from the sample dataset

18) How is correlation a better metric than covariance?

Covariance is a metric that reflects how two variables (a and b) vary together around their respective average values (ā and b̄). It is given by

$$\operatorname{cov}(a, b) = \frac{1}{N} \sum_{i=1}^{N} (a_i - \bar{a})(b_i - \bar{b})$$

where N is the number of data points.

Correlation is a metric that also takes into account the standard deviations of the variables (σ_a and σ_b). Mathematically, it is defined as

$$\operatorname{corr}(a, b) = \frac{\operatorname{cov}(a, b)}{\sigma_a \, \sigma_b}$$

The covariance of variables that have large deviations from the mean can become large regardless of how strongly the variables are related, because its value depends on the scale of the data. Correlation is thus a better metric than covariance, for it divides out the standard deviations of the variables and yields a unitless value between -1 and 1.
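The effect of scale can be checked with a small pure-Python sketch. It uses the population form of covariance (dividing by N), and the height and weight values are made-up sample data: changing the unit of one variable rescales the covariance but leaves the correlation unchanged.

```python
from math import sqrt


def covariance(a, b):
    """Population covariance: mean product of deviations from the means."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


def correlation(a, b):
    """Covariance divided by the product of the standard deviations."""
    return covariance(a, b) / sqrt(covariance(a, a) * covariance(b, b))


# Hypothetical sample data.
heights_m = [1.5, 1.6, 1.7, 1.8]
weights_kg = [50.0, 58.0, 65.0, 77.0]
heights_cm = [h * 100 for h in heights_m]   # same data, different unit

cov_m = covariance(heights_m, weights_kg)
cov_cm = covariance(heights_cm, weights_kg)
corr_m = correlation(heights_m, weights_kg)
corr_cm = correlation(heights_cm, weights_kg)
print(cov_m, cov_cm, corr_m, corr_cm)
```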


19) What are confounding factors?

Confounding factors are variables that are related to both the dependent and the independent variables. They cannot be identified by evaluating correlations alone.

20) What is namespace in Python?

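A namespace is a mapping from names to objects. The built-in namespace is created when the Python interpreter starts and continues to exist until the interpreter exits; module and function namespaces are created when a module is loaded or a function is called.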

21) What is try-except-finally in Python?

If we want to run code in Python that may raise an error, we can use try-except-finally.

We use try to wrap the block of code that may raise an error.

We use except to handle the error.

We use finally to execute code regardless of the outcome of the try and except blocks.

Example (a is not defined here, so print(a) raises a NameError):

>>> try:
...     print(a)
... except NameError:
...     print("Something is not right!")
... finally:
...     print("The 'try except block' is over")
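To see that the finally block runs in both the success and the failure case, here is a minimal sketch. The helper safe_divide and its log argument are hypothetical names used only for this illustration.

```python
def safe_divide(numerator, denominator, log):
    """Hypothetical helper: divide and record which blocks ran in log."""
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        log.append("handled error")   # runs only when the division fails
        result = None
    finally:
        log.append("cleanup")         # runs in every case
    return result


ok_log, error_log = [], []
ok_result = safe_divide(6, 3, ok_log)
error_result = safe_divide(6, 0, error_log)
print(ok_result, ok_log, error_result, error_log)
```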

 

22) What is the difference between append() and extend() functions in Python?

append(): append() is a list method that adds its argument to the end of the list as a single element. It increases the length of the list by one.

Example:

>>> List1 = ['I', 'love']
>>> List1.append(['ProjectPro', 'and', 'Dezyre'])
>>> print(List1)

Output:

['I', 'love', ['ProjectPro', 'and', 'Dezyre']]

extend(): extend() is a list method that iterates over its argument and adds each element to the end of the list.

Example:

>>> List1 = ['I', 'love']
>>> List1.extend(['ProjectPro', 'and', 'Dezyre'])
>>> print(List1)

Output:

['I', 'love', 'ProjectPro', 'and', 'Dezyre']

23) What is the use of enumerate() function?

enumerate() is a built-in function in Python that pairs each element of an iterable with a counter and returns the pairs in the form of an enumerate object.

Example:

>>> List1 = ["eat", "sleep", "ProjectPro"]
>>> String1 = "Repeat"
>>>
>>> # Enumerate objects
>>> object1 = enumerate(List1)
>>> object2 = enumerate(String1)
>>> print("Return type:", type(object1))
>>> print(list(enumerate(List1)))
>>>
>>> # Start the index at 2 instead of 0
>>> print(list(enumerate(String1, 2)))

Output:

Return type: <class 'enumerate'>

[(0, 'eat'), (1, 'sleep'), (2, 'ProjectPro')]

[(2, 'R'), (3, 'e'), (4, 'p'), (5, 'e'), (6, 'a'), (7, 't')]

 

24) List the immutable and mutable built-in data types available in Python.

Immutable Data Types: Strings, Numbers, Tuples

Mutable Data Types: Lists, Dictionaries, Sets

25) What is negative indexing in Python?

Many programming languages do not allow negative indexing, and Python is one of the languages that supports it. Negative indexing means that one can use negative numbers to access the elements of a sequence such as a list or a string. The key point to remember is that the index -1 represents the last element, -2 represents the second-last element, and so on. Thus, in negative indexing, the counting starts from the end of the sequence.

For example:

>>> a = "ProjectPro blogs are fun to read!"
>>> print(a[-5:-1])

Output:

read
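A few more negative-index expressions on the same string, as a minimal sketch. The slice stops before index -1, which is why the exclamation mark is left out of the output above.

```python
a = "ProjectPro blogs are fun to read!"

last_char = a[-1]      # final character
word = a[-5:-1]        # from fifth-last up to, but not including, the last
tail = a[-5:]          # omit the stop index to include the last character

print(last_char, word, tail)
```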

 


Python Interview Questions for Data Analysts

The following Python interview questions are a must for a data analyst, but that doesn’t mean a data scientist is not expected to know the answers to them. Data scientists are often expected to do tasks that involve data visualization. We have thus prepared this list of questions to help you become fully prepared for the interview. If you want to know the answers, simply click on each of the questions for a detailed answer. The best part is that they are all available for free, so do not hesitate to browse through all of them.

How to plot Validation Curve in Python? 

How to plot a ROC Curve in Python?

How to plot a learning Curve in Python?

How to generate stacked BAR plot in Python?

How to generate PIE plot in Python?

How to generate grouped BAR plot in Python?

How to generate scatter plot using Pandas and Seaborn?

How to generate BAR plot using pandas DataFrame?

How to use seaborn to visualise a Pandas dataframe?

 

Python Data Science Projects Ideas

In a Python data science interview round, you will most probably be asked to showcase projects. This is to ensure that you have a good idea of how to apply the knowledge you have gained to solve real-world problems. If you haven’t explored enough projects and don’t know how to ace project-related questions, check out our Data Science Projects in Python, which have been prepared by leading data scientists.

 


About the Author

ProjectPro

ProjectPro is the only online platform designed to help professionals gain practical, hands-on experience in big data, data engineering, data science, and machine learning related technologies. Having over 270+ reusable project templates in data science and big data with step-by-step walkthroughs,
