Top 20 Logistic Regression Interview Questions and Answers

Ace your upcoming data science or machine learning job interview with these exclusive interview questions and answers on logistic regression curated for you.


To become a successful data scientist in the industry, understanding the end-to-end workflow of the data science pipeline (understanding data, data pre-processing, model building, model evaluation, and model deployment) is essential. Rather than overwhelming yourself with fancy machine learning algorithms, mastering the concepts of logistic regression is a sensible first step toward getting familiar with that end-to-end pipeline. Logistic regression should be the first algorithm to master when becoming a data scientist or a machine learning engineer: it is a robust machine learning algorithm that can perform remarkably well even on complex problems. Logistic regression is popularly used for classification problems where the dependent or target variable has only two (or a discrete number of) possible outcomes, and it is essentially a modification of linear regression adapted for classification purposes.



Top 20 Logistic Regression Interview Questions and Answers

There is a lot to learn if you want to become a data scientist or a machine learning engineer, but the first step is to master the most common machine learning algorithms in the data science pipeline. These interview questions on logistic regression would be your go-to resource when preparing for your next machine learning or data science interview.



1) Is logistic regression a generative or a discriminative classifier? Why?

Logistic regression is a discriminative classifier. It learns to classify by finding the features that best differentiate two or more classes of objects. For example, to classify between an apple and an orange, it will learn that an orange is orange in color and an apple is not. A generative classifier like Naive Bayes, on the other hand, models the distribution of features within each class and classifies a test case by determining which class is most likely to have generated it.

2) Can you use logistic regression for classification between more than two classes?

Yes, it is possible to use logistic regression for classification between more than two classes, and it is called multinomial logistic regression. However, it requires modifications to the vanilla logistic regression model.

3) How do you implement multinomial logistic regression?

The multinomial logistic classifier can be implemented using a generalization of the sigmoid, called the softmax function. The softmax represents each class with a value in the range (0,1), with all the values summing to 1. Alternatively, you could use the one-vs-all or one-vs-one approach using multiple simple binary classifiers.
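
As an illustration, here is a minimal NumPy sketch of the softmax function (the function name and toy scores are ours, for illustration only):

```python
import numpy as np

def softmax(z):
    """Map raw class scores to probabilities in (0, 1) that sum to 1."""
    z = z - np.max(z)            # shift by the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

scores = np.array([2.0, 1.0, 0.1])   # hypothetical scores for three classes
print(softmax(scores))               # [0.659 0.242 0.099] -- sums to 1
```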

4) Suppose that you are trying to predict whether a consumer will recommend a particular brand of chocolate. Let us say your hypothesis function outputs h(x) = 0.55, where h(x) is the probability that y = 1 (i.e., that the consumer recommends the chocolate) given input x. Does this mean that the consumer will recommend the chocolate?

The answer to this question is 'cannot be determined,' and this will remain the case unless you are also given the decision boundary. If the decision boundary is set so that we predict y = 1 when h(x) ≥ 0.5 and y = 0 otherwise, then the answer is a resounding YES. However, if the decision boundary is set (although this is not very common practice) so that we predict y = 1 when h(x) ≥ 0.6 and y = 0 otherwise, the answer will be a NO.
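
A tiny sketch of how the decision boundary turns the probability h(x) = 0.55 into a class label (the threshold values come from the question; the helper name is ours):

```python
def predict(h_x, threshold=0.5):
    """Return 1 if the predicted probability clears the decision boundary, else 0."""
    return 1 if h_x >= threshold else 0

h_x = 0.55                          # hypothesis output from the question
print(predict(h_x))                 # 1 -> YES with the usual 0.5 boundary
print(predict(h_x, threshold=0.6))  # 0 -> NO with a stricter 0.6 boundary
```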


5) Why can't we use the mean square error cost function used in linear regression for logistic regression?

If we use mean square error in logistic regression, the resultant cost function will be non-convex, i.e., a function with many local minima, owing to the presence of the sigmoid function in h(x). As a result, an attempt to find the parameters using gradient descent may fail to optimize the cost function properly: it may end up at a local minimum instead of the global minimum.
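
This is why logistic regression instead uses the convex log-loss (cross-entropy) cost. A minimal NumPy sketch, with hypothetical labels and predicted probabilities:

```python
import numpy as np

def log_loss(y, h):
    """Convex logistic regression cost: -[y*log(h) + (1-y)*log(1-h)], averaged."""
    eps = 1e-12                      # guard against log(0)
    h = np.clip(h, eps, 1 - eps)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

y = np.array([1, 0, 1])              # hypothetical true labels
h = np.array([0.9, 0.2, 0.7])        # hypothetical predicted probabilities
print(log_loss(y, h))                # ~0.228
```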

6) If you observe that the cost function decreases rapidly before increasing or stagnating at a specific high value, what could you infer?

A cost curve that decreases rapidly before increasing or stagnating at a specific high value indicates that the learning rate is too high. Gradient descent is bouncing around the global minimum but missing it because the step size is larger than necessary.

7) What alternative could you suggest to using a for loop (which is time-consuming) when using Gradient Descent to find the optimum parameters for logistic regression?

One commonly used efficient alternative to a for loop is vectorization, i.e., representing the parameters to be optimized as a vector. With this approach, all the parameters can be updated in a single matrix operation instead of iterating over them in a for loop.
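
A minimal sketch of one vectorized gradient descent step (the toy data and names are ours): the gradient for every parameter is computed in one matrix operation rather than in a per-parameter for loop.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(theta, X, y, alpha):
    """One vectorized gradient descent update: no per-parameter for loop."""
    h = sigmoid(X @ theta)            # predictions for all m examples at once
    grad = X.T @ (h - y) / len(y)     # gradient for all parameters at once
    return theta - alpha * grad

# hypothetical toy data: m=4 examples, n=3 parameters (incl. intercept column)
X = np.array([[1, 0.5, 1.2], [1, 1.5, 0.3], [1, 3.0, 2.2], [1, 2.0, 1.0]])
y = np.array([0, 0, 1, 1])
theta = np.zeros(3)
for _ in range(1000):
    theta = gradient_step(theta, X, y, alpha=0.1)
print(theta)
```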


8) Are there alternatives to find optimum parameters for logistic regression besides using Gradient Descent?

Yes, Gradient Descent is merely one of many available optimization algorithms. Other advanced optimization algorithms can often arrive at the optimum parameters faster and scale better to large machine learning problems. A few such algorithms are Conjugate Gradient, BFGS, and L-BFGS.

9) How many binary classifiers would you need to implement one-vs-all for three classes? How does it work?

You would need three binary classifiers to implement one-vs-all for three classes since the number of binary classifiers is precisely equal to the number of classes with this approach. If you have three classes given by y=1, y=2, and y=3, then the three classifiers in the one-vs-all approach would consist of h^(1)(x), which classifies test cases as class 1 or not class 1, h^(2)(x), which classifies test cases as class 2 or not class 2, and so on. You can then take the results together to arrive at the final classification. For example, with three categories, Cats, Dogs, and Rabbits, to implement the one-vs-all approach we need to make the following comparisons (a minimal scikit-learn sketch follows the list):

    Binary Classification Problem 1: Cats vs. Dogs, Rabbits (or not Cats)

    Binary Classification Problem 2: Dogs vs. Cats, Rabbits (or not Dogs)

    Binary Classification Problem 3: Rabbits vs. Cats, Dogs (or not Rabbits)
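
A minimal one-vs-all sketch using scikit-learn (assuming it is installed; the synthetic data stands in for the Cats/Dogs/Rabbits example):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# hypothetical three-class data standing in for Cats/Dogs/Rabbits
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)

ovr = OneVsRestClassifier(LogisticRegression())
ovr.fit(X, y)
print(len(ovr.estimators_))   # 3 -> one binary classifier per class
```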


10) How many binary classifiers would you need to implement one-vs-one for four classes? How does it work?

To implement one-vs-one for four classes, you will require six binary classifiers. This is because you will need to compare each class with each other class. In general, the formula for calculating the number of binary classifiers b is given as b=(no. of classes * (no. of classes -1))/ 2.

Suppose we have four different categories into which we need to classify the weather for a particular day: Sun, Rain, Snow, Overcast. Then to implement the one-vs-one approach, we need to make the following comparisons (enumerated programmatically in the sketch after the list):

    Binary Classification Problem 1: Sun vs. Rain

    Binary Classification Problem 2: Sun vs. Snow

    Binary Classification Problem 3: Sun vs. Overcast

    Binary Classification Problem 4: Rain vs. Snow

    Binary Classification Problem 5: Rain vs. Overcast

    Binary Classification Problem 6: Snow vs. Overcast
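
A short sketch that enumerates the six pairwise problems and verifies the b = n(n − 1)/2 count:

```python
from itertools import combinations

classes = ["Sun", "Rain", "Snow", "Overcast"]
pairs = list(combinations(classes, 2))   # every unordered pair of classes
print(len(pairs))                        # 6 = 4 * (4 - 1) / 2
for a, b in pairs:
    print(f"{a} vs. {b}")
```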

11) What is the importance of regularisation? 

Regularisation is a technique that can help alleviate the problem of overfitting a model. It is especially beneficial when a large number of features are available to predict the target, since in those circumstances it is difficult to manually select which features to keep.

Regularisation essentially involves adding a penalty term on the model coefficients to the cost function so that large coefficients are penalized and kept small in magnitude. This helps, in turn, to preserve the overall trends in the data while not letting the model become too complex. These penalties, in effect, restrict the influence a predictor variable can have over the target by shrinking the coefficients, thereby preventing overfitting.
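
A minimal scikit-learn sketch of the penalty's effect (in LogisticRegression, C is the inverse of the regularisation strength, so a smaller C means a stronger penalty; the synthetic data is ours):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# smaller C = 1/lambda means a stronger penalty and smaller coefficients
weak = LogisticRegression(C=100.0).fit(X, y)
strong = LogisticRegression(C=0.01).fit(X, y)
print(abs(weak.coef_).mean(), abs(strong.coef_).mean())  # strong < weak
```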

12) Why is the Wald Test useful in logistic regression but not in linear regression? 

The Wald test, also known as the Wald Chi-Squared Test, is a method to determine whether the independent variables in a model are significant, i.e., whether they contribute to the predictions. Variables that add no value to the model can therefore be deleted without severe adverse effects on it. The Wald test is unnecessary in linear regression because it is easy to compare a more complicated model to a simpler one to check the influence of the added independent variables: we can simply use the R² value to make this comparison. This is not possible with logistic regression, which is fit by Maximum Likelihood Estimation, rendering the R²-based comparison infeasible. The Wald test can be used for many different models, including those with binary or continuous variables, and has the added advantage that it only requires estimating one model.

13) Will the decision boundary be linear or non-linear in logistic regression models? Explain with an example.

The decision boundary is the line or surface that separates the classes into which logistic regression classifies the observations. The shape of the decision boundary depends entirely on the hypothesis function of the logistic regression model.

For a logistic regression model given by the hypothesis function h(x) = g(θᵀx), where g is the sigmoid function, if the hypothesis function is h(x) = g(θ₁ + θ₂x₂ + θ₃x₃), then the decision boundary is linear. Alternatively, if h(x) = g(θ₁ + θ₂x₂² + θ₃x₃²), then the decision boundary is non-linear.
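
A quick numeric check of both cases with hypothetical parameter values: a point on the decision boundary gives h(x) of exactly 0.5.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# linear boundary: with hypothetical values theta = (-3, 1, 1),
# h(x) = g(-3 + x2 + x3) = 0.5 exactly on the line x2 + x3 = 3
print(sigmoid(-3 + 1.0 + 2.0))        # 0.5 -> (x2, x3) = (1, 2) lies on the line

# non-linear boundary: with theta = (-1, 1, 1) on squared features,
# h(x) = g(-1 + x2**2 + x3**2) = 0.5 exactly on the circle x2**2 + x3**2 = 1
print(sigmoid(-1 + 0.6**2 + 0.8**2))  # 0.5 -> (0.6, 0.8) lies on the circle
```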

14) What are odds? Why is it used in logistic regression?

Odds are the ratio of the probability of success to the probability of failure. Odds are useful because they express the constant effect that a particular predictor or independent variable has on the outcome: in logistic regression, a one-unit change in a predictor multiplies the odds by a constant factor, whereas its effect on the probability itself varies with the values of the other variables. Since we often want to measure this unique, constant effect of each independent variable on the output, odds are very useful in logistic regression models.


15) Given a fair die, what are the odds of rolling an odd number?

The odds of rolling an odd number are 1 (or 1:1).

There are three odd and three even numbers on a fair die, so the probability of rolling an odd number is 3/6 = 0.5. Similarly, the probability of rolling a number that is not odd is also 0.5. Since odds are the ratio of the probability of success to the probability of failure:

Odds = 0.5 / 0.5 = 1.

16) In classification problems like logistic regression, classification accuracy alone is not considered a good measure. Why?

Classification accuracy treats false positives and false negatives with equal significance. If the problem were just another machine learning task of little consequence, this would be acceptable. However, when the problem involves deciding whether a candidate should receive life-saving treatment, a false positive may be far less harmful than a false negative; the opposite can also be true in other settings. Therefore, while there is no single best way to evaluate a classifier, accuracy alone may not serve as a good measure.
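
A small scikit-learn sketch of the point, with hypothetical screening labels: a model can score high accuracy while missing most of the cases that matter.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# hypothetical medical screening labels: 1 = needs treatment
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # model misses two positive cases

print(accuracy_score(y_true, y_pred))   # 0.8  -> looks respectable
print(recall_score(y_true, y_pred))     # 0.33 -> two of three patients missed
print(precision_score(y_true, y_pred))  # 1.0
```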


17) When the number of features or independent variables is large in comparison to the size of the training set, it is common practice to use logistic regression or a support vector machine (SVM) with a linear kernel. What is the reasoning behind this?

It is common to use logistic regression or SVM with a linear kernel because when there are many features but only a limited number of training examples, a linear function can usually perform reasonably well, and there is not enough training data to fit a more complex, non-linear function without overfitting.

18) Between SVM and logistic regression, which algorithm is most likely to work better in the presence of outliers? Why?

SVM is capable of handling outliers better than logistic regression. SVM is affected only by the points closest to the decision boundary. Logistic regression, on the other hand, tries to maximize the conditional likelihood of the training data and is therefore strongly affected by the presence of outliers.

19) Which is the most preferred algorithm for variable selection?

Lasso is the most preferred algorithm for variable selection because it performs regression with a shrinkage penalty: the coefficients of less significant variables are forced exactly to zero, effectively removing them from the model.
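
A minimal scikit-learn sketch of L1 (Lasso-style) variable selection in a logistic regression, with synthetic data (the exact number of zeroed coefficients will vary):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)

# the L1 penalty drives the coefficients of weak predictors exactly to zero
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print((lasso.coef_ == 0).sum(), "of", lasso.coef_.size, "coefficients zeroed out")
```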

20) What, according to you, is the best method to fit the data in logistic regression?

Maximum Likelihood Estimation, which chooses the model coefficients that maximize the likelihood of the observed data given the predictors and the target.
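
For illustration, a minimal statsmodels sketch (assuming statsmodels is installed; the simulated data is ours): sm.Logit fits the coefficients by maximizing the likelihood.

```python
import numpy as np
import statsmodels.api as sm

# hypothetical data: one predictor plus an intercept column
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = (x + rng.normal(scale=0.5, size=100) > 0).astype(int)

X = sm.add_constant(x)
model = sm.Logit(y, X).fit()      # coefficients chosen by maximum likelihood
print(model.params)
```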

 
