How to perform logistic regression in sklearn

This recipe shows how to perform logistic regression in sklearn. Logistic regression is used when the dependent variable is categorical.
Last Updated: 22 Dec 2022

Recipe Objective - How to perform logistic regression in sklearn?

Logistic regression is used when the dependent variable is categorical. It models the relationship between one categorical dependent variable and one or more independent variables, which may be nominal, ordinal, or interval.
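To make that relationship concrete, here is a minimal sketch of the idea behind a binary logistic model: a linear score is passed through the sigmoid function to give a probability, which is then thresholded into a class. The weights, bias, and sample values are hypothetical, chosen only for illustration; they are not fitted to the diabetes data used in this recipe.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and one sample (illustration only).
weights = np.array([0.8, -0.5])
bias = 0.1
x = np.array([2.0, 1.0])

z = np.dot(weights, x) + bias   # linear score
p = sigmoid(z)                  # estimated probability of class 1
label = int(p >= 0.5)           # class after applying a 0.5 threshold
print(z, p, label)
```

A positive score gives a probability above 0.5 and therefore class 1; a negative score gives class 0.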

Scikit-learn provides the class "sklearn.linear_model.LogisticRegression" to perform logistic regression.

Example:-

Step:1 Import Necessary Libraries

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter/IPython magic to render plots inline; omit it in a plain Python script
%matplotlib inline

# load dataset
diab_df = pd.read_csv("diabetes.csv")
diab_df.head()

Step:2 Selecting Features

# split dataset into features and target variable
diab_cols = ['Pregnancies', 'Insulin', 'BMI', 'Age', 'Glucose', 'BloodPressure', 'DiabetesPedigreeFunction']
X = diab_df[diab_cols]  # Features
y = diab_df.Outcome  # Target variable

Step:3 Splitting Data

# hold out 25% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
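As a quick check of what the split produces, the sketch below runs the same call on synthetic stand-in data. The 768-row size is an assumption, chosen to be consistent with the 192 test predictions shown later in this recipe; the "_demo" names are hypothetical. The "stratify" argument is an optional extra that the recipe itself does not use: it keeps the class proportions similar in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: 768 rows, 7 features), not diabetes.csv.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(768, 7))
y_demo = rng.integers(0, 2, size=768)

# Same split settings as the recipe, plus optional stratification on the target.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo
)

print(X_tr.shape, X_te.shape)      # sizes of the training and test parts
print(y_tr.mean(), y_te.mean())    # share of class 1 in each part
```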

Step:4 Model Development and Prediction

# instantiate the model
logreg = LogisticRegression(solver='liblinear')

# fit the model with data
logreg.fit(X_train, y_train)

# predicting
y_pred = logreg.predict(X_test)
y_pred

array([1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1,
       1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1,
       1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1,
       0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0], dtype=int64)
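The labels above come from "predict", which returns only the chosen class. If the underlying probabilities are of interest, "predict_proba" exposes them. The following is a minimal sketch on synthetic data generated with "make_classification" (an assumption, so that it runs without diabetes.csv); the "_demo" names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data, used only so the sketch is self-contained.
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

model = LogisticRegression(solver="liblinear")
model.fit(X_demo, y_demo)

proba = model.predict_proba(X_demo)   # columns: P(class 0), P(class 1)
labels = model.predict(X_demo)

# For a binary problem, predict() picks the more probable class, which
# corresponds to thresholding P(class 1) at 0.5.
manual = (proba[:, 1] > 0.5).astype(int)
print(proba[:5])
print(np.array_equal(manual, labels))
```

Working with probabilities directly makes it possible to pick a threshold other than 0.5 when one kind of error is more costly than the other.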

Step:5 Model Evaluation using Confusion Matrix

cnf_matrix = metrics.confusion_matrix(y_test, y_pred)
cnf_matrix

array([[119,  11],
       [ 26,  36]], dtype=int64)
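In scikit-learn's layout, rows are the actual classes and columns are the predicted classes, so a 2x2 matrix can be unpacked into its four counts. A small sketch using the values printed above (the variable name "cnf" is hypothetical):

```python
import numpy as np

# Confusion matrix values copied from the output above.
cnf = np.array([[119, 11],
                [26, 36]])

# Row = actual class, column = predicted class.
# ravel() flattens row by row: TN, FP, FN, TP.
tn, fp, fn, tp = cnf.ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
# prints: TN=119, FP=11, FN=26, TP=36
```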

Step:6 Visualizing Confusion Matrix using Heatmap

class_names = [0, 1]  # name of classes
fig, ax = plt.subplots()
tick_marks = np.arange(len(class_names))
plt.xticks(tick_marks, class_names)
plt.yticks(tick_marks, class_names)

# create heatmap
sns.heatmap(pd.DataFrame(cnf_matrix), annot=True, cmap="YlGnBu", fmt='g')
ax.xaxis.set_label_position("top")
plt.tight_layout()
plt.title('Confusion matrix', y=1.1)
plt.ylabel('Actual label')
plt.xlabel('Predicted label')

Step:7 Confusion Matrix Evaluation Metrics

print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
print("Precision:", metrics.precision_score(y_test, y_pred))
print("Recall:", metrics.recall_score(y_test, y_pred))

Accuracy: 0.8072916666666666
Precision: 0.7659574468085106
Recall: 0.5806451612903226
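These three numbers follow directly from the confusion matrix counts in Step 5. As a cross-check, the sketch below recomputes them by hand from the standard definitions, using the counts reported above.

```python
# Counts taken from the confusion matrix in Step 5.
tn, fp, fn, tp = 119, 11, 26, 36

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are correct
precision = tp / (tp + fp)                   # share of predicted positives that are truly positive
recall = tp / (tp + fn)                      # share of actual positives that were found

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
```

The recall of about 0.58 means the model misses 26 of the 62 actual positive cases in the test set, even though overall accuracy is about 0.81.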
