How does sklearn treat null values

In order to deal with missing values, we can simply either replace them or remove them. Python offers plenty of options and functions for dealing with NULL or NaN values.
Last Updated: 03 Aug 2022
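A minimal sketch of the two options in pandas, using a small made-up DataFrame: dropna removes every row that contains a missing value, while fillna replaces each missing value (here with its column mean).

```python
import numpy as np
import pandas as pd

# A small illustrative DataFrame with missing values (made-up data)
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# Option 1: remove every row that contains at least one NaN
removed = df.dropna()

# Option 2: replace each NaN with the mean of its column
replaced = df.fillna(df.mean())

print(removed)
print(replaced)
```

Removing rows is simple but throws data away (here two of three rows). Replacing keeps every row at the cost of inventing plausible values.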


Recipe Objective - How does scikit-learn treat null values?


Some methods

1. The "SimpleImputer" class from sklearn.impute: SimpleImputer(missing_values=np.nan, strategy='mean')

2. The pandas "fillna" method (for a numeric DataFrame): df.fillna(df.mean(), inplace=True)
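The sketch below, on a small made-up numeric DataFrame, checks that the two methods listed above fill in the same values, and also shows SimpleImputer's "median" strategy for comparison. A non-inplace fillna is used so the original DataFrame stays available for the second imputer.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up numeric data with one missing value per column
df = pd.DataFrame({"x": [1.0, 2.0, np.nan, 9.0], "y": [np.nan, 10.0, 20.0, 60.0]})

# Method 1: scikit-learn's SimpleImputer learns each column's mean
mean_imp = SimpleImputer(missing_values=np.nan, strategy="mean")
sk_filled = mean_imp.fit_transform(df)  # returns a NumPy array

# Method 2: pandas fillna with the column means (returns a new DataFrame)
pd_filled = df.fillna(df.mean()).to_numpy()

# Both approaches should produce the same matrix
print(np.allclose(sk_filled, pd_filled))

# Another strategy: the median is less sensitive to large values such as 9.0 or 60.0
median_imp = SimpleImputer(missing_values=np.nan, strategy="median")
print(median_imp.fit_transform(df))
```

The practical difference is that a fitted SimpleImputer remembers the training statistics (in its statistics_ attribute) and can apply them to new data with transform, which is what the example below relies on.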

Example:

Here is an example with missing values in both the training set and the test set. Pick any strategy for replacing the missing values (there are plenty of them, e.g. the "SimpleImputer" class), then let's see what happens:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer

X_train = [[0, 0, np.nan], [np.nan, 1, 1]]
Y_train = [0, 1]
X_test_1 = [0, 0, np.nan]
X_test_2 = [0, np.nan, np.nan]
X_test_3 = [np.nan, 1, 1]

# Create our imputer to replace missing values with the column mean
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
imp = imp.fit(X_train)

# Impute our data, then train
X_train_imp = imp.transform(X_train)
clf = RandomForestClassifier(n_estimators=10)
clf = clf.fit(X_train_imp, Y_train)

for X_test in [X_test_1, X_test_2, X_test_3]:
    # Impute each test item (reshaped to one row), then predict
    X_test = np.array(X_test).reshape(1, -1)
    X_test_imp = imp.transform(X_test)
    print(X_test, '->', clf.predict(X_test_imp))

[[ 0.  0. nan]] -> [0]
[[ 0. nan nan]] -> [0]
[[nan  1.  1.]] -> [1]
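To see where these predictions come from, it helps to inspect what the imputer learned. The sketch below repeats only the imputation step of the example and prints the per-column means stored in SimpleImputer's statistics_ attribute, together with the imputed rows.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_train = [[0, 0, np.nan], [np.nan, 1, 1]]
X_tests = np.array([[0, 0, np.nan], [0, np.nan, np.nan], [np.nan, 1, 1]])

imp = SimpleImputer(missing_values=np.nan, strategy="mean").fit(X_train)

# Column means computed while ignoring NaN: column 0 -> 0, column 1 -> 0.5, column 2 -> 1
print(imp.statistics_)

# Every NaN is replaced by the training mean of its column
print(imp.transform(X_train))
print(imp.transform(X_tests))
```

After imputation the training rows become [0, 0, 1] and [0, 1, 1], so only the middle feature separates the two classes. The first and third test rows become identical to the class-0 and class-1 training rows respectively. The second test row gets 0.5 in the middle feature, exactly halfway between the two training values, so it is a borderline case.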

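Sometimes missing values are simply not applicable, and sometimes imputation does not work well. In these cases, you should use a model that can handle missing values directly. Most scikit-learn estimators cannot: they raise an error when the input contains NaN. The histogram-based gradient boosting models (HistGradientBoostingClassifier and HistGradientBoostingRegressor) are an exception, with native support for missing values, and XGBoost can also handle missing values natively.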
