How to impute missing class labels in Python?
DATA MUNGING DATA CLEANING PYTHON MACHINE LEARNING RECIPES PANDAS CHEATSHEET     ALL TAGS

How to impute missing class labels in Python?

How to impute missing class labels in Python?

This recipe helps you impute missing class labels in Python

Recipe Objective

In many dataset we find null values in the features so how to manage and fill the null values.

So this is the recipe on how we can impute missing class labels in Python.

Step 1 - Import the library

import numpy as np from sklearn.preprocessing import Imputer

We have imported numpy and Imputer which is needed.

Step 2 - Creating Data

We have created a matrix with different values in it and also with null values. X = np.array([[2, 2.15, 1.5], [1, 1.64, 1.25], [2, 1.15, 1.45], [0, -0.45, -1.52], [np.nan, 0.54, 1.15], [np.nan, -0.65, -0.61]])

Step 3 - Imputing Missing values

We have created an Object for Imputer with parameters strategy in which we have to pass the method of imputing and 0 or 1 in axis for rows and columns. We have used fit_transform to fit the data and impute values in null. imputer = Imputer(strategy="most_frequent", axis=0) print(X) print(imputer.fit_transform(X))

[[ 2.    2.15  1.5 ]
 [ 1.    1.64  1.25]
 [ 2.    1.15  1.45]
 [ 0.   -0.45 -1.52]
 [  nan  0.54  1.15]
 [  nan -0.65 -0.61]]

[[ 2.    2.15  1.5 ]
 [ 1.    1.64  1.25]
 [ 2.    1.15  1.45]
 [ 0.   -0.45 -1.52]
 [ 2.    0.54  1.15]
 [ 2.   -0.65 -0.61]]

Download Materials

Relevant Projects

Image Segmentation using Mask R-CNN with Tensorflow
In this Deep Learning Project on Image Segmentation Python, you will learn how to implement the Mask R-CNN model for early fire detection.

Build a Face Recognition System in Python using FaceNet
In this deep learning project, you will build your own face recognition system in Python using OpenCV and FaceNet by extracting features from an image of a person's face.

Ola Bike Rides Request Demand Forecast
Given big data at taxi service (ride-hailing) i.e. OLA, you will learn multi-step time series forecasting and clustering with Mini-Batch K-means Algorithm on geospatial data to predict future ride requests for a particular region at a given time.

Ecommerce product reviews - Pairwise ranking and sentiment analysis
This project analyzes a dataset containing ecommerce product reviews. The goal is to use machine learning models to perform sentiment analysis on product reviews and rank them based on relevance. Reviews play a key role in product recommendation systems.

Grouping similar schools/colleges using scorecard and other factors
Use cluster analysis to identify the groups of characteristically similar schools in the College Scorecard dataset. Considerations: Clustering Algorithm Data Preparation How will you deal with missing values? Categorical variables? Feature intercorrelations? Feature normalization or scaling? Dimensionality reduction? Hyperparameters How will you set the parameters -- the algorithm's knobs and dials, so to speak -- in order to achieve valid and useful output? Interpretation Is it possible to explain what each cluster represents? Did you retain or prepare a set of features that enables a meaningful interpretation of the clusters? Do the compositions of the clusters seem to make sense? Validation How will you measure the validity of your clustering process? Which metrics will you use and how will you apply them?

Data Science Project on Wine Quality Prediction in R
In this R data science project, we will explore wine dataset to assess red wine quality. The objective of this data science project is to explore which chemical properties will influence the quality of red wines.

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

Machine Learning project for Retail Price Optimization
In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.

Walmart Sales Forecasting Data Science Project
Data Science Project in R-Predict the sales for each department using historical markdown data from the Walmart dataset containing data of 45 Walmart stores.

Resume parsing with Machine learning - NLP with Python OCR and Spacy
In this machine learning resume parser example we use the popular Spacy NLP python library for OCR and text classification.