How to impute missing class labels using nearest neighbours in Python?

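This recipe shows how to impute missing class labels in Python by training a k-nearest-neighbours classifier on the rows whose labels are known and predicting the labels of the rows where they are missing.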

## How to impute missing class labels using nearest neighbours in Python 
def Kickstarter_Example_28():
    print()
    print(format('How to impute missing class labels using nearest neighbours in Python', '*^82'))

    import warnings
    warnings.filterwarnings("ignore")

    # Load libraries
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    # Create feature matrix: column 0 is the categorical class label,
    # columns 1 and 2 are numeric features
    X = np.array([[0, 2.10, 1.45],
                  [2, 1.18, 1.33],
                  [0, 1.22, 1.27],
                  [1, 1.32, 1.97],
                  [1, -0.21, -1.19]])

    # Create feature matrix whose class label (column 0) is missing in every row
    X_with_nan = np.array([[np.nan, 0.87, 1.31],
                           [np.nan, 0.37, 1.91],
                           [np.nan, 0.54, 1.27],
                           [np.nan, -0.67, -0.22]])

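    # Train a k-nearest-neighbours classifier (k=3, closer neighbours weigh more)
    # on the numeric features, using the known class labels as the target
    clf = KNeighborsClassifier(n_neighbors=3, weights='distance')
    trained_model = clf.fit(X[:,1:], X[:,0])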

    # Predict missing values' class
    imputed_values = trained_model.predict(X_with_nan[:,1:])
    print(); print(imputed_values)

    # Join column of predicted class with their other features
    X_with_imputed = np.hstack((imputed_values.reshape(-1,1), X_with_nan[:,1:]))
    print(); print(X_with_imputed)

    # Join two feature matrices
    print(); print(np.vstack((X_with_imputed, X)))

Kickstarter_Example_28()
******How to impute missing class labels using nearest neighbours in Python*******

[2. 1. 2. 1.]

[[ 2.    0.87  1.31]
 [ 1.    0.37  1.91]
 [ 2.    0.54  1.27]
 [ 1.   -0.67 -0.22]]

[[ 2.    0.87  1.31]
 [ 1.    0.37  1.91]
 [ 2.    0.54  1.27]
 [ 1.   -0.67 -0.22]
 [ 0.    2.1   1.45]
 [ 2.    1.18  1.33]
 [ 0.    1.22  1.27]
 [ 1.    1.32  1.97]
 [ 1.   -0.21 -1.19]]
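
The recipe above keeps the labelled rows and the unlabelled rows in two separate arrays. In practice both kinds of row often sit in the same matrix. The following is a minimal sketch of that case, assuming the class label is in column 0 and that missing labels are encoded as NaN; the helper name impute_labels_knn and the array mixed are hypothetical and not part of the original recipe.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def impute_labels_knn(data, label_col=0, n_neighbors=3):
    """Fill NaN entries in label_col using a k-NN classifier fit on the complete rows.

    Hypothetical helper for illustration; assumes missing labels are NaN.
    """
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data[:, label_col])         # rows whose label is unknown
    features = np.delete(data, label_col, axis=1)  # every column except the label

    # Same classifier settings as the recipe: k=3, distance-weighted votes
    clf = KNeighborsClassifier(n_neighbors=n_neighbors, weights="distance")
    clf.fit(features[~missing], data[~missing, label_col])

    filled = data.copy()
    if missing.any():
        filled[missing, label_col] = clf.predict(features[missing])
    return filled


# Labelled and unlabelled rows mixed together (values taken from the recipe)
mixed = np.array([[0, 2.10, 1.45],
                  [np.nan, 0.87, 1.31],
                  [2, 1.18, 1.33],
                  [0, 1.22, 1.27],
                  [np.nan, -0.67, -0.22],
                  [1, 1.32, 1.97],
                  [1, -0.21, -1.19]])

filled = impute_labels_knn(mixed)
print(filled)
```

Because the boolean mask selects rows in place, the original row order is preserved and no stacking step is needed afterwards. With these values the two missing labels should come out as 2 and 1, matching the predictions printed above for the same feature rows.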
