How to impute missing class labels using nearest neighbours in Python?

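This recipe shows how to impute missing class labels in Python: a k-nearest-neighbours classifier is trained on the rows whose label is known, then used to predict the label for the rows where it is missing.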

In [1]:
## How to impute missing class labels using nearest neighbours in Python 
def Kickstarter_Example_28():
    print()
    print(format('How to impute missing class labels using nearest neighbours in Python', '*^82'))

    import warnings
    warnings.filterwarnings("ignore")

    # Load libraries
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    # Create feature matrix; column 0 holds the categorical feature (the class label)
    X = np.array([[0, 2.10, 1.45],
                  [2, 1.18, 1.33],
                  [0, 1.22, 1.27],
                  [1, 1.32, 1.97],
                  [1, -0.21, -1.19]])

    # Create feature matrix whose categorical feature (column 0) is missing
    X_with_nan = np.array([[np.nan, 0.87, 1.31],
                           [np.nan, 0.37, 1.91],
                           [np.nan, 0.54, 1.27],
                           [np.nan, -0.67, -0.22]])

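    # Train a k-nearest-neighbours classifier (k=3, closer neighbours get larger votes)
    # Features are columns 1 onward; the target is the class label in column 0
    clf = KNeighborsClassifier(3, weights='distance')
    trained_model = clf.fit(X[:,1:], X[:,0])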

    # Predict missing values' class
    imputed_values = trained_model.predict(X_with_nan[:,1:])
    print(); print(imputed_values)

    # Join column of predicted class with their other features
    X_with_imputed = np.hstack((imputed_values.reshape(-1,1), X_with_nan[:,1:]))
    print(); print(X_with_imputed)

    # Join two feature matrices
    print(); print(np.vstack((X_with_imputed, X)))

Kickstarter_Example_28()
******How to impute missing class labels using nearest neighbours in Python*******

[2. 1. 2. 1.]

[[ 2.    0.87  1.31]
 [ 1.    0.37  1.91]
 [ 2.    0.54  1.27]
 [ 1.   -0.67 -0.22]]

[[ 2.    0.87  1.31]
 [ 1.    0.37  1.91]
 [ 2.    0.54  1.27]
 [ 1.   -0.67 -0.22]
 [ 0.    2.1   1.45]
 [ 2.    1.18  1.33]
 [ 0.    1.22  1.27]
 [ 1.    1.32  1.97]
 [ 1.   -0.21 -1.19]]
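
The recipe above starts from two separate matrices, one complete and one with missing labels. In practice the missing labels usually sit in the same array as the known ones. The following is a minimal sketch of the same technique for that case, assuming the class label is in a single column and that there are at least `n_neighbors` rows with a known label. The helper name `impute_class_column` and the array `X_mixed` are hypothetical, introduced here only for illustration; they are not part of scikit-learn or of the original recipe.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def impute_class_column(X, class_col=0, n_neighbors=3):
    """Fill NaNs in one class column with k-NN predictions.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X[:, class_col])         # True where the label is unknown
    if not missing.any():
        return X.copy()                         # nothing to impute

    features = np.delete(X, class_col, axis=1)  # all columns except the label

    # Fit on rows with a known label, as in the recipe above
    clf = KNeighborsClassifier(n_neighbors=n_neighbors, weights='distance')
    clf.fit(features[~missing], X[~missing, class_col])

    # Write predictions back in place so the row order is preserved
    X_imputed = X.copy()
    X_imputed[missing, class_col] = clf.predict(features[missing])
    return X_imputed


# Same training rows as the recipe, with two unlabeled rows mixed in
X_mixed = np.array([[0, 2.10, 1.45],
                    [np.nan, 0.87, 1.31],
                    [2, 1.18, 1.33],
                    [0, 1.22, 1.27],
                    [np.nan, -0.67, -0.22],
                    [1, 1.32, 1.97],
                    [1, -0.21, -1.19]])

X_filled = impute_class_column(X_mixed)
print(X_filled[:, 0])  # [0. 2. 2. 0. 1. 1. 1.]
```

Splitting with a boolean mask instead of stacking arrays keeps each row at its original position, which matters when the rows are aligned with an index or with other columns stored elsewhere.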
