How to impute missing values with means in Python?

This recipe helps you impute missing values with means in Python

Recipe Objective

Sometimes a few values are missing in various features of a dataset. Many models cannot work with NaN values, and in some cases removing the rows that contain nulls is not an option, because it also discards the valid data those rows hold for the other features.
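To make the data-loss point concrete, here is a small sketch on a hypothetical four-row DataFrame (the columns "a" and "b" are made up for illustration, not part of this recipe). Dropping the one row with a missing "b" also throws away a perfectly valid value of "a".

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: only column "b" has a missing value (row 1).
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0],
                   "b": [10.0, np.nan, 30.0, 40.0]})

# dropna() removes every row containing a NaN in any column,
# so the valid a=2.0 in row 1 is lost along with the missing b.
dropped = df.dropna()
print(len(df), len(dropped))  # 4 3
```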

This recipe shows how to impute missing values with column means in Python.

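Step 1 - Import the libraries

import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer

We have imported pandas, numpy and SimpleImputer from sklearn.impute. SimpleImputer replaces the older Imputer class from sklearn.preprocessing, which was deprecated in scikit-learn 0.20 and removed in 0.22.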

Step 2 - Setting up the Data

We first create an empty DataFrame and then add columns C0 and C1 with their values. As the printout shows, three elements of column C1 are NaN.

df = pd.DataFrame()
df['C0'] = [0.2601, 0.2358, 0.1429, 0.1259, 0.7526,
            0.7341, 0.4546, 0.1426, 0.1490, 0.2500]
df['C1'] = [0.7154, np.nan, 0.2615, 0.5846, np.nan,
            0.8308, 0.4962, np.nan, 0.5340, 0.6731]
print(df)
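Rather than spotting the NaN values by eye, you can count them. A minimal sketch using the Step 2 data and the standard pandas isna() method:

```python
import numpy as np
import pandas as pd

# Same data as in Step 2.
df = pd.DataFrame()
df['C0'] = [0.2601, 0.2358, 0.1429, 0.1259, 0.7526,
            0.7341, 0.4546, 0.1426, 0.1490, 0.2500]
df['C1'] = [0.7154, np.nan, 0.2615, 0.5846, np.nan,
            0.8308, 0.4962, np.nan, 0.5340, 0.6731]

# isna() marks each missing cell True; summing counts them per column.
missing_counts = df.isna().sum()
print(missing_counts)

# Row labels where C1 is missing.
missing_rows = df.index[df['C1'].isna()].tolist()
print(missing_rows)  # [1, 4, 7]
```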

Step 3 - Using SimpleImputer to fill the NaN values with the mean

Column C1 has a few NaN values, so we fill them with the mean of the remaining values in that column. For this we use SimpleImputer, so let us first look at its parameters.

  • missing_values : The placeholder that marks a missing value. For pandas and NumPy data this is np.nan, which is also the default.
  • strategy : The strategy used to impute missing values: 'mean', 'median', 'most_frequent' or 'constant'. By default it is 'mean'.
  • fill_value : None by default. It is used only when strategy is 'constant', and gives the value placed in every NaN position.
  • axis : SimpleImputer has no axis parameter; it always imputes each column separately. The removed Imputer class took axis=0 for columns and axis=1 for rows.
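To see how the strategy and fill_value parameters change the result, here is a short sketch on the C1 column from Step 2. The choice of 'median' and of fill_value=0.0 is only illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Column C1 from Step 2 (three NaN values, at rows 1, 4 and 7).
df = pd.DataFrame({'C1': [0.7154, np.nan, 0.2615, 0.5846, np.nan,
                          0.8308, 0.4962, np.nan, 0.5340, 0.6731]})

# 'median': each NaN becomes the median of the 7 observed values.
median_imputer = SimpleImputer(missing_values=np.nan, strategy='median')
median_filled = median_imputer.fit_transform(df)

# 'constant': each NaN becomes fill_value; the observed values are untouched.
constant_imputer = SimpleImputer(missing_values=np.nan, strategy='constant',
                                 fill_value=0.0)
constant_filled = constant_imputer.fit_transform(df)

print(median_filled[:5, 0])
print(constant_filled[:5, 0])
```

Both calls return NumPy arrays of shape (10, 1); the median of the observed C1 values is 0.5846.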
We create an imputer object with the desired parameters, fit it to the DataFrame, and then transform the DataFrame so that its NaN values are replaced with the column means. The result, stored in imputed_df, is a NumPy array, which we print.

miss_mean_imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
miss_mean_imputer = miss_mean_imputer.fit(df)
imputed_df = miss_mean_imputer.transform(df)
print(imputed_df)

The first output below is the DataFrame printed in Step 2; the second is the imputed array, in which every NaN in C1 has been replaced by that column's mean, 0.58508571.

       C0      C1
0  0.2601  0.7154
1  0.2358     NaN
2  0.1429  0.2615
3  0.1259  0.5846
4  0.7526     NaN
5  0.7341  0.8308
6  0.4546  0.4962
7  0.1426     NaN
8  0.1490  0.5340
9  0.2500  0.6731

[[0.2601     0.7154    ]
 [0.2358     0.58508571]
 [0.1429     0.2615    ]
 [0.1259     0.5846    ]
 [0.7526     0.58508571]
 [0.7341     0.8308    ]
 [0.4546     0.4962    ]
 [0.1426     0.58508571]
 [0.149      0.534     ]
 [0.25       0.6731    ]]
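Because transform returns a plain array, the column names and index are lost. The sketch below, assuming the Step 2 data, wraps the result back into a DataFrame, reads the learned means from the fitted imputer's statistics_ attribute, and cross-checks the result against a pandas-only fillna with column means.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Same data as in Step 2.
df = pd.DataFrame()
df['C0'] = [0.2601, 0.2358, 0.1429, 0.1259, 0.7526,
            0.7341, 0.4546, 0.1426, 0.1490, 0.2500]
df['C1'] = [0.7154, np.nan, 0.2615, 0.5846, np.nan,
            0.8308, 0.4962, np.nan, 0.5340, 0.6731]

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')

# Rebuild a DataFrame so the original column labels and row index survive.
imputed_df = pd.DataFrame(imputer.fit_transform(df),
                          columns=df.columns, index=df.index)

# statistics_ holds one learned fill value (here, the mean) per column.
learned_means = pd.Series(imputer.statistics_, index=df.columns)
print(learned_means)

# Pandas-only equivalent: DataFrame.mean() skips NaN by default.
pandas_filled = df.fillna(df.mean())
print(np.allclose(imputed_df, pandas_filled))  # True
```

On scikit-learn 1.2 or later, calling imputer.set_output(transform="pandas") before fitting should return a DataFrame directly instead of an array.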
