Sometimes a dataset has a few missing values in some of its features. Many models cannot handle NaN values, and removing the rows that contain them is often not an option because it also discards the valid values those rows hold for other features.
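To make the data-loss point concrete, here is a minimal sketch. The frame `toy` and its columns `a`, `b` and `c` are hypothetical and chosen only for illustration; only column `b` has gaps, yet dropping the incomplete rows also throws away the valid entries of `a` and `c` in those rows.

```python
import numpy as np
import pandas as pd

# Hypothetical three-column frame; only column "b" contains missing values.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [0.5, np.nan, 0.7, np.nan],
    "c": [10, 20, 30, 40],
})

# dropna() removes every row that has at least one NaN,
# so the valid "a" and "c" values in those rows are lost too.
dropped = toy.dropna()

rows_lost = len(toy) - len(dropped)
print(f"rows before: {len(toy)}, rows after dropna: {len(dropped)}")
print(f"valid 'a' and 'c' values discarded: {2 * rows_lost}")
```

Imputation keeps all four rows and only fills in the gaps in `b`.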
So this is the recipe on how we can impute missing values with means in Python.
import pandas as pd
import numpy as np
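from sklearn.impute import SimpleImputer
We have imported pandas, numpy and SimpleImputer from sklearn.impute. SimpleImputer replaces the older Imputer class from sklearn.preprocessing, which was deprecated in scikit-learn 0.20 and removed in 0.22.
We first create an empty DataFrame and then add columns C0 and C1 with their values. Column C1 contains three NaN values.
df = pd.DataFrame()
df['C0'] = [0.2601, 0.2358, 0.1429, 0.1259, 0.7526,
            0.7341, 0.4546, 0.1426, 0.1490, 0.2500]
df['C1'] = [0.7154, np.nan, 0.2615, 0.5846, np.nan,
            0.8308, 0.4962, np.nan, 0.5340, 0.6731]
print(df)
Column C1 has a few NaN values, so we fill them with the mean of the remaining values in that column. For this we use SimpleImputer, so let us first look at its parameters: missing_values=np.nan marks which entries count as missing, and strategy='mean' replaces each of them with the mean of its column. SimpleImputer always works column by column, so the old axis parameter is no longer needed.
# Replace every NaN with the mean of its column
miss_mean_imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
# fit() learns the column means from the non-missing values
miss_mean_imputer = miss_mean_imputer.fit(df)
# transform() returns a NumPy array with the NaN values filled in
imputed_df = miss_mean_imputer.transform(df)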
print(imputed_df)
The output is given below: first the original DataFrame, then the imputed array. We can see that all the NaN values have been filled with the mean of their column (0.58508571 for C1).
       C0      C1
0  0.2601  0.7154
1  0.2358     NaN
2  0.1429  0.2615
3  0.1259  0.5846
4  0.7526     NaN
5  0.7341  0.8308
6  0.4546  0.4962
7  0.1426     NaN
8  0.1490  0.5340
9  0.2500  0.6731
[[0.2601     0.7154    ]
 [0.2358     0.58508571]
 [0.1429     0.2615    ]
 [0.1259     0.5846    ]
 [0.7526     0.58508571]
 [0.7341     0.8308    ]
 [0.4546     0.4962    ]
 [0.1426     0.58508571]
 [0.149      0.534     ]
 [0.25       0.6731    ]]
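The imputed result is a plain NumPy array, so the column names and index are lost. The sketch below shows one way to put them back and to cross-check the filled values. It assumes scikit-learn 0.20 or later, where `SimpleImputer` is available in `sklearn.impute`. The names `restored` and `filled` are illustrative, and the `fillna` line is a pandas-only alternative rather than part of the recipe above.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Same data as in the recipe.
df = pd.DataFrame({
    "C0": [0.2601, 0.2358, 0.1429, 0.1259, 0.7526,
           0.7341, 0.4546, 0.1426, 0.1490, 0.2500],
    "C1": [0.7154, np.nan, 0.2615, 0.5846, np.nan,
           0.8308, 0.4962, np.nan, 0.5340, 0.6731],
})

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# fit_transform() returns a NumPy array; wrap it to restore labels.
restored = pd.DataFrame(imputer.fit_transform(df),
                        columns=df.columns, index=df.index)

# pandas-only alternative: df.mean() skips NaN by default,
# and fillna() applies each column's mean to that column.
filled = df.fillna(df.mean())

print(restored)
print("same result as fillna:",
      np.allclose(restored.to_numpy(), filled.to_numpy()))
```

Both approaches fill C1 with the mean of its seven observed values. The scikit-learn imputer has the advantage that the means learned by `fit` can be reused later on new data via `transform`.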