Recipe: How to deal with outliers in Python?

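This recipe shows three ways to deal with outliers in Python using pandas and NumPy: dropping the outlying rows, marking them with an indicator column, and rescaling the feature with a log transform.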
## How to deal with outliers in Python 
def Kickstarter_Example_34():
    print()
    print(format('How to deal with outliers in Python ', '*^82'))

    import warnings
    warnings.filterwarnings("ignore")

    # Load library
    import numpy as np
    import pandas as pd

    # Create DataFrame
    houses = pd.DataFrame()
    houses['Price'] = [534433, 392333, 293222, 4322032]
    houses['Bathrooms'] = [2, 3.5, 2, 116]
    houses['Square_Feet'] = [1500, 2500, 1500, 48000]
    print(); print(houses)

    # Outlier Handling Option 1: Drop
    # Keep only observations below a chosen threshold (here, fewer than 20 bathrooms)
    h = houses[houses['Bathrooms'] < 20]
    print(); print(h)

    # Outlier Handling Option 2: Mark
    # Create an indicator feature: 0 for rows under the threshold, 1 for outliers
    houses['Outlier'] = np.where(houses['Bathrooms'] < 20, 0, 1)

    # Show data
    print(); print(houses)

    # Outlier Handling Option 3: Rescale
    # Take the natural log of the feature (vectorized over the whole column)
    houses['Log_Of_Square_Feet'] = np.log(houses['Square_Feet'])

    # Show data
    print(); print(houses)

Kickstarter_Example_34()
***********************How to deal with outliers in Python ***********************

     Price  Bathrooms  Square_Feet
0   534433        2.0         1500
1   392333        3.5         2500
2   293222        2.0         1500
3  4322032      116.0        48000

    Price  Bathrooms  Square_Feet
0  534433        2.0         1500
1  392333        3.5         2500
2  293222        2.0         1500

     Price  Bathrooms  Square_Feet  Outlier
0   534433        2.0         1500        0
1   392333        3.5         2500        0
2   293222        2.0         1500        0
3  4322032      116.0        48000        1

     Price  Bathrooms  Square_Feet  Outlier  Log_Of_Square_Feet
0   534433        2.0         1500        0            7.313220
1   392333        3.5         2500        0            7.824046
2   293222        2.0         1500        0            7.313220
3  4322032      116.0        48000        1           10.778956
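
The recipe above hard-codes the cutoff of 20 bathrooms. As a minimal sketch of one alternative, the cutoff can be derived from the data with the common interquartile range (IQR) rule. This is not part of the original recipe: the helper name `iqr_outlier_mask` and the multiplier `k=1.5` are assumptions made for illustration, and with only four rows the quartiles are rough estimates.

```python
import pandas as pd

# Same toy data as in the recipe above
houses = pd.DataFrame({
    "Price": [534433, 392333, 293222, 4322032],
    "Bathrooms": [2, 3.5, 2, 116],
    "Square_Feet": [1500, 2500, 1500, 48000],
})


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return (values < lower) | (values > upper)


# Boolean mask: True for rows whose bathroom count falls outside the IQR fences
mask = iqr_outlier_mask(houses["Bathrooms"])

# Option 1 (drop): keep only rows that are not flagged
kept = houses[~mask]

# Option 2 (mark): store the flag as a 0/1 column, as the recipe does
houses["Outlier"] = mask.astype(int)

print(kept)
print(houses)
```

On this data the rule flags only the row with 116 bathrooms, matching the result of the fixed threshold. Whether 1.5 is a sensible multiplier depends on the data, so treat it as a starting point rather than a rule.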

