How to deal with outliers in Python?

This recipe helps you deal with outliers in Python

This Python data science recipe does the following:
1. Imports the pandas and NumPy libraries.
2. Creates a small DataFrame using pandas.
3. Handles outliers by dropping them.
4. Handles outliers by marking them with a boolean flag.
5. Handles outliers by rescaling a feature (log transform).
In [2]:
## How to deal with outliers in Python 
def Kickstarter_Example_34():
    print()
    print(format('How to deal with outliers in Python ', '*^82'))

    import warnings
    warnings.filterwarnings("ignore")

    # Load library
    import numpy as np
    import pandas as pd

    # Create DataFrame
    houses = pd.DataFrame()
    houses['Price'] = [534433, 392333, 293222, 4322032]
    houses['Bathrooms'] = [2, 3.5, 2, 116]
    houses['Square_Feet'] = [1500, 2500, 1500, 48000]
    print(); print(houses)

    # Outlier Handling Option 1: Drop
    # Keep only observations below a chosen threshold (here, fewer than 20 bathrooms)
    h = houses[houses['Bathrooms'] < 20]
    print(); print(h)

    # Outlier Handling Option 2: Mark
    # Add a flag column: 0 if Bathrooms < 20, otherwise 1 (outlier)
    houses['Outlier'] = np.where(houses['Bathrooms'] < 20, 0, 1)

    # Show data
    print(); print(houses)

    # Outlier Handling Option 3: Rescale
    # Take the natural log of the feature to compress extreme values
    houses['Log_Of_Square_Feet'] = np.log(houses['Square_Feet'])

    # Show data
    print(); print(houses)

Kickstarter_Example_34()
***********************How to deal with outliers in Python ***********************

     Price  Bathrooms  Square_Feet
0   534433        2.0         1500
1   392333        3.5         2500
2   293222        2.0         1500
3  4322032      116.0        48000

    Price  Bathrooms  Square_Feet
0  534433        2.0         1500
1  392333        3.5         2500
2  293222        2.0         1500

     Price  Bathrooms  Square_Feet  Outlier
0   534433        2.0         1500        0
1   392333        3.5         2500        0
2   293222        2.0         1500        0
3  4322032      116.0        48000        1

     Price  Bathrooms  Square_Feet  Outlier  Log_Of_Square_Feet
0   534433        2.0         1500        0            7.313220
1   392333        3.5         2500        0            7.824046
2   293222        2.0         1500        0            7.313220
3  4322032      116.0        48000        1           10.778956
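
The recipe above uses a hand-picked cutoff of 20 bathrooms. When no obvious cutoff is available, one common alternative is to derive the bounds from the data itself. The sketch below uses the interquartile range (IQR) rule with the conventional multiplier of 1.5. This rule is not part of the original recipe, and the helper name iqr_outlier_mask and its parameter k are hypothetical, introduced here only for illustration. With just four rows the quartiles are very rough, so treat this as a demonstration of the mechanics rather than a recommended threshold.

```python
import pandas as pd

# Same illustrative data as in the recipe above
houses = pd.DataFrame({
    'Price': [534433, 392333, 293222, 4322032],
    'Bathrooms': [2, 3.5, 2, 116],
    'Square_Feet': [1500, 2500, 1500, 48000],
})


def iqr_outlier_mask(series, k=1.5):
    """Hypothetical helper: True where a value falls outside
    [Q1 - k * IQR, Q3 + k * IQR]. k=1.5 is a convention, not a rule."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Boolean mask computed from the data instead of a hard-coded cutoff
mask = iqr_outlier_mask(houses['Bathrooms'])

# Option 2 (mark): store the flag as 0/1, as in the recipe
houses['Outlier'] = mask.astype(int)

# Option 1 (drop): keep only the rows that were not flagged
kept = houses[~mask]

print(houses)
print(kept)
```

On this data the rule flags only the row with 116 bathrooms, which matches what the hard-coded cutoff does. On other data the two approaches can disagree, so it is worth inspecting the flagged rows before dropping anything.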
