How to find outliers in Python?

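This recipe shows how to flag outliers in Python with scikit-learn's EllipticEnvelope, which fits a robust Gaussian estimate to the data and labels the points that lie farthest from it as outliers.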

## How to find outliers in Python 
def Kickstarter_Example_30():
    print()
    print(format('How to find outliers in Python', '*^82'))

    import warnings
    warnings.filterwarnings("ignore")

    # Load libraries
    from sklearn.covariance import EllipticEnvelope
    from sklearn.datasets import make_blobs
    import matplotlib.pyplot as plt

    # Create simulated data
    X, _ = make_blobs(n_samples = 100,
                      n_features = 20,
                      centers = 7,
                      cluster_std = 1.1,
                      shuffle = True,
                      random_state = 42)

    # Detect Outliers
    # Create detector
    outlier_detector = EllipticEnvelope(contamination=.1)

    # Fit detector
    outlier_detector.fit(X)

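    # Predict outliers: predict() returns -1 for outliers and 1 for inliers
    print(); print(X)
    labels = outlier_detector.predict(X)
    print(); print(labels)

    # Plot the first two of the 20 features, colored by predicted label
    plt.scatter(X[:,0], X[:,1], c=labels)

    # Show the scatterplot
    plt.show()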

Kickstarter_Example_30()
**************************How to find outliers in Python**************************

[[ 4.93252797  7.68541287 -3.97876821 ...  4.52684633 -3.24863123
   9.41974416]
 [-9.3234536   4.59276437 -4.39779468 ... -7.09597087  8.20227193
   2.26134033]
 [-8.7338198   3.08658417 -3.49905765 ... -6.82385124  8.775862
   1.38825176]
 ...
 [-2.83969517 -6.07980264  6.47763993 ... -9.36607752 -2.57352093
  -9.39410402]
 [-2.1671993  10.63717797  5.58330442 ...  0.50898027 -1.25365592
  -5.02572796]
 [ 7.21074034  9.28156979 -3.54240715 ...  3.89782083 -3.2259812
  11.03335594]]

[-1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 -1  1  1  1 -1
  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 -1 -1 -1  1  1  1  1  1
  1  1  1  1  1  1  1  1  1  1  1  1  1  1 -1  1  1  1  1  1  1  1  1  1
  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 -1  1  1 -1  1  1  1  1  1
  1  1  1 -1]
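
The array of -1 and 1 labels printed above is hard to use directly. The following is a minimal sketch, not part of the original recipe, of one way to turn those labels into a boolean mask, pull out the flagged rows, and rank them by how anomalous the detector considers them. The variable names and `random_state=0` on the detector are assumptions added here for reproducibility, so the exact rows flagged may differ from the output above.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import make_blobs

# Same simulated data and contamination setting as the recipe
X, _ = make_blobs(n_samples=100, n_features=20, centers=7,
                  cluster_std=1.1, shuffle=True, random_state=42)

# random_state=0 is an added assumption so repeated runs give the same fit
detector = EllipticEnvelope(contamination=0.1, random_state=0).fit(X)

labels = detector.predict(X)      # -1 = outlier, 1 = inlier
is_outlier = labels == -1         # boolean mask over the rows of X

outlier_rows = np.flatnonzero(is_outlier)   # row indices flagged as outliers
outliers = X[is_outlier]
inliers = X[~is_outlier]

# decision_function is negative for outliers; lower means more anomalous
scores = detector.decision_function(X)
ranked = outlier_rows[np.argsort(scores[outlier_rows])]

print("number of outliers:", len(outlier_rows))
print("outlier rows, most anomalous first:", ranked)
```

With `contamination=0.1` and 100 samples, roughly 10 rows are expected to be flagged, which matches the ten -1 entries in the output above. The `inliers` array can then be passed on to later modelling steps if dropping the flagged rows is appropriate for the task.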
