How to find outliers in Python?

This recipe shows how to find outliers in Python by fitting scikit-learn's EllipticEnvelope detector to simulated data.

## How to find outliers in Python 
def Kickstarter_Example_30():
    print()
    print(format('How to find outliers in Python', '*^82'))

    import warnings
    warnings.filterwarnings("ignore")

    # Load libraries
    from sklearn.covariance import EllipticEnvelope
    from sklearn.datasets import make_blobs
    import matplotlib.pyplot as plt

    # Create simulated data
    X, _ = make_blobs(n_samples = 100,
                      n_features = 20,
                      centers = 7,
                      cluster_std = 1.1,
                      shuffle = True,
                      random_state = 42)

    # Detect Outliers
    # Create detector
    outlier_detector = EllipticEnvelope(contamination=.1)

    # Fit detector
    outlier_detector.fit(X)

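    # Print the simulated data
    print(); print(X)

    # Predict outliers: -1 marks an outlier, 1 marks an inlier
    print(); print(outlier_detector.predict(X))

    # Plot the first two of the 20 features
    plt.scatter(X[:,0], X[:,1])

    # Show the scatterplot
    plt.show()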

Kickstarter_Example_30()
**************************How to find outliers in Python**************************

[[ 4.93252797  7.68541287 -3.97876821 ...  4.52684633 -3.24863123
   9.41974416]
 [-9.3234536   4.59276437 -4.39779468 ... -7.09597087  8.20227193
   2.26134033]
 [-8.7338198   3.08658417 -3.49905765 ... -6.82385124  8.775862
   1.38825176]
 ...
 [-2.83969517 -6.07980264  6.47763993 ... -9.36607752 -2.57352093
  -9.39410402]
 [-2.1671993  10.63717797  5.58330442 ...  0.50898027 -1.25365592
  -5.02572796]
 [ 7.21074034  9.28156979 -3.54240715 ...  3.89782083 -3.2259812
  11.03335594]]

[-1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 -1  1  1  1 -1
  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 -1 -1 -1  1  1  1  1  1
  1  1  1  1  1  1  1  1  1  1  1  1  1  1 -1  1  1  1  1  1  1  1  1  1
  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 -1  1  1 -1  1  1  1  1  1
  1  1  1 -1]
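
The label array above follows scikit-learn's convention: -1 marks an outlier and 1 marks an inlier, and with contamination=.1 about a tenth of the samples are flagged (10 of the 100 labels printed above are -1). A minimal sketch of turning those labels into row indices might look like the following. It rebuilds the same simulated data; the names `labels`, `outlier_mask`, `outlier_idx`, and `outliers` are illustrative and not part of the original recipe, and `random_state=0` on the detector is an added assumption to make the fit repeatable.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import make_blobs

# Same simulated data as in the recipe
X, _ = make_blobs(n_samples=100,
                  n_features=20,
                  centers=7,
                  cluster_std=1.1,
                  shuffle=True,
                  random_state=42)

# Fit the detector and label every row in one step.
# random_state=0 is an assumption added here for repeatability.
detector = EllipticEnvelope(contamination=0.1, random_state=0)
labels = detector.fit_predict(X)

# Boolean mask and row indices of the samples flagged as outliers (-1)
outlier_mask = labels == -1
outlier_idx = np.flatnonzero(outlier_mask)
n_outliers = int(outlier_mask.sum())

# The flagged rows themselves
outliers = X[outlier_mask]

print("Number of outliers:", n_outliers)
print("Outlier row indices:", outlier_idx)
```

The same mask can be inverted (`X[~outlier_mask]`) to keep only the inliers for later modelling steps.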
