How to find outliers in Python?


This recipe helps you find outliers in Python


Recipe Objective

A few values in a dataset may be outliers: data points that fall outside the range of the rest of the data, being either much smaller or much larger than typical values. Outliers can affect a model very badly, so we need to find and remove them.

So this is the recipe on how we can find outliers in Python.

Step 1 - Import the library

from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import make_blobs

We have imported EllipticEnvelope and make_blobs, which are needed.

Step 2 - Setting up the Data

We have created a dataset using make_blobs, and we will remove outliers from it.

X, _ = make_blobs(n_samples = 100, n_features = 20, centers = 7, cluster_std = 1.1, shuffle = True, random_state = 42)
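As a quick check on the call above, the sketch below keeps the second return value (discarded as _ in the recipe) under the name y, which is chosen here only for illustration, and prints the shape of the data and the cluster labels.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same call as in the recipe, but keeping the cluster labels as y
# (the name y is an illustrative choice; the recipe discards it as _).
X, y = make_blobs(n_samples=100, n_features=20, centers=7,
                  cluster_std=1.1, shuffle=True, random_state=42)

print(X.shape)       # (100, 20): 100 rows, 20 features
print(np.unique(y))  # [0 1 2 3 4 5 6]: one label per centre
```

The labels are not needed for outlier detection, which is why the recipe throws them away.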

Step 3 - Removing Outliers

We are training the EllipticEnvelope with the parameter contamination, which signifies the proportion of the data expected to be outliers. We then call predict, which returns a label for each row: 1 for an inlier and -1 for an outlier.

outlier_detector = EllipticEnvelope(contamination=.1)
outlier_detector.fit(X)

print(X)
print(outlier_detector.predict(X))

So the output comes as

[[ 4.93252797  7.68541287 -3.97876821 ...  4.52684633 -3.24863123
   9.41974416]
 [-9.3234536   4.59276437 -4.39779468 ... -7.09597087  8.20227193
   2.26134033]
 [-8.7338198   3.08658417 -3.49905765 ... -6.82385124  8.775862
   1.38825176]
 ...
 [-2.83969517 -6.07980264  6.47763993 ... -9.36607752 -2.57352093
  -9.39410402]
 [-2.1671993  10.63717797  5.58330442 ...  0.50898027 -1.25365592
  -5.02572796]
 [ 7.21074034  9.28156979 -3.54240715 ...  3.89782083 -3.2259812
  11.03335594]]

[ 1 -1  1 -1  1  1 -1  1  1  1  1  1  1 -1  1  1  1  1  1  1  1  1  1  1
  1  1 -1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 -1  1  1  1
  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 -1  1  1  1  1  1  1  1  1
  1  1 -1  1  1  1  1  1  1  1  1 -1  1  1  1  1  1 -1  1  1  1  1  1  1
  1  1  1  1]
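The labels printed above only flag the outliers; they do not remove them. One way to drop the flagged rows, shown here as a minimal sketch, is to use the labels as a boolean mask. The names labels, inlier_mask, X_clean and n_outliers are illustrative, and random_state=42 on the detector is an assumption added only to make the fit repeatable; it is not part of the original recipe.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import make_blobs

# Recreate the recipe's dataset.
X, _ = make_blobs(n_samples=100, n_features=20, centers=7,
                  cluster_std=1.1, shuffle=True, random_state=42)

# random_state is an assumption added here for repeatability.
outlier_detector = EllipticEnvelope(contamination=0.1, random_state=42)

# fit_predict fits the detector and returns 1 (inlier) or -1 (outlier) per row.
labels = outlier_detector.fit_predict(X)

# Keep only the rows labelled as inliers.
inlier_mask = labels == 1
X_clean = X[inlier_mask]

n_outliers = int(np.sum(labels == -1))
print("outliers flagged:", n_outliers)
print("shape after removal:", X_clean.shape)
```

With contamination=0.1 the detector sets its threshold so that roughly 10% of the training rows are flagged, which matches the ten -1 labels in the output above.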
