How to Find Outliers in Python? Method and Examples

This tutorial explains how to detect outliers in Python, with step-by-step guidance and clear examples.
Last Updated: 01 Apr 2024


In data analysis, outliers are data points that significantly deviate from the rest of the data. These anomalies can distort statistical analyses if not properly handled, leading to misleading interpretations. Therefore, detecting outliers is crucial for ensuring the integrity and accuracy of your analyses. This tutorial will explore various methods to detect outliers in Python and provide illustrative examples.

How to Find Outliers in Python?
- Visualization Techniques
- Z-Score Method
- Interquartile Range (IQR) Method
- Isolation Forest
Example to Check Outliers in Python
Master Python Skills with ProjectPro!

How to Find Outliers in Python?

Python offers several ways to identify outliers. Depending on the dataset and your requirements, you may choose one of the techniques below or combine several of them.

Visualization Techniques

Visual inspection involves plotting the data and identifying any points that appear to be outliers. Standard plots include histograms, box plots, and scatter plots. Seaborn and Matplotlib are popular libraries for creating such visualizations.
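As a rough illustration of visual inspection, the sketch below draws a histogram and a box plot with Matplotlib. The sample values are made up for this example and are not from the tutorial's dataset. It relies on Matplotlib's default box plot behaviour, which draws points beyond 1.5 * IQR from the box as separate "flier" markers.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Made-up sample: most values sit between 10 and 14, one lies far away
values = [10, 12, 11, 13, 12, 11, 14, 13, 12, 11, 10, 13, 12, 35]

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))

# Histogram: an isolated bar far from the bulk hints at an outlier
ax_hist.hist(values, bins=10)
ax_hist.set_title("Histogram")

# Box plot: points beyond the whiskers are drawn individually as fliers
box = ax_box.boxplot(values)
ax_box.set_title("Box plot")

# The flier markers can also be read back from the returned artists
flier_values = [float(v) for v in box["fliers"][0].get_ydata()]
print(flier_values)

# In an interactive session, call plt.show() or fig.savefig(...) here
plt.close(fig)
```

Plots are a good first check, but they rely on judgement, so they are usually paired with one of the numeric rules described next.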


Z-Score Method

The Z-score method identifies outliers by calculating how many standard deviations a data point is from the mean. Typically, a threshold of 3 standard deviations is used to identify outliers.
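A minimal sketch of the Z-score rule using NumPy follows. The helper name zscore_outliers and the sample values are assumptions made for illustration; the threshold of 3 is the one mentioned above.

```python
import numpy as np

def zscore_outliers(values, threshold=3.0):
    """Return a boolean mask that is True where |z-score| exceeds the threshold."""
    values = np.asarray(values, dtype=float)
    std = values.std()  # population standard deviation (ddof=0)
    if std == 0:
        # All values are identical, so nothing can be an outlier
        return np.zeros(values.shape, dtype=bool)
    z = (values - values.mean()) / std
    return np.abs(z) > threshold

# Made-up sample with one value far from the rest
data = np.array([10, 12, 11, 13, 12, 11, 14, 13, 12, 11, 10, 13, 12, 35])
mask = zscore_outliers(data)
print(data[mask])
```

Note that the mean and standard deviation are themselves pulled toward extreme values, so on very small samples a point may never reach a Z-score of 3 even when it looks clearly unusual.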

Interquartile Range (IQR) Method

The IQR method defines outliers as data points that fall below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR, where Q1 and Q3 are the first and third quartiles, respectively, and IQR is the interquartile range.
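The IQR rule can be sketched with NumPy as shown below. The helper name iqr_outliers and the sample values are illustrative assumptions. The quartiles come from np.percentile with its default linear interpolation, so other quartile conventions may give slightly different fences.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return (mask, (lower, upper)) where mask marks points outside the IQR fences."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    mask = (values < lower) | (values > upper)
    return mask, (float(lower), float(upper))

# Made-up sample with one value far from the rest
data = np.array([10, 12, 11, 13, 12, 11, 14, 13, 12, 11, 10, 13, 12, 35])
mask, (lower, upper) = iqr_outliers(data)
print("fences:", lower, upper)
print("outliers:", data[mask])
```

Because quartiles are barely affected by a few extreme values, this rule tends to be more robust than the Z-score on skewed or small samples.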


Isolation Forest

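Isolation Forest is an unsupervised learning algorithm that isolates observations by randomly selecting a feature and then randomly selecting a split value between the minimum and maximum values of that feature. Because outliers are few and different from the rest of the data, they tend to be isolated in fewer splits than normal points, and this shorter path length is what marks them as anomalies.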
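Here is a small sketch using scikit-learn's IsolationForest. The synthetic data, the two far-away points, and the parameter values (n_estimators=100, contamination=0.05) are assumptions chosen for illustration, not recommended settings.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data: a 2-D Gaussian cloud plus two hand-placed distant points
rng = np.random.default_rng(42)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
far_points = np.array([[8.0, 8.0], [-9.0, 7.5]])
X = np.vstack([inliers, far_points])

# contamination is the expected share of outliers; 0.05 is a guess for this sketch
forest = IsolationForest(n_estimators=100, contamination=0.05, random_state=42)
labels = forest.fit_predict(X)    # 1 = inlier, -1 = outlier
scores = forest.score_samples(X)  # lower score = more anomalous

print("flagged points:")
print(X[labels == -1])
```

The contamination value directly controls how many points are flagged, so some points on the edge of the cloud are labelled -1 here along with the two distant ones.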


Example to Check Outliers in Python

Step 1 - Import the library

from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import make_blobs

We have imported EllipticEnvelope, which detects outliers, and make_blobs, which generates the sample data.

Step 2 - Setting up the Data

We create a synthetic dataset with make_blobs (100 samples, 20 features, 7 cluster centers) and then look for outliers in it.

X, _ = make_blobs(n_samples=100,
                  n_features=20,
                  centers=7,
                  cluster_std=1.1,
                  shuffle=True,
                  random_state=42)

Step 3 - Removing Outliers

We fit EllipticEnvelope with the contamination parameter, which sets the proportion of the data expected to be outliers (here 0.1, or 10%). Calling predict does not return the cleaned data; it returns one label per sample, where 1 marks an inlier and -1 marks an outlier.

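# Expect roughly 10% of the samples to be outliers
outlier_detector = EllipticEnvelope(contamination=.1)
outlier_detector.fit(X)

# Print the data, then one label per sample (1 = inlier, -1 = outlier)
print(X)
print(outlier_detector.predict(X))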

The output shows the data array first, followed by the predicted labels:

[[ 4.93252797 7.68541287 -3.97876821 ... 4.52684633 -3.24863123
   9.41974416]
 [-9.3234536 4.59276437 -4.39779468 ... -7.09597087 8.20227193
   2.26134033]
 [-8.7338198 3.08658417 -3.49905765 ... -6.82385124 8.775862
   1.38825176]
 ...
 [-2.83969517 -6.07980264 6.47763993 ... -9.36607752 -2.57352093
  -9.39410402]
 [-2.1671993 10.63717797 5.58330442 ... 0.50898027 -1.25365592
  -5.02572796]
 [ 7.21074034 9.28156979 -3.54240715 ... 3.89782083 -3.2259812
  11.03335594]]

[ 1 -1 1 -1 1 1 -1 1 1 1 1 1 1 -1 1 1 1 1 1 1 1 1 1 1
  1 1 -1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 -1 1 1 1
  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 -1 1 1 1 1 1 1 1 1
  1 1 -1 1 1 1 1 1 1 1 1 -1 1 1 1 1 1 -1 1 1 1 1 1 1
  1 1 1 1]
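The labels only mark the outliers; to actually drop them, the rows labelled -1 have to be filtered out. The sketch below repeats the setup from the steps above and keeps only the inliers with a boolean mask. The random_state=0 argument and the names labels and X_clean are additions for this illustration, so the exact rows flagged may differ from the output shown above.

```python
from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import make_blobs

# Same synthetic dataset as in Step 2
X, _ = make_blobs(n_samples=100,
                  n_features=20,
                  centers=7,
                  cluster_std=1.1,
                  shuffle=True,
                  random_state=42)

# random_state is an addition here, used only to make the fit repeatable
outlier_detector = EllipticEnvelope(contamination=0.1, random_state=0)
labels = outlier_detector.fit_predict(X)  # 1 = inlier, -1 = outlier

# Keep only the rows labelled as inliers
X_clean = X[labels == 1]

print("original shape:", X.shape)
print("after removing outliers:", X_clean.shape)
```

EllipticEnvelope assumes the inliers follow a roughly Gaussian distribution, so on data made of several separate clusters, as here, the flagged points should be treated as candidates to review rather than as confirmed errors.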

Master Python Skills with ProjectPro!

Mastering Python for data analysis involves understanding its syntax and libraries and gaining practical experience through real-world projects. Identifying outliers is a crucial part of data analysis, and Python offers several methods to do it efficiently. By applying the techniques discussed in this tutorial, you can detect outliers in your datasets and base your decisions on more reliable results. However, theoretical knowledge alone is insufficient; hands-on experience is essential for honing your skills. That's where ProjectPro comes in. With its repository of more than 270 projects in data science and big data, ProjectPro offers an opportunity to apply your Python skills in practical scenarios and strengthen your understanding and proficiency. So, start your journey to mastering Python with ProjectPro.
