How to Find Outliers in Python? Method and Examples

This tutorial explains the outlier detection process in Python, with step-by-step guidance and clear examples.

In data analysis, outliers are data points that significantly deviate from the rest of the data. These anomalies can distort statistical analyses if not properly handled, leading to misleading interpretations. Therefore, detecting outliers is crucial for ensuring the integrity and accuracy of your analyses. This tutorial will explore various methods to detect outliers in Python and provide illustrative examples.

How to Find Outliers in Python? 

Python offers several approaches to identifying outliers. Depending on the dataset and your specific requirements, you may choose one of these techniques or combine several. The most common ones are described below.

Visual inspection involves plotting the data and identifying any points that appear to be outliers. Standard plots include histograms, box plots, and scatter plots. Seaborn and Matplotlib are popular libraries for creating such visualizations.

Outlier detection method in Python - Visualization Techniques
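As a rough illustration of visual inspection, the sketch below draws a histogram, a box plot, and a scatter plot of the same one-dimensional sample with Matplotlib. The data are synthetic, and the two extreme values, the variable names, and the output file name `outlier_plots.png` are assumptions made for this example only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic sample: 200 values around 50, plus two injected extreme values
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(50, 5, 200), [95.0, 102.0]])

fig, (ax_hist, ax_box, ax_scatter) = plt.subplots(1, 3, figsize=(12, 3.5))

# Histogram: outliers show up as isolated bars far from the bulk of the data
ax_hist.hist(values, bins=30)
ax_hist.set_title("Histogram")

# Box plot: points drawn beyond the whiskers ("fliers") are candidate outliers
box = ax_box.boxplot(values)
ax_box.set_title("Box plot")

# Scatter plot of value against position in the sample
ax_scatter.scatter(np.arange(values.size), values, s=10)
ax_scatter.set_title("Scatter plot")

fig.tight_layout()
fig.savefig("outlier_plots.png")  # hypothetical output file name

# The values Matplotlib drew beyond the whiskers of the box plot
fliers = box["fliers"][0].get_ydata()
print(sorted(fliers))
```

In an interactive session, `plt.show()` can replace the `savefig` call once the `matplotlib.use("Agg")` line is removed.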

The Z-score method identifies outliers by calculating how many standard deviations a data point is from the mean. Typically, a threshold of 3 standard deviations is used to identify outliers.

Z-score method - outlier detection method
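A minimal sketch of the Z-score rule described above, using only NumPy. The helper name `zscore_outliers`, its `threshold` parameter, and the synthetic data are illustrative assumptions, not part of any library.

```python
import numpy as np

def zscore_outliers(values, threshold=3.0):
    """Return a boolean mask that is True where |z-score| exceeds the threshold."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        # All values are identical, so nothing can be flagged
        return np.zeros(values.shape, dtype=bool)
    z = (values - values.mean()) / std
    return np.abs(z) > threshold

# Synthetic data: 500 values around 10, plus one injected extreme value
rng = np.random.default_rng(42)
data = np.concatenate([rng.normal(10.0, 1.0, 500), [25.0]])

mask = zscore_outliers(data)
print(data[mask])  # the values flagged as outliers
```

Because the mean and standard deviation are themselves pulled by extreme values, this rule tends to work best on roughly bell-shaped data with only a few outliers.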

The IQR method defines outliers as data points that fall below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR, where Q1 and Q3 are the first and third quartiles, respectively, and IQR is the interquartile range. 

IQR Method - find outlier in Python
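The IQR rule above can be sketched with `numpy.percentile`. The function name `iqr_outliers`, the multiplier parameter `k`, and the small sample array are assumptions chosen for illustration.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return a boolean mask that is True outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)

data = np.array([12, 13, 14, 14, 15, 15, 16, 17, 18, 60])
mask = iqr_outliers(data)
print(data[mask])  # prints [60]
```

Here Q1 is 14 and Q3 is 16.75, so the fences sit at 9.875 and 20.875, and only the value 60 falls outside them.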

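Isolation Forest is an unsupervised learning algorithm that isolates observations by randomly selecting a feature and then randomly selecting a split value between the minimum and maximum values of that feature. Because outliers are few and lie far from the rest of the data, they tend to be isolated after fewer splits than normal points, and this shorter path length is what marks them as anomalies.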

Isolation forest method - calculate outlier in Python
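One way to try this is scikit-learn's `IsolationForest`. The sketch below runs on synthetic two-dimensional data; the injected extreme points, the `contamination` value of 0.02, and the variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data: 200 points around the origin, plus two injected extreme points
rng = np.random.default_rng(0)
inliers = rng.normal(0, 1, size=(200, 2))
extremes = np.array([[8.0, 8.0], [-9.0, 7.5]])
X = np.vstack([inliers, extremes])

# contamination is the assumed proportion of outliers in the data
forest = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
labels = forest.fit_predict(X)    # 1 for inliers, -1 for outliers
scores = forest.score_samples(X)  # lower scores mean more anomalous

print(X[labels == -1])
```

The `contamination` setting only moves the score threshold used for labelling, so inspecting `scores` directly can help when the share of outliers is unknown.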

Example to Check Outliers in Python 

Step 1 - Import the library

    from sklearn.covariance import EllipticEnvelope
    from sklearn.datasets import make_blobs

We have imported EllipticEnvelope, which will serve as the outlier detector, and make_blobs, which generates a synthetic dataset.

Step 2 - Setting up the Data

We have created a dataset using make_blobs, and we will remove outliers from it.

    X, _ = make_blobs(n_samples=100,
                      n_features=20,
                      centers=7,
                      cluster_std=1.1,
                      shuffle=True,
                      random_state=42)

Step 3 - Removing Outliers

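We fit an EllipticEnvelope with the contamination parameter, which sets the expected proportion of outliers in the dataset (here 0.1, or 10%). Calling predict then returns one label per sample: 1 for an inlier and -1 for an outlier. The labels identify the outliers; they are not the cleaned data themselves.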

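    # contamination is the expected proportion of outliers in the data
    outlier_detector = EllipticEnvelope(contamination=.1)
    outlier_detector.fit(X)
    print(X)
    # predict returns 1 for inliers and -1 for outliers
    print(outlier_detector.predict(X))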

The output is as follows:

[[ 4.93252797  7.68541287 -3.97876821 ...  4.52684633 -3.24863123
   9.41974416]
 [-9.3234536   4.59276437 -4.39779468 ... -7.09597087  8.20227193
   2.26134033]
 [-8.7338198   3.08658417 -3.49905765 ... -6.82385124  8.775862
   1.38825176]
 ...
 [-2.83969517 -6.07980264  6.47763993 ... -9.36607752 -2.57352093
  -9.39410402]
 [-2.1671993  10.63717797  5.58330442 ...  0.50898027 -1.25365592
  -5.02572796]
 [ 7.21074034  9.28156979 -3.54240715 ...  3.89782083 -3.2259812
  11.03335594]]

[ 1 -1  1 -1  1  1 -1  1  1  1  1  1  1 -1  1  1  1  1  1  1  1  1  1  1
  1  1 -1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 -1  1  1  1
  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 -1  1  1  1  1  1  1  1  1
  1  1 -1  1  1  1  1  1  1  1  1 -1  1  1  1  1  1 -1  1  1  1  1  1  1
  1  1  1  1]
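The printed labels only mark which rows are outliers. To actually drop those rows, one possible follow-up is to turn the labels into a boolean mask, as sketched below. The names `labels`, `inlier_mask`, and `X_clean` are illustrative, and `random_state=0` on the detector is an added assumption so that repeated runs give the same result.

```python
from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import make_blobs

# Same synthetic dataset as in Step 2
X, _ = make_blobs(n_samples=100,
                  n_features=20,
                  centers=7,
                  cluster_std=1.1,
                  shuffle=True,
                  random_state=42)

# random_state is an added assumption, used here only for reproducibility
outlier_detector = EllipticEnvelope(contamination=0.1, random_state=0)
labels = outlier_detector.fit_predict(X)  # 1 for inliers, -1 for outliers

# Keep only the rows labelled as inliers
inlier_mask = labels == 1
X_clean = X[inlier_mask]

print(X.shape, X_clean.shape)
```

With contamination set to 0.1, roughly 10% of the rows are expected to be removed.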

 

Master Python Skills with ProjectPro! 

Mastering Python for data analysis involves understanding its syntax and libraries and gaining practical experience through real-world projects. Identifying outliers is a crucial part of data analysis, and Python offers several methods for doing so efficiently. By applying the techniques discussed in this tutorial, you can detect outliers in your datasets and base your decisions on more reliable results. Theoretical knowledge alone is not enough, however; hands-on experience is essential for honing your skills. That's where ProjectPro comes in. With a repository of 270+ projects in data science and big data, ProjectPro lets you apply your Python skills in practical scenarios and strengthen your understanding and proficiency. So, start your journey to mastering Python with ProjectPro.
