How to segregate duplicate values from Pandas dataframe?

This recipe helps you segregate duplicate values from a Pandas dataframe.
Last Updated: 25 Jan 2021


Recipe Objective

Suppose we have duplicate data in our dataset. It is usually best to segregate the duplicates and remove them.

This recipe is a short example of how to segregate duplicate values from a Pandas dataframe. Let's get started.

Step 1 - Import the library


import pandas as pd

Let's pause and look at this import. Pandas is generally used for loading, cleaning and analysing tabular, labelled data through its Series and DataFrame objects.

Step 2 - Setup the Data



df = pd.DataFrame({"A":[0, 1, 2, 3, 5, 9],  
                   "B":[11, 5, 8, 6, 7, 8], 
                   "C":[2, 5, 10, 11, 9, 8]})

Here we have set up a small example dataframe with three integer columns, A, B and C.
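Before segregating anything, it can help to check which columns actually contain repeated values. The following is a minimal sketch, assuming the same dataframe as above; the name has_repeats is only illustrative and is not part of the original recipe.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 2, 3, 5, 9],
                   "B": [11, 5, 8, 6, 7, 8],
                   "C": [2, 5, 10, 11, 9, 8]})

# For each column, True if at least one value occurs more than once
has_repeats = df.apply(lambda col: col.duplicated().any())
print(has_repeats)

# Names of the columns that contain at least one repeated value
print(list(has_repeats[has_repeats].index))
```

With this data, only column B contains a repeated value (8 appears twice).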

Step 3 - Segregating out duplicates


print(df['A'])
print(set(df['A']))

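Here we first print the original column and then pass it to the set function. A set stores each distinct value only once, so any duplicate values in the column are dropped. Note that the result is a plain Python set rather than a pandas object, and it does not keep the row order or the index.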
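If the row order and index need to be kept, pandas has its own methods for the same job. The sketch below is an alternative to the set approach rather than part of the original recipe. It uses column B, which contains 8 twice, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 2, 3, 5, 9],
                   "B": [11, 5, 8, 6, 7, 8],
                   "C": [2, 5, 10, 11, 9, 8]})

col = df["B"]

# Boolean mask: True for every row whose value occurs more than once
is_dup = col.duplicated(keep=False)

duplicates = col[is_dup]          # rows holding a repeated value
singles = col[~is_dup]            # rows whose value occurs exactly once
deduped = col.drop_duplicates()   # first occurrence of each value, order kept

print(duplicates.tolist())
print(singles.tolist())
print(deduped.tolist())
```

Using keep=False marks every occurrence of a repeated value, which is what separates the duplicates from the rest. The default, keep="first", would leave the first occurrence unmarked.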

Step 4 - Let's look at our dataset now

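Once we run the above code snippet, we will see column A printed as a Series, followed by the set of its distinct values.

With the data from Step 2, column A has no repeated values, so the set holds the same six values as the column. Running the same two lines on column B, which contains 8 twice, gives a set in which 8 appears only once. The same approach works for string values.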
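The behaviour with strings can be checked with a small sketch. The Series below is made-up illustrative data, not taken from the recipe's dataset.

```python
import pandas as pd

# Hypothetical string column with one repeated value
names = pd.Series(["apple", "banana", "apple", "cherry"])

# set() keeps each distinct string once, in no guaranteed order
distinct_set = set(names)

# drop_duplicates() keeps the first occurrence and preserves order
distinct_ordered = names.drop_duplicates().tolist()

print(distinct_set)
print(distinct_ordered)
```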
