How to segregate duplicate values from Pandas dataframe?

This recipe helps you segregate duplicate values from Pandas dataframe

Recipe Objective

Suppose we have duplicate data in our dataset. Now its best to segregate and remove them.

So this recipe is a short example on How to segregate duplicate values from Pandas dataframe. Let's get started.

Step 1 - Import the library

import pandas as pd

Let's pause and look at these imports. Pandas is generally used for performing mathematical operation and preferably over arrays.

Step 2 - Setup the Data

df = pd.DataFrame({"A":[0, 1, 2, 3, 5, 9], "B":[11, 5, 8, 6, 7, 8], "C":[2, 5, 10, 11, 9, 8]})

Here we have setup a random dataset with some random values in it.

Step 3 - Segregating out duplicates

print(df['A']) print(set(df['A']))

Here we are our original column having duplicate values. Now using set function, we have simply segregated and dropped duplicate values.

Step 4 - Let's look at our dataset now

Once we run the above code snippet, we will see:

Scroll down to the ipython file to look at the results.

We can see the duplicate value 5 getting dropped out from final results. This operation will remain consistent even with strings.

What Users are saying..

profile image

Ed Godalle

Director Data Analytics at EY / EY Tech
linkedin profile url

I am the Director of Data Analytics with over 10+ years of IT experience. I have a background in SQL, Python, and Big Data working with Accenture, IBM, and Infosys. I am looking to enhance my skills... Read More

Relevant Projects

Hands-On Approach to Master PyTorch Tensors with Examples
In this deep learning project, you will learn how to perform various operations on the building block of PyTorch : Tensors.

Loan Eligibility Prediction Project using Machine learning on GCP
Loan Eligibility Prediction Project - Use SQL and Python to build a predictive model on GCP to determine whether an application requesting loan is eligible or not.

Credit Card Default Prediction using Machine learning techniques
In this data science project, you will predict borrowers chance of defaulting on credit loans by building a credit score prediction model.

MLOps Project to Deploy Resume Parser Model on Paperspace
In this MLOps project, you will learn how to deploy a Resume Parser Streamlit Application on Paperspace Private Cloud.

Isolation Forest Model and LOF for Anomaly Detection in Python
Credit Card Fraud Detection Project - Build an Isolation Forest Model and Local Outlier Factor (LOF) in Python to identify fraudulent credit card transactions.

Learn Hyperparameter Tuning for Neural Networks with PyTorch
In this Deep Learning Project, you will learn how to optimally tune the hyperparameters (learning rate, epochs, dropout, early stopping) of a neural network model in PyTorch to improve model performance.

Build CNN Image Classification Models for Real Time Prediction
Image Classification Project to build a CNN model in Python that can classify images into social security cards, driving licenses, and other key identity information.

NLP Project for Beginners on Text Processing and Classification
This Project Explains the Basic Text Preprocessing and How to Build a Classification Model in Python

Mastering A/B Testing: A Practical Guide for Production
In this A/B Testing for Machine Learning Project, you will gain hands-on experience in conducting A/B tests, analyzing statistical significance, and understanding the challenges of building a solution for A/B testing in a production environment.

Credit Card Fraud Detection as a Classification Problem
In this data science project, we will predict the credit card fraud in the transactional dataset using some of the predictive models.