How to segregate duplicate values from Pandas dataframe?

This recipe helps you segregate duplicate values from Pandas dataframe

Recipe Objective

Suppose we have duplicate data in our dataset. Now its best to segregate and remove them.

So this recipe is a short example on How to segregate duplicate values from Pandas dataframe. Let's get started.

Step 1 - Import the library

import pandas as pd

Let's pause and look at these imports. Pandas is generally used for performing mathematical operation and preferably over arrays.

Step 2 - Setup the Data

df = pd.DataFrame({"A":[0, 1, 2, 3, 5, 9], "B":[11, 5, 8, 6, 7, 8], "C":[2, 5, 10, 11, 9, 8]})

Here we have setup a random dataset with some random values in it.

Step 3 - Segregating out duplicates

print(df['A']) print(set(df['A']))

Here we are our original column having duplicate values. Now using set function, we have simply segregated and dropped duplicate values.

Step 4 - Let's look at our dataset now

Once we run the above code snippet, we will see:

Scroll down to the ipython file to look at the results.

We can see the duplicate value 5 getting dropped out from final results. This operation will remain consistent even with strings.

What Users are saying..

profile image

Savvy Sahai

Data Science Intern, Capgemini
linkedin profile url

As a student looking to break into the field of data engineering and data science, one can get really confused as to which path to take. Very few ways to do it are Google, YouTube, etc. I was one of... Read More

Relevant Projects

Build an Image Segmentation Model using Amazon SageMaker
In this Machine Learning Project, you will learn to implement the UNet Architecture and build an Image Segmentation Model using Amazon SageMaker

Build a Graph Based Recommendation System in Python-Part 2
In this Graph Based Recommender System Project, you will build a recommender system project for eCommerce platforms and learn to use FAISS for efficient similarity search.

Learn to Build a Siamese Neural Network for Image Similarity
In this Deep Learning Project, you will learn how to build a siamese neural network with Keras and Tensorflow for Image Similarity.

Loan Default Prediction Project using Explainable AI ML Models
Loan Default Prediction Project that employs sophisticated machine learning models, such as XGBoost and Random Forest and delves deep into the realm of Explainable AI, ensuring every prediction is transparent and understandable.

Learn to Build an End-to-End Machine Learning Pipeline - Part 1
In this Machine Learning Project, you will learn how to build an end-to-end machine learning pipeline for predicting truck delays, addressing a major challenge in the logistics industry.

Learn How to Build PyTorch Neural Networks from Scratch
In this deep learning project, you will learn how to build PyTorch neural networks from scratch.

Build Regression (Linear,Ridge,Lasso) Models in NumPy Python
In this machine learning regression project, you will learn to build NumPy Regression Models (Linear Regression, Ridge Regression, Lasso Regression) from Scratch.

Learn How to Build a Linear Regression Model in PyTorch
In this Machine Learning Project, you will learn how to build a simple linear regression model in PyTorch to predict the number of days subscribed.

BigMart Sales Prediction ML Project in Python
The goal of the BigMart Sales Prediction ML project is to build and evaluate different predictive models and determine the sales of each product at a store.

BERT Text Classification using DistilBERT and ALBERT Models
This Project Explains how to perform Text Classification using ALBERT and DistilBERT