How to segregate duplicate values from Pandas dataframe?

How to segregate duplicate values from Pandas dataframe?

How to segregate duplicate values from Pandas dataframe?

This recipe helps you segregate duplicate values from Pandas dataframe


Recipe Objective

Suppose we have duplicate data in our dataset. Now its best to segregate and remove them.

So this recipe is a short example on How to segregate duplicate values from Pandas dataframe. Let's get started.

Step 1 - Import the library

import pandas as pd

Let's pause and look at these imports. Pandas is generally used for performing mathematical operation and preferably over arrays.

Step 2 - Setup the Data

df = pd.DataFrame({"A":[0, 1, 2, 3, 5, 9], "B":[11, 5, 8, 6, 7, 8], "C":[2, 5, 10, 11, 9, 8]})

Here we have setup a random dataset with some random values in it.

Step 3 - Segregating out duplicates

print(df['A']) print(set(df['A']))

Here we are our original column having duplicate values. Now using set function, we have simply segregated and dropped duplicate values.

Step 4 - Let's look at our dataset now

Once we run the above code snippet, we will see:

Scroll down to the ipython file to look at the results.

We can see the duplicate value 5 getting dropped out from final results. This operation will remain consistent even with strings.

Relevant Projects

Choosing the right Time Series Forecasting Methods
There are different time series forecasting methods to forecast stock price, demand etc. In this machine learning project, you will learn to determine which forecasting method to be used when and how to apply with time series forecasting example.

Predict Employee Computer Access Needs in Python
Data Science Project in Python- Given his or her job role, predict employee access needs using amazon employee database.

German Credit Dataset Analysis to Classify Loan Applications
In this data science project, you will work with German credit dataset using classification techniques like Decision Tree, Neural Networks etc to classify loan applications using R.

Predict Census Income using Deep Learning Models
In this project, we are going to work on Deep Learning using H2O to predict Census income.

Identifying Product Bundles from Sales Data Using R Language
In this data science project in R, we are going to talk about subjective segmentation which is a clustering technique to find out product bundles in sales data.

Data Science Project in Python on BigMart Sales Prediction
The goal of this data science project is to build a predictive model and find out the sales of each product at a given Big Mart store.

Deep Learning with Keras in R to Predict Customer Churn
In this deep learning project, we will predict customer churn using Artificial Neural Networks and learn how to model an ANN in R with the keras deep learning package.

Ensemble Machine Learning Project - All State Insurance Claims Severity Prediction
In this ensemble machine learning project, we will predict what kind of claims an insurance company will get. This is implemented in python using ensemble machine learning algorithms.

Machine Learning or Predictive Models in IoT - Energy Prediction Use Case
In this machine learning and IoT project, we are going to test out the experimental data using various predictive models and train the models and break the energy usage.

Walmart Sales Forecasting Data Science Project
Data Science Project in R-Predict the sales for each department using historical markdown data from the Walmart dataset containing data of 45 Walmart stores.