How to delete duplicates from a Pandas DataFrame?
DATA MUNGING DATA CLEANING PYTHON MACHINE LEARNING RECIPES PANDAS CHEATSHEET     ALL TAGS

How to delete duplicates from a Pandas DataFrame?

How to delete duplicates from a Pandas DataFrame?

This recipe helps you delete duplicates from a Pandas DataFrame

0
This data science python source code does the following: 1. Creates your own data dictionary 2. Conversion of dictionary into dataframe 3. Identifying duplicates in a dataframe 4. Dropping the duplicates
In [1]:
## How to delete duplicates from a Pandas DataFrame
def Kickstarter_Example_84():
    print()
    print(format('How to delete duplicates from a Pandas DataFrame','*^82'))

    import warnings
    warnings.filterwarnings("ignore")

    # load libraries
    import pandas as pd

    # Create dataframe with duplicates
    raw_data = {'first_name': ['Jason', 'Jason', 'Jason','Tina', 'Jake', 'Amy'],
                'last_name': ['Miller', 'Miller', 'Miller','Ali', 'Milner', 'Cooze'],
                'age': [42, 42, 1111111, 36, 24, 73],
                'preTestScore': [4, 4, 4, 31, 2, 3],
                'postTestScore': [25, 25, 25, 57, 62, 70]}

    df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age',
                                           'preTestScore', 'postTestScore'])
    print(); print(df)

    # Identify which observations are duplicates
    print(); print(df.duplicated())
    print(); print(df.drop_duplicates(keep='first'))

    # Drop duplicates in the first name column, but take the last obs in the duplicated set
    print(); print(df.drop_duplicates(['first_name'], keep='last'))

Kickstarter_Example_84()
*****************How to delete duplicates from a Pandas DataFrame*****************

  first_name last_name      age  preTestScore  postTestScore
0      Jason    Miller       42             4             25
1      Jason    Miller       42             4             25
2      Jason    Miller  1111111             4             25
3       Tina       Ali       36            31             57
4       Jake    Milner       24             2             62
5        Amy     Cooze       73             3             70

0    False
1     True
2    False
3    False
4    False
5    False
dtype: bool

  first_name last_name      age  preTestScore  postTestScore
0      Jason    Miller       42             4             25
2      Jason    Miller  1111111             4             25
3       Tina       Ali       36            31             57
4       Jake    Milner       24             2             62
5        Amy     Cooze       73             3             70

  first_name last_name      age  preTestScore  postTestScore
2      Jason    Miller  1111111             4             25
3       Tina       Ali       36            31             57
4       Jake    Milner       24             2             62
5        Amy     Cooze       73             3             70

Relevant Projects

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.

Natural language processing Chatbot application using NLTK for text classification
In this NLP AI application, we build the core conversational engine for a chatbot. We use the popular NLTK text classification library to achieve this.

Data Science Project in Python on BigMart Sales Prediction
The goal of this data science project is to build a predictive model and find out the sales of each product at a given Big Mart store.

Predict Census Income using Deep Learning Models
In this project, we are going to work on Deep Learning using H2O to predict Census income.

Time Series Forecasting with LSTM Neural Network Python
Deep Learning Project- Learn to apply deep learning paradigm to forecast univariate time series data.

Perform Time series modelling using Facebook Prophet
In this project, we are going to talk about Time Series Forecasting to predict the electricity requirement for a particular house using Prophet.

Walmart Sales Forecasting Data Science Project
Data Science Project in R-Predict the sales for each department using historical markdown data from the Walmart dataset containing data of 45 Walmart stores.

Data Science Project-TalkingData AdTracking Fraud Detection
Machine Learning Project in R-Detect fraudulent click traffic for mobile app ads using R data science programming language.

Credit Card Fraud Detection as a Classification Problem
In this data science project, we will predict the credit card fraud in the transactional dataset using some of the predictive models.