How to delete duplicates from a Pandas DataFrame?
0

How to delete duplicates from a Pandas DataFrame?

This recipe helps you delete duplicates from a Pandas DataFrame

In [1]:
## How to delete duplicates from a Pandas DataFrame
def Kickstarter_Example_84():
    print()
    print(format('How to delete duplicates from a Pandas DataFrame','*^82'))

    import warnings
    warnings.filterwarnings("ignore")

    # load libraries
    import pandas as pd

    # Create dataframe with duplicates
    raw_data = {'first_name': ['Jason', 'Jason', 'Jason','Tina', 'Jake', 'Amy'],
                'last_name': ['Miller', 'Miller', 'Miller','Ali', 'Milner', 'Cooze'],
                'age': [42, 42, 1111111, 36, 24, 73],
                'preTestScore': [4, 4, 4, 31, 2, 3],
                'postTestScore': [25, 25, 25, 57, 62, 70]}

    df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age',
                                           'preTestScore', 'postTestScore'])
    print(); print(df)

    # Identify which observations are duplicates
    print(); print(df.duplicated())
    print(); print(df.drop_duplicates(keep='first'))

    # Drop duplicates in the first name column, but take the last obs in the duplicated set
    print(); print(df.drop_duplicates(['first_name'], keep='last'))

Kickstarter_Example_84()
*****************How to delete duplicates from a Pandas DataFrame*****************

  first_name last_name      age  preTestScore  postTestScore
0      Jason    Miller       42             4             25
1      Jason    Miller       42             4             25
2      Jason    Miller  1111111             4             25
3       Tina       Ali       36            31             57
4       Jake    Milner       24             2             62
5        Amy     Cooze       73             3             70

0    False
1     True
2    False
3    False
4    False
5    False
dtype: bool

  first_name last_name      age  preTestScore  postTestScore
0      Jason    Miller       42             4             25
2      Jason    Miller  1111111             4             25
3       Tina       Ali       36            31             57
4       Jake    Milner       24             2             62
5        Amy     Cooze       73             3             70

  first_name last_name      age  preTestScore  postTestScore
2      Jason    Miller  1111111             4             25
3       Tina       Ali       36            31             57
4       Jake    Milner       24             2             62
5        Amy     Cooze       73             3             70

Relevant Projects

Big Data Project Forecast Inventory demand using historical sales data in R
In this machine learning project, you will develop a machine learning model to accurately forecast inventory demand based on historical sales data.
Big Data Project Mercari Price Suggestion Challenge Data Science Project
Data Science Project in Python- Build a machine learning algorithm that automatically suggests the right product prices.
Big Data Project Identifying Product Bundles from Sales Data Using R Language
In this data science project in R, we are going to talk about subjective segmentation which is a clustering technique to find out product bundles in sales data.
Big Data Project Predict Census Income using Deep Learning Models
In this project, we are going to work on Deep Learning using H2O to predict Census income.
Big Data Project Data Science Project - Instacart Market Basket Analysis
Data Science Project - Build a recommendation engine which will predict the products to be purchased by an Instacart consumer again.
Big Data Project Machine Learning or Predictive Models in IoT - Energy Prediction Use Case
In this machine learning and IoT project, we are going to test out the experimental data using various predictive models and train the models and break the energy usage.
Big Data Project Time Series Forecasting with LSTM Neural Network Python
Deep Learning Project- Learn to apply deep learning paradigm to forecast univariate time series data.
Big Data Project Data Science Project-All State Insurance Claims Severity Prediction
Data science project in R to develop automated methods for predicting the cost and severity of insurance claims.
Big Data Project Data Science Project-TalkingData AdTracking Fraud Detection
Machine Learning Project in R-Detect fraudulent click traffic for mobile app ads using R data science programming language.
Big Data Project Anomaly Detection Using Deep Learning and Autoencoders
Deep Learning Project- Learn about implementation of a machine learning algorithm using autoencoders for anomaly detection.