How to utilise Pandas dataframe & series for data wrangling?

This recipe helps you utilise Pandas dataframe & series for data wrangling

Recipe Objective

Pandas offers many data wrangling methods. Have you tried using any of them on a dataframe or a series?

This recipe shows how we can utilise a Pandas dataframe and series for data wrangling.

Step 1 - Importing Library

import pandas as pd

We have imported only pandas, which is all this recipe needs.

Step 2 - Creating a series

We have created a series of numbers in the object floodingReports, first with the default integer index and then with a county name as the index label for each number.

floodingReports = pd.Series([5, 6, 2, 9, 12])
print(floodingReports)

floodingReports = pd.Series([5, 6, 2, 9, 12], index=["Cochise County", "Pima County", "Santa Cruz County", "Maricopa County", "Yuma County"])
print(floodingReports)
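A series pairs each value with an index label, so the same element can be reached by label or by position. The following is a minimal sketch of that idea, reusing the Step 2 data; the variable names labels, values, by_label and by_position are illustrative and not part of the recipe.

```python
import pandas as pd

# Same data as Step 2: five counts labelled by county.
floodingReports = pd.Series(
    [5, 6, 2, 9, 12],
    index=["Cochise County", "Pima County", "Santa Cruz County",
           "Maricopa County", "Yuma County"],
)

labels = list(floodingReports.index)   # the index labels
values = floodingReports.tolist()      # the underlying values

by_label = floodingReports.loc["Pima County"]  # label-based lookup
by_position = floodingReports.iloc[1]          # position-based lookup

print(labels)
print(values)
print(by_label, by_position)  # both refer to the same element
```

Using .loc and .iloc makes it explicit whether a lookup is by label or by position, which helps once the index holds integers of its own.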

Step 3 - Data Wrangling on series

First we have printed the value stored under one index label. Then we have printed only the entries whose value is greater than 6.

print(floodingReports["Cochise County"])
print(floodingReports[floodingReports > 6])
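The condition in Step 3 works because the comparison itself returns a series of True/False values, which is then used to select entries. A small sketch under that reading, with the illustrative names mask, high and mid_range (none of them appear in the recipe):

```python
import pandas as pd

floodingReports = pd.Series(
    [5, 6, 2, 9, 12],
    index=["Cochise County", "Pima County", "Santa Cruz County",
           "Maricopa County", "Yuma County"],
)

# The comparison yields a boolean series with the same index.
mask = floodingReports > 6
print(mask)

# Indexing with the mask keeps only the entries where it is True.
high = floodingReports[mask]
print(high)

# Conditions are combined with & or |, each wrapped in parentheses.
mid_range = floodingReports[(floodingReports > 5) & (floodingReports < 12)]
print(mid_range)
```

The parentheses around each condition are needed because & binds more tightly than the comparison operators in Python.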

Step 4 - Creating a series from dictionary

We have created a series from a dictionary by passing the dictionary to pd.Series. The dictionary keys become the index labels and the dictionary values become the data.

fireReports_dict = {"Cochise County": 12, "Pima County": 342, "Santa Cruz County": 13, "Maricopa County": 42, "Yuma County": 52}
fireReports = pd.Series(fireReports_dict)
print(fireReports)
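When a dictionary is passed together with an explicit index, pandas looks up each requested label among the keys and fills in NaN for labels it cannot find. The sketch below illustrates this; "Pinal County" is a hypothetical label chosen only because it is absent from the dictionary, and the name subset is also illustrative.

```python
import pandas as pd

fireReports_dict = {"Cochise County": 12, "Pima County": 342,
                    "Santa Cruz County": 13, "Maricopa County": 42,
                    "Yuma County": 52}

# Request two existing keys and one that is not in the dictionary.
# "Pinal County" is a hypothetical label used only to show the effect.
subset = pd.Series(
    fireReports_dict,
    index=["Pima County", "Yuma County", "Pinal County"],
)
print(subset)

# The missing label is NaN, which also turns the dtype into float64.
print(subset.isna())
```

This is worth knowing when the labels come from another source, since a misspelt label silently produces a missing value rather than an error.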

Step 5 - Changing the index of series

We can change the index of a series by assigning a new list of labels, one per element, to the series.index attribute.

fireReports.index = ["Cochice", "Pima", "Santa Cruz", "Maricopa", "Yuma"]
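Assigning to the index replaces the labels in place, and the new list has to match the length of the series. As a hedged sketch of two related behaviours: a list of the wrong length is rejected, and Series.rename can derive new labels without modifying the original. The names mismatch_error and short_names are illustrative.

```python
import pandas as pd

fireReports = pd.Series({"Cochise County": 12, "Pima County": 342,
                         "Santa Cruz County": 13, "Maricopa County": 42,
                         "Yuma County": 52})

# A label list of the wrong length is rejected with a ValueError.
mismatch_error = None
try:
    fireReports.index = ["A", "B"]
except ValueError as err:
    mismatch_error = err
print(mismatch_error)

# rename applies a function to every label and returns a new series,
# leaving fireReports itself untouched.
short_names = fireReports.rename(lambda label: label.replace(" County", ""))
print(short_names)
print(fireReports.index.tolist())
```

Deriving labels with a function avoids retyping them by hand, which is where spelling slips tend to creep in.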

Step 6 - Creating a dataframe from dictionary

We have created a dataframe from a dictionary by passing the dictionary to pd.DataFrame. Each key becomes a column name and each list becomes that column's values.

data = {"county": ["Cochice", "Pima", "Santa Cruz", "Maricopa", "Yuma"], "year": [2012, 2012, 2013, 2014, 2014], "reports": [4, 24, 31, 2, 3]}
df = pd.DataFrame(data)
print(df)
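Once the dictionary is a dataframe, each column behaves like a series, so the selection ideas from Step 3 carry over. A brief sketch, assuming the same data as Step 6; the names reordered, total_reports and recent are illustrative.

```python
import pandas as pd

data = {"county": ["Cochice", "Pima", "Santa Cruz", "Maricopa", "Yuma"],
        "year": [2012, 2012, 2013, 2014, 2014],
        "reports": [4, 24, 31, 2, 3]}

# The columns argument sets the column order when building from a dict.
reordered = pd.DataFrame(data, columns=["year", "county", "reports"])
print(reordered)

# A single column is a series, so series methods apply to it.
total_reports = reordered["reports"].sum()
print(total_reports)

# A boolean condition on a column filters rows.
recent = reordered[reordered["year"] == 2014]
print(recent)
```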

Step 7 - Performing Wrangling on dataframe

We are performing three wrangling operations on the dataframe df from Step 6 for better understanding.

  • Adding a new column

df["newsCoverage"] = pd.Series([42.3, 92.1, 12.2, 39.3, 30.2])
print(df)

  • Deleting a column

del df["newsCoverage"]
print(df)

  • Making Transpose

# Transpose the dataframe
print(df.T)

The print statements in the steps above produce the following output, in order:

0     5
1     6
2     2
3     9
4    12
dtype: int64

Cochise County        5
Pima County           6
Santa Cruz County     2
Maricopa County       9
Yuma County          12
dtype: int64

5

Maricopa County     9
Yuma County        12
dtype: int64

Cochise County        12
Pima County          342
Santa Cruz County     13
Maricopa County       42
Yuma County           52
dtype: int64

       county  year  reports
0     Cochice  2012        4
1        Pima  2012       24
2  Santa Cruz  2013       31
3    Maricopa  2014        2
4        Yuma  2014        3

       county  year  reports  newsCoverage
0     Cochice  2012        4          42.3
1        Pima  2012       24          92.1
2  Santa Cruz  2013       31          12.2
3    Maricopa  2014        2          39.3
4        Yuma  2014        3          30.2

       county  year  reports
0     Cochice  2012        4
1        Pima  2012       24
2  Santa Cruz  2013       31
3    Maricopa  2014        2
4        Yuma  2014        3

               0     1           2         3     4
county   Cochice  Pima  Santa Cruz  Maricopa  Yuma
year        2012  2012        2013      2014  2014
reports        4    24          31         2     3
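The column operations in Step 7 change the dataframe in place. As a sketch of an alternative style, assign and drop return new dataframes and leave the original alone. The second half shows a related caveat: a series assigned as a column is matched on index labels, not on position. The names with_coverage, without_coverage, transposed, indexed and coverage are illustrative.

```python
import pandas as pd

data = {"county": ["Cochice", "Pima", "Santa Cruz", "Maricopa", "Yuma"],
        "year": [2012, 2012, 2013, 2014, 2014],
        "reports": [4, 24, 31, 2, 3]}
df = pd.DataFrame(data)
coverage = [42.3, 92.1, 12.2, 39.3, 30.2]

# assign and drop return new dataframes; df keeps its three columns.
with_coverage = df.assign(newsCoverage=coverage)
without_coverage = with_coverage.drop(columns=["newsCoverage"])
transposed = df.T  # rows become columns and columns become rows
print(with_coverage)
print(transposed)

# Caveat: with county names as the index, a series labelled 0..4
# shares no labels with the dataframe, so every value becomes NaN.
indexed = df.set_index("county")
indexed["misaligned"] = pd.Series(coverage)
# Giving the series the same index (or passing a plain list) lines it up.
indexed["aligned"] = pd.Series(coverage, index=indexed.index)
print(indexed)
```

In the recipe itself the assignment works as expected because df and the new series both use the default 0 to 4 index.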
