How to utilise Pandas dataframe & series for data wrangling?

This recipe helps you utilise Pandas dataframe & series for data wrangling

Recipe Objective

Pandas provides data wrangling operations for both Series and DataFrame objects, such as selecting by label, filtering on a condition, changing the index, adding or deleting columns, and transposing.

This recipe shows how to use a Pandas dataframe and series for data wrangling.

Step 1 - Importing Library

import pandas as pd

pandas is the only library this recipe needs.

Step 2 - Creating a series

We create a series of numbers in the object floodingReports, first with the default integer index and then with a county name as the index label for each number.

floodingReports = pd.Series([5, 6, 2, 9, 12])
print(floodingReports)

floodingReports = pd.Series([5, 6, 2, 9, 12], index=["Cochise County", "Pima County", "Santa Cruz County", "Maricopa County", "Yuma County"])
print(floodingReports)

Step 3 - Data Wrangling on series

First we print the value stored under a given index label. Then we print every entry (index label and value) whose value is greater than 6.

print(floodingReports["Cochise County"])
print(floodingReports[floodingReports > 6])
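The condition-based selection works in two stages: the comparison builds a boolean series, and indexing with that series keeps the rows where it is True. A minimal sketch of those stages follows; the names mask, high and cochise are illustrative and not part of the recipe.

```python
import pandas as pd

floodingReports = pd.Series(
    [5, 6, 2, 9, 12],
    index=["Cochise County", "Pima County", "Santa Cruz County",
           "Maricopa County", "Yuma County"],
)

# The comparison returns a boolean Series that shares the original index.
mask = floodingReports > 6

# Indexing with the boolean Series keeps only the rows where it is True.
high = floodingReports[mask]

# .loc makes the label-based lookup explicit.
cochise = floodingReports.loc["Cochise County"]

print(mask)
print(high)
print(cochise)
```

Using .loc for label lookups is optional here, but it makes it clear that the key is an index label rather than a position.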

Step 4 - Creating a series from dictionary

We create a series from a dictionary by passing the dictionary to pd.Series. The dictionary keys become the index labels and the dictionary values become the data.

fireReports_dict = {"Cochise County": 12, "Pima County": 342, "Santa Cruz County": 13, "Maricopa County": 42, "Yuma County": 52}
fireReports = pd.Series(fireReports_dict)
print(fireReports)
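When a dictionary is passed together with an explicit index, pandas looks up each index label in the dictionary. The sketch below uses a shortened, illustrative dictionary to show that the result follows the order of the index and that a label missing from the dictionary gets NaN.

```python
import pandas as pd

# Shortened example data, not the full dictionary from the recipe.
fireReports_dict = {"Cochise County": 12, "Pima County": 342}

# The index controls both the order and the set of labels in the result.
# "Yuma County" is not a key in the dictionary, so its value is NaN.
fireReports = pd.Series(
    fireReports_dict,
    index=["Pima County", "Cochise County", "Yuma County"],
)

print(fireReports)
```

Because NaN is a floating-point value, the resulting series is stored as float64 rather than int64.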

Step 5 - Changing the index of series

We can change the index of a series by assigning a new list of labels to its index attribute. The list must contain one label per element of the series.

fireReports.index = ["Cochice", "Pima", "Santa Cruz", "Maricopa", "Yuma"]
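Two details of relabelling are worth seeing in code. This is a small sketch on shortened, illustrative data: Series.rename changes selected labels and returns a new series, while assigning a list of the wrong length to the index attribute raises a ValueError. The names renamed and mismatch_rejected are hypothetical.

```python
import pandas as pd

# Shortened example data, not the full series from the recipe.
fireReports = pd.Series({"Cochise County": 12, "Pima County": 342})

# rename returns a new Series; labels absent from the mapping are kept.
renamed = fireReports.rename({"Pima County": "Pima"})
print(renamed)

# Assigning an index with the wrong number of labels is rejected.
try:
    fireReports.index = ["only one label"]
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(mismatch_rejected)
```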

Step 6 - Creating a dataframe from dictionary

We have created a dataframe from a dictionary by passing the dictionary through pd.DataFrame. Each key becomes a column name and each list becomes that column's values.

data = {"county": ["Cochice", "Pima", "Santa Cruz", "Maricopa", "Yuma"], "year": [2012, 2012, 2013, 2014, 2014], "reports": [4, 24, 31, 2, 3]}
df = pd.DataFrame(data)
print(df)
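The column order of the dataframe can also be set explicitly at construction time with the columns parameter of pd.DataFrame. The sketch below reuses the dictionary from this step; the name df_reordered and the chosen order are illustrative only.

```python
import pandas as pd

data = {
    "county": ["Cochice", "Pima", "Santa Cruz", "Maricopa", "Yuma"],
    "year": [2012, 2012, 2013, 2014, 2014],
    "reports": [4, 24, 31, 2, 3],
}

# columns selects dictionary keys and fixes their left-to-right order.
df_reordered = pd.DataFrame(data, columns=["reports", "county", "year"])

print(df_reordered)
```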

Step 7 - Performing Wrangling on dataframe

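We perform three wrangling operations on the dataframe. The code works on an object named dfColumnOrdered, which the earlier steps do not create; as an assumption, it is built here from the same dictionary with an explicit column order.

dfColumnOrdered = pd.DataFrame(data, columns=["county", "year", "reports"])

  • Adding a new column

    dfColumnOrdered["newsCoverage"] = pd.Series([42.3, 92.1, 12.2, 39.3, 30.2])
    print(dfColumnOrdered)

  • Deleting a column

    del dfColumnOrdered["newsCoverage"]
    print(dfColumnOrdered)

  • Transposing the dataframe

    print(dfColumnOrdered.T)

Running the steps in order prints the following output: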

0     5
1     6
2     2
3     9
4    12
dtype: int64

Cochise County        5
Pima County           6
Santa Cruz County     2
Maricopa County       9
Yuma County          12
dtype: int64

5

Maricopa County     9
Yuma County        12
dtype: int64

Cochise County        12
Pima County          342
Santa Cruz County     13
Maricopa County       42
Yuma County           52
dtype: int64

       county  year  reports
0     Cochice  2012        4
1        Pima  2012       24
2  Santa Cruz  2013       31
3    Maricopa  2014        2
4        Yuma  2014        3

       county  year  reports  newsCoverage
0     Cochice  2012        4          42.3
1        Pima  2012       24          92.1
2  Santa Cruz  2013       31          12.2
3    Maricopa  2014        2          39.3
4        Yuma  2014        3          30.2

       county  year  reports
0     Cochice  2012        4
1        Pima  2012       24
2  Santa Cruz  2013       31
3    Maricopa  2014        2
4        Yuma  2014        3

               0     1           2         3     4
county   Cochice  Pima  Santa Cruz  Maricopa  Yuma
year        2012  2012        2013      2014  2014
reports        4    24          31         2     3
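One behaviour behind the "adding a new column" step is that a series assigned as a column is matched to rows by index label, not by position. The sketch below uses a shortened, illustrative dataframe and a deliberately mismatched index to make this visible; it also shows DataFrame.drop as an alternative to del that leaves the original dataframe unchanged. The names coverage, without_coverage and transposed are hypothetical.

```python
import pandas as pd

# Shortened example data, not the full dataframe from the recipe.
df = pd.DataFrame({"county": ["Cochice", "Pima", "Santa Cruz"],
                   "reports": [4, 24, 31]})

# The Series carries labels 0, 2 and 5, while the rows are labelled 0, 1, 2.
coverage = pd.Series([42.3, 92.1, 12.2], index=[0, 2, 5])

# Values are placed by label: row 0 gets 42.3, row 2 gets 92.1,
# row 1 has no matching label and becomes NaN, and label 5 is ignored.
df["newsCoverage"] = coverage
print(df)

# drop returns a new DataFrame; df itself keeps the column.
without_coverage = df.drop(columns=["newsCoverage"])
print(without_coverage)

# Transposing swaps rows and columns.
transposed = df.T
print(transposed.shape)
```

In the recipe itself the new series has the default labels 0 to 4, which match the dataframe rows, so every row receives a value.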
