How to utilise Pandas dataframe & series for data wrangling?

This recipe helps you utilise Pandas dataframe & series for data wrangling

Recipe Objective

Pandas offers many data wrangling methods for both Series and DataFrame objects. Have you tried any of them?

This recipe shows how to use a Pandas Series and a Pandas DataFrame for data wrangling.

Step 1 - Importing Library

import pandas as pd

pandas is the only library this recipe needs.

Step 2 - Creating a series

We create a Series of numbers in the object floodingReports, first with the default integer index and then with a county name as the index label for each number.

floodingReports = pd.Series([5, 6, 2, 9, 12])
print(floodingReports)

floodingReports = pd.Series([5, 6, 2, 9, 12], index=["Cochise County", "Pima County", "Santa Cruz County", "Maricopa County", "Yuma County"])
print(floodingReports)
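As a small sketch of what the labelled Series holds, the example below rebuilds it and reads back its labels and values. The helper variable names (labels, values, firstByPosition, firstByLabel) are our own and not part of the recipe.

```python
import pandas as pd

# Same values and labels as in the recipe.
floodingReports = pd.Series(
    [5, 6, 2, 9, 12],
    index=["Cochise County", "Pima County", "Santa Cruz County",
           "Maricopa County", "Yuma County"],
)

labels = list(floodingReports.index)   # the county names
values = floodingReports.to_list()     # the underlying numbers

# .iloc looks up by position, .loc looks up by label.
firstByPosition = floodingReports.iloc[0]
firstByLabel = floodingReports.loc["Cochise County"]

print(labels)
print(values)
print(firstByPosition, firstByLabel)
```

Using .loc and .iloc explicitly keeps it clear whether a lookup is by label or by position.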

Step 3 - Data Wrangling on series

First we print the value stored under a given index label. Then we print only the entries whose value is greater than 6.

print(floodingReports["Cochise County"])
print(floodingReports[floodingReports > 6])
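The filter in this step works through a boolean mask. The sketch below spells out the intermediate mask and shows one way to combine two conditions; the names mask, high, highCounties and mid are illustrative, not from the recipe.

```python
import pandas as pd

floodingReports = pd.Series(
    [5, 6, 2, 9, 12],
    index=["Cochise County", "Pima County", "Santa Cruz County",
           "Maricopa County", "Yuma County"],
)

# The comparison yields a boolean Series with the same index.
mask = floodingReports > 6

# Indexing with the mask keeps only the entries where it is True.
high = floodingReports[mask]
highCounties = list(high.index)

# Conditions combine with & (and) or | (or); each needs its own parentheses.
mid = floodingReports[(floodingReports >= 5) & (floodingReports < 10)]

print(mask)
print(highCounties)
print(mid)
```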

Step 4 - Creating a series from dictionary

We create a Series from a dictionary by passing the dictionary to pd.Series. The dictionary keys become the index labels.

fireReports_dict = {"Cochise County": 12, "Pima County": 342, "Santa Cruz County": 13, "Maricopa County": 42, "Yuma County": 52}
fireReports = pd.Series(fireReports_dict)
print(fireReports)
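When a dictionary is passed together with an explicit index, pandas picks the values by key and fills missing keys with NaN. The sketch below illustrates this; "Gila County" is a hypothetical label that we added only because it is absent from the dictionary.

```python
import pandas as pd

fireReports_dict = {"Cochise County": 12, "Pima County": 342,
                    "Santa Cruz County": 13, "Maricopa County": 42,
                    "Yuma County": 52}

fireReports = pd.Series(fireReports_dict)

# The index argument selects and orders keys from the dictionary.
# "Gila County" (hypothetical) is not a key, so its value becomes NaN.
subset = pd.Series(
    fireReports_dict,
    index=["Yuma County", "Pima County", "Gila County"],
)

print(fireReports.sum())
print(subset)
```

Because NaN is a floating point value, the subset Series is stored as float64 rather than int64.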

Step 5 - Changing the index of series

We can change the index of a Series by assigning a new list of labels to its index attribute. The list must have the same length as the Series.

fireReports.index = ["Cochice", "Pima", "Santa Cruz", "Maricopa", "Yuma"]
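Assigning to the index attribute modifies the Series in place. As an alternative sketch, Series.rename with a function derives the new labels from the old ones and returns a new Series; the variable name shortNames is our own.

```python
import pandas as pd

fireReports = pd.Series({"Cochise County": 12, "Pima County": 342,
                         "Santa Cruz County": 13, "Maricopa County": 42,
                         "Yuma County": 52})

# rename applies the function to every index label and returns a copy,
# so fireReports itself keeps its original labels.
shortNames = fireReports.rename(lambda name: name.replace(" County", ""))

print(shortNames)
print(fireReports.index)
```

Deriving the labels avoids retyping them by hand, which is where spelling slips tend to creep in.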

Step 6 - Creating a dataframe from dictionary

We create a DataFrame from a dictionary by passing the dictionary to pd.DataFrame.

data = {"county": ["Cochice", "Pima", "Santa Cruz", "Maricopa", "Yuma"],
        "year": [2012, 2012, 2013, 2014, 2014],
        "reports": [4, 24, 31, 2, 3]}
df = pd.DataFrame(data)
print(df)
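Each column of a DataFrame is itself a Series, so the Series techniques from Step 3 carry over. The following sketch selects one column and filters rows by a condition; reportsColumn and recent are illustrative names.

```python
import pandas as pd

data = {"county": ["Cochice", "Pima", "Santa Cruz", "Maricopa", "Yuma"],
        "year": [2012, 2012, 2013, 2014, 2014],
        "reports": [4, 24, 31, 2, 3]}
df = pd.DataFrame(data)

# Selecting a single column returns a Series.
reportsColumn = df["reports"]

# A boolean mask built from one column filters whole rows.
recent = df[df["year"] == 2014]

print(df.shape)
print(reportsColumn.sum())
print(recent)
```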

Step 7 - Performing Wrangling on dataframe

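We perform three wrangling operations on the DataFrame. The snippets use dfColumnOrdered, which the earlier steps do not define; we assume here that it is the Step 6 data with its columns in an explicit order.

dfColumnOrdered = pd.DataFrame(data, columns=["county", "year", "reports"])

  • Adding a new column

dfColumnOrdered["newsCoverage"] = pd.Series([42.3, 92.1, 12.2, 39.3, 30.2])
print(dfColumnOrdered)

  • Deleting a column

del dfColumnOrdered["newsCoverage"]
print(dfColumnOrdered)

  • Transposing the dataframe

# Transpose the dataframe
print(dfColumnOrdered.T)

The output is: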

0     5
1     6
2     2
3     9
4    12
dtype: int64

Cochise County        5
Pima County           6
Santa Cruz County     2
Maricopa County       9
Yuma County          12
dtype: int64

5

Maricopa County     9
Yuma County        12
dtype: int64

Cochise County        12
Pima County          342
Santa Cruz County     13
Maricopa County       42
Yuma County           52
dtype: int64

       county  year  reports
0     Cochice  2012        4
1        Pima  2012       24
2  Santa Cruz  2013       31
3    Maricopa  2014        2
4        Yuma  2014        3

       county  year  reports  newsCoverage
0     Cochice  2012        4          42.3
1        Pima  2012       24          92.1
2  Santa Cruz  2013       31          12.2
3    Maricopa  2014        2          39.3
4        Yuma  2014        3          30.2

       county  year  reports
0     Cochice  2012        4
1        Pima  2012       24
2  Santa Cruz  2013       31
3    Maricopa  2014        2
4        Yuma  2014        3

               0     1           2         3     4
county   Cochice  Pima  Santa Cruz  Maricopa  Yuma
year        2012  2012        2013      2014  2014
reports        4    24          31         2     3
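Two details of the Step 7 operations are worth a short sketch. A Series assigned as a new column is matched on index labels rather than on position, and DataFrame.drop is a non-mutating alternative to del. The partial coverage Series and the names coverage, withoutCoverage and transposed are our own assumptions for illustration.

```python
import pandas as pd

data = {"county": ["Cochice", "Pima", "Santa Cruz", "Maricopa", "Yuma"],
        "year": [2012, 2012, 2013, 2014, 2014],
        "reports": [4, 24, 31, 2, 3]}
df = pd.DataFrame(data)

# Only rows 0, 1 and 4 have a value; pandas aligns on the index labels,
# so rows 2 and 3 receive NaN.
coverage = pd.Series([42.3, 92.1, 30.2], index=[0, 1, 4])
df["newsCoverage"] = coverage

# drop returns a new frame and leaves df untouched, unlike del.
withoutCoverage = df.drop(columns=["newsCoverage"])

# .T swaps rows and columns.
transposed = df.T

print(df)
print(withoutCoverage.columns.to_list())
print(transposed.shape)
```

Using drop can be convenient when the original frame is still needed later in the analysis.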
