How to utilise Pandas dataframe & series for data wrangling?

This recipe helps you utilise Pandas dataframe & series for data wrangling

Recipe Objective

Pandas offers many data wrangling methods. Have you tried any of them on a DataFrame or a Series?

This recipe shows how to use a Pandas DataFrame and Series for data wrangling.

Step 1 - Importing Library

import pandas as pd

We import only pandas, which is all this recipe needs.

Step 2 - Creating a series

We create a Series of numbers in the object floodingReports, first with the default integer index and then with a county name as the index label for each number.

floodingReports = pd.Series([5, 6, 2, 9, 12])
print(floodingReports)
floodingReports = pd.Series([5, 6, 2, 9, 12], index=["Cochise County", "Pima County", "Santa Cruz County", "Maricopa County", "Yuma County"])
print(floodingReports)
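As a small sketch of what the labelled Series holds, the snippet below rebuilds floodingReports and reads back its index labels and values separately. The names labels and values are introduced here for illustration and are not part of the recipe.

```python
import pandas as pd

floodingReports = pd.Series(
    [5, 6, 2, 9, 12],
    index=["Cochise County", "Pima County", "Santa Cruz County",
           "Maricopa County", "Yuma County"],
)

# The index holds the labels; tolist() returns the data as plain Python ints.
labels = list(floodingReports.index)   # illustrative name, not in the recipe
values = floodingReports.tolist()      # illustrative name, not in the recipe

print(labels)
print(values)
print(floodingReports.dtype)
```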

Step 3 - Data Wrangling on series

First we print the value stored under a given index label. Then we print only the entries (label and value) whose value is greater than 6.

print(floodingReports["Cochise County"])
print(floodingReports[floodingReports > 6])
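To make the filtering step more concrete, here is a minimal sketch that exposes the intermediate boolean mask and combines two conditions. The variable names mask, cochise and mid_range are assumptions made for this example only.

```python
import pandas as pd

floodingReports = pd.Series(
    [5, 6, 2, 9, 12],
    index=["Cochise County", "Pima County", "Santa Cruz County",
           "Maricopa County", "Yuma County"],
)

# The comparison yields a boolean Series: True where the value exceeds 6.
mask = floodingReports > 6
print(mask)

# .loc performs an explicit label-based lookup.
cochise = floodingReports.loc["Cochise County"]

# Conditions are combined with & (and) or | (or); each side needs parentheses.
mid_range = floodingReports[(floodingReports > 5) & (floodingReports < 12)]
print(mid_range)
```

Indexing with the mask keeps only the rows where it is True, which is what floodingReports[floodingReports > 6] does in a single expression.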

Step 4 - Creating a series from dictionary

We create a Series from a dictionary by passing the dictionary to pd.Series. The dictionary keys become the index labels.

fireReports_dict = {"Cochise County": 12, "Pima County": 342, "Santa Cruz County": 13, "Maricopa County": 42, "Yuma County": 52}
fireReports = pd.Series(fireReports_dict)
print(fireReports)
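When building a Series from a dictionary, an explicit index can also be passed to select and order the keys. The sketch below assumes a hypothetical list counties that includes "Gila County", a label that is not in the recipe's dictionary, to show that missing keys come back as NaN.

```python
import pandas as pd

fireReports_dict = {"Cochise County": 12, "Pima County": 342,
                    "Santa Cruz County": 13, "Maricopa County": 42,
                    "Yuma County": 52}

# Hypothetical selection; "Gila County" has no entry in the dictionary.
counties = ["Yuma County", "Pima County", "Gila County"]

# Keys are matched against the requested index; unmatched labels get NaN.
subset = pd.Series(fireReports_dict, index=counties)
print(subset)
```

Because NaN is a floating-point value, the resulting Series is stored as floats rather than integers.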

Step 5 - Changing the index of series

We can change the index of a Series by assigning a new list of labels, one per element, to its index attribute.

fireReports.index = ["Cochice", "Pima", "Santa Cruz", "Maricopa", "Yuma"]
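Assigning to the index attribute changes the Series in place. As an alternative sketch, rename with a mapping from old to new labels returns a new Series and leaves the original untouched. The names short_names and renamed are assumptions for this example.

```python
import pandas as pd

fireReports = pd.Series({"Cochise County": 12, "Pima County": 342,
                         "Santa Cruz County": 13, "Maricopa County": 42,
                         "Yuma County": 52})

# Build an old-label -> new-label mapping by stripping the " County" suffix.
short_names = {name: name.replace(" County", "") for name in fireReports.index}

# rename returns a relabelled copy; fireReports keeps its original index.
renamed = fireReports.rename(short_names)
print(renamed)
```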

Step 6 - Creating a dataframe from dictionary

We create a DataFrame from a dictionary by passing the dictionary to pd.DataFrame. Each key becomes a column name and each list becomes that column's values.

data = {"county": ["Cochice", "Pima", "Santa Cruz", "Maricopa", "Yuma"], "year": [2012, 2012, 2013, 2014, 2014], "reports": [4, 24, 31, 2, 3]}
df = pd.DataFrame(data)
print(df)
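The column order of the DataFrame follows the dictionary's key order by default. A minimal sketch of choosing a different order with the columns argument follows; the name df_reordered is hypothetical and is not used elsewhere in the recipe.

```python
import pandas as pd

data = {"county": ["Cochice", "Pima", "Santa Cruz", "Maricopa", "Yuma"],
        "year": [2012, 2012, 2013, 2014, 2014],
        "reports": [4, 24, 31, 2, 3]}

# columns= selects which dictionary keys to use and in what order.
df_reordered = pd.DataFrame(data, columns=["year", "county", "reports"])
print(df_reordered)
```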

Step 7 - Performing Wrangling on dataframe

We perform three wrangling operations on the dataframe for better understanding.

  • Adding a new column
    df["newsCoverage"] = pd.Series([42.3, 92.1, 12.2, 39.3, 30.2])
    print(df)
  • Deleting a column
    del df["newsCoverage"]
    print(df)
  • Making a transpose
    # Transpose the dataframe
    print(df.T)

The output is:

0     5
1     6
2     2
3     9
4    12
dtype: int64

Cochise County        5
Pima County           6
Santa Cruz County     2
Maricopa County       9
Yuma County          12
dtype: int64

5

Maricopa County     9
Yuma County        12
dtype: int64

Cochise County        12
Pima County          342
Santa Cruz County     13
Maricopa County       42
Yuma County           52
dtype: int64

       county  year  reports
0     Cochice  2012        4
1        Pima  2012       24
2  Santa Cruz  2013       31
3    Maricopa  2014        2
4        Yuma  2014        3

       county  year  reports  newsCoverage
0     Cochice  2012        4          42.3
1        Pima  2012       24          92.1
2  Santa Cruz  2013       31          12.2
3    Maricopa  2014        2          39.3
4        Yuma  2014        3          30.2

       county  year  reports
0     Cochice  2012        4
1        Pima  2012       24
2  Santa Cruz  2013       31
3    Maricopa  2014        2
4        Yuma  2014        3

               0     1           2         3     4
county   Cochice  Pima  Santa Cruz  Maricopa  Yuma
year        2012  2012        2013      2014  2014
reports        4    24          31         2     3
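The column operations in Step 7 can also be sketched with two details worth knowing: a Series assigned as a column is aligned on index labels, and drop returns a new DataFrame instead of modifying the original. The names partial, without_coverage and transposed, and the partial coverage values, are assumptions for illustration.

```python
import pandas as pd

data = {"county": ["Cochice", "Pima", "Santa Cruz", "Maricopa", "Yuma"],
        "year": [2012, 2012, 2013, 2014, 2014],
        "reports": [4, 24, 31, 2, 3]}
df = pd.DataFrame(data)

# Only rows 0 and 3 have a matching label, so the other rows receive NaN.
partial = pd.Series([42.3, 92.1], index=[0, 3])
df["newsCoverage"] = partial
print(df)

# drop returns a copy without the column; df itself still has it.
without_coverage = df.drop(columns=["newsCoverage"])
print(without_coverage)

# .T swaps rows and columns: 5 rows x 4 columns becomes 4 rows x 5 columns.
transposed = df.T
print(transposed)
```

Using drop instead of del is a design choice: it keeps the original DataFrame available for later steps, at the cost of holding a second object.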
