How to utilise Pandas dataframe & series for data wrangling?

This recipe shows how to use pandas DataFrame and Series objects for data wrangling.

This Python recipe does the following: 1. Creates a pandas Series, labels it with an index, and selects values by label and by condition. 2. Builds a Series from a dictionary and a DataFrame from a dictionary of lists. 3. Reorders, adds, and deletes DataFrame columns, then transposes the DataFrame.
In [1]:
## How to utilise Pandas dataframe & series for data wrangling
def Snippet_112():
    print()
    print(format('How to utilise a Pandas dataframe & series for data wrangling','*^82'))

    import warnings
    warnings.filterwarnings("ignore")

    # load libraries
    import pandas as pd

    # Series are one-dimensional arrays (like R’s vectors)
    # Create a series of the number of floodingReports
    floodingReports = pd.Series([5, 6, 2, 9, 12])
    print(); print(floodingReports)

    # Set county names to be the index of the floodingReports series
    floodingReports = pd.Series([5, 6, 2, 9, 12], index=['Cochise County', 'Pima County',
                                'Santa Cruz County', 'Maricopa County', 'Yuma County'])
    print(); print(floodingReports)

    # View the number of floodingReports in Cochise County
    print(); print(floodingReports['Cochise County'])

    # View the counties with more than 6 flooding reports
    print(); print(floodingReports[floodingReports > 6])


    # Create a pandas series from a dictionary
    fireReports_dict = {'Cochise County': 12, 'Pima County': 342,
                        'Santa Cruz County': 13, 'Maricopa County': 42,
                        'Yuma County' : 52}

    # Convert the dictionary into a pd.Series, and view it
    fireReports = pd.Series(fireReports_dict)
    print(); print(fireReports)

    # Change the index of a series to shorter names
    fireReports.index = ["Cochice", "Pima", "Santa Cruz", "Maricopa", "Yuma"]


    # DataFrames are like R’s data frames
    # Create a dataframe from a dict of equal-length lists or NumPy arrays
    data = {'county': ['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'],
            'year': [2012, 2012, 2013, 2014, 2014],
            'reports': [4, 24, 31, 2, 3]}
    df = pd.DataFrame(data)
    print(); print(df)

    # Set the order of the columns using the columns argument of the constructor
    dfColumnOrdered = pd.DataFrame(data, columns=['county', 'year', 'reports'])
    print(); print(dfColumnOrdered)

    # Add a column; the Series values are matched to rows by index label (0 to 4 here)
    dfColumnOrdered['newsCoverage'] = pd.Series([42.3, 92.1, 12.2, 39.3, 30.2])
    print(); print(dfColumnOrdered)

    # Delete a column
    del dfColumnOrdered['newsCoverage']
    print(); print(dfColumnOrdered)

    # Transpose the dataframe
    print(); print(dfColumnOrdered.T)

Snippet_112()
**********How to utilise a Pandas dataframe & series for data wrangling***********

0     5
1     6
2     2
3     9
4    12
dtype: int64

Cochise County        5
Pima County           6
Santa Cruz County     2
Maricopa County       9
Yuma County          12
dtype: int64

5

Maricopa County     9
Yuma County        12
dtype: int64

Cochise County        12
Pima County          342
Santa Cruz County     13
Maricopa County       42
Yuma County           52
dtype: int64

       county  year  reports
0     Cochice  2012        4
1        Pima  2012       24
2  Santa Cruz  2013       31
3    Maricopa  2014        2
4        Yuma  2014        3

       county  year  reports
0     Cochice  2012        4
1        Pima  2012       24
2  Santa Cruz  2013       31
3    Maricopa  2014        2
4        Yuma  2014        3

       county  year  reports  newsCoverage
0     Cochice  2012        4          42.3
1        Pima  2012       24          92.1
2  Santa Cruz  2013       31          12.2
3    Maricopa  2014        2          39.3
4        Yuma  2014        3          30.2

       county  year  reports
0     Cochice  2012        4
1        Pima  2012       24
2  Santa Cruz  2013       31
3    Maricopa  2014        2
4        Yuma  2014        3

               0     1           2         3     4
county   Cochice  Pima  Santa Cruz  Maricopa  Yuma
year        2012  2012        2013      2014  2014
reports        4    24          31         2     3
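
The "Add a column" step in the recipe works because the new Series and the dataframe share the same default integer index. The short sketch below is not part of the original recipe; it uses a cut-down three-row copy of the data, and the column names `misaligned` and `mapped` are made up for illustration. It shows what happens when the index labels do not match, and one possible way to look values up by county name instead.

```python
import pandas as pd

# Cut-down copy of the recipe's data (three rows only, for brevity)
df = pd.DataFrame({'county': ['Cochice', 'Pima', 'Santa Cruz'],
                   'reports': [4, 24, 31]})

# A Series with the default 0..n-1 index lines up with df row for row
df['newsCoverage'] = pd.Series([42.3, 92.1, 12.2])

# A Series labelled by county name shares no labels with df's integer index,
# so every value in the new column comes out as NaN
by_name = pd.Series([42.3, 92.1, 12.2],
                    index=['Cochice', 'Pima', 'Santa Cruz'])
df['misaligned'] = by_name

# Mapping the county column through the Series looks each name up by label
df['mapped'] = df['county'].map(by_name)

print(df)
```

If the new values are already in row order and the labels do not matter, assigning a plain list (or `by_name.values`) skips the alignment step entirely.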
