How to utilise Pandas dataframe & series for data wrangling?

This recipe shows how to use pandas Series and DataFrame objects for data wrangling.

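This Python source code does the following: 1. Creates pandas Series from a list and from a dictionary, and labels them with county names as the index. 2. Selects a single value by label and filters values with a boolean condition. 3. Builds a DataFrame from a dictionary, then sets the column order, adds a column, deletes a column, and transposes the result.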
In [1]:
## How to utilise Pandas dataframe & series for data wrangling
def Snippet_112():
    print()
    print(format('How to utilise a Pandas dataframe & series for data wrangling','*^82'))

    import warnings
    warnings.filterwarnings("ignore")

    # load libraries
    import pandas as pd

    # Series are one-dimensional arrays (like R’s vectors)
    # Create a series of the number of floodingReports
    floodingReports = pd.Series([5, 6, 2, 9, 12])
    print(); print(floodingReports)

    # Set county names to be the index of the floodingReports series
    floodingReports = pd.Series([5, 6, 2, 9, 12], index=['Cochise County', 'Pima County',
                                'Santa Cruz County', 'Maricopa County', 'Yuma County'])
    print(); print(floodingReports)

    # View the number of floodingReports in Cochise County
    print(); print(floodingReports['Cochise County'])

    # View the counties with more than 6 flooding reports
    print(); print(floodingReports[floodingReports > 6])


    # Create a pandas series from a dictionary
    fireReports_dict = {'Cochise County': 12, 'Pima County': 342,
                        'Santa Cruz County': 13, 'Maricopa County': 42,
                        'Yuma County' : 52}

    # Convert the dictionary into a pd.Series, and view it
    fireReports = pd.Series(fireReports_dict)
    print(); print(fireReports)

    # Change the index of a series to shorter names
    fireReports.index = ["Cochise", "Pima", "Santa Cruz", "Maricopa", "Yuma"]


    # DataFrames are two-dimensional tables (like R’s data frames)
    # Create a dataframe from a dict of equal length lists or numpy arrays
    data = {'county': ['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'],
            'year': [2012, 2012, 2013, 2014, 2014],
            'reports': [4, 24, 31, 2, 3]}
    df = pd.DataFrame(data)
    print(); print(df)

    # Set the order of the columns using the columns attribute
    dfColumnOrdered = pd.DataFrame(data, columns=['county', 'year', 'reports'])
    print(); print(dfColumnOrdered)

    # Add a column (the Series is matched to the rows by index label, here 0 to 4)
    dfColumnOrdered['newsCoverage'] = pd.Series([42.3, 92.1, 12.2, 39.3, 30.2])
    print(); print(dfColumnOrdered)

    # Delete a column
    del dfColumnOrdered['newsCoverage']
    print(); print(dfColumnOrdered)

    # Transpose the dataframe
    print(); print(dfColumnOrdered.T)

Snippet_112()
**********How to utilise a Pandas dataframe & series for data wrangling***********

0     5
1     6
2     2
3     9
4    12
dtype: int64

Cochise County        5
Pima County           6
Santa Cruz County     2
Maricopa County       9
Yuma County          12
dtype: int64

5

Maricopa County     9
Yuma County        12
dtype: int64

Cochise County        12
Pima County          342
Santa Cruz County     13
Maricopa County       42
Yuma County           52
dtype: int64

       county  year  reports
0     Cochice  2012        4
1        Pima  2012       24
2  Santa Cruz  2013       31
3    Maricopa  2014        2
4        Yuma  2014        3

       county  year  reports
0     Cochice  2012        4
1        Pima  2012       24
2  Santa Cruz  2013       31
3    Maricopa  2014        2
4        Yuma  2014        3

       county  year  reports  newsCoverage
0     Cochice  2012        4          42.3
1        Pima  2012       24          92.1
2  Santa Cruz  2013       31          12.2
3    Maricopa  2014        2          39.3
4        Yuma  2014        3          30.2

       county  year  reports
0     Cochice  2012        4
1        Pima  2012       24
2  Santa Cruz  2013       31
3    Maricopa  2014        2
4        Yuma  2014        3

               0     1           2         3     4
county   Cochice  Pima  Santa Cruz  Maricopa  Yuma
year        2012  2012        2013      2014  2014
reports        4    24          31         2     3
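
The Series steps in the recipe (lookup by label, filtering by condition) carry over to DataFrames. The following is a minimal sketch, not part of the original recipe, that assumes the same county data; the names `by_county`, `pima_reports` and `busy` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'county': ['Cochise', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'],
                   'year': [2012, 2012, 2013, 2014, 2014],
                   'reports': [4, 24, 31, 2, 3]})

# Use the county names as row labels, as the Series example does
by_county = df.set_index('county')

# Label-based lookup of a single cell with .loc[row_label, column_label]
pima_reports = by_county.loc['Pima', 'reports']

# Boolean mask: keep only the rows with more than 6 reports
busy = by_county[by_county['reports'] > 6]

print(pima_reports)
print(busy)
```

Setting the index first keeps the lookup readable; without it, the same rows could be selected with `df[df['county'] == 'Pima']`.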

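The "Add a column" step works because the new Series carries the default index 0 to 4, which matches the DataFrame's row labels. The sketch below, which assumes the same data and reuses three of the recipe's newsCoverage values, illustrates what happens when only some labels are supplied: pandas aligns on labels rather than position, and rows without a match are filled with NaN. The names `partial` and `missing_rows` are illustrative.

```python
import pandas as pd

data = {'county': ['Cochise', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'],
        'year': [2012, 2012, 2013, 2014, 2014],
        'reports': [4, 24, 31, 2, 3]}
df = pd.DataFrame(data, columns=['county', 'year', 'reports'])

# A Series assigned as a column is aligned on index labels, not on position.
# Only labels 0, 2 and 4 are supplied here.
partial = pd.Series([42.3, 12.2, 30.2], index=[0, 2, 4])
df['newsCoverage'] = partial

# Rows 1 and 3 have no matching label, so they hold NaN
missing_rows = df.index[df['newsCoverage'].isna()].tolist()

print(df)
print(missing_rows)
```

If positional assignment is what you want, passing a plain list of the right length (instead of a Series) avoids the alignment step.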