How to present Hierarchical Data in Pandas?
DATA MUNGING DATA CLEANING PYTHON MACHINE LEARNING RECIPES PANDAS CHEATSHEET     ALL TAGS

How to present Hierarchical Data in Pandas?

How to present Hierarchical Data in Pandas?

This recipe helps you present Hierarchical Data in Pandas

0
This data science python source code does the following: 1. Creates your own data dictionary. 2. Conversion of dictionary into dataframe. 3. Filtering data in hierarchial manner. 4. Swapping the index. 5. Performs statistical analysis.
In [1]:
## How to present Hierarchical Data in Pandas
def Kickstarter_Example_90():
    print()
    print(format('How to present Hierarchical Data in Pandas','*^82'))

    import warnings
    warnings.filterwarnings("ignore")

    # load libraries
    import pandas as pd

    # Create dataframe
    raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks',
                             'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts',
                             'Scouts', 'Scouts', 'Scouts'],
                'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd',
                            '2nd','1st', '1st', '2nd', '2nd'],
                'name': ['Miller', 'Jacobson', 'Bali', 'Milner', 'Cooze', 'Jacon',
                         'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
                'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
                'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
    df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name',
                                           'preTestScore', 'postTestScore'])
    print(); print(df)

    # Set the hierarchical index but leave the columns inplace
    df.set_index(['regiment', 'company'], drop=False)
    print(); print(df)

    # Set the hierarchical index to be by regiment, and then by company
    df = df.set_index(['regiment', 'company'])
    print(); print(df)

    # View the index
    print(); print(df.index)

    # Swap the levels in the index
    print(); print(df.swaplevel('regiment', 'company'))

    # Summarize the results by regiment
    print(); print(df.sum(level='regiment'))
    print(); print(df.count(level='regiment'))
    print(); print(df.mean(level='regiment'))
    print(); print(df.max(level='regiment'))
    print(); print(df.min(level='regiment'))

Kickstarter_Example_90()
********************How to present Hierarchical Data in Pandas********************

      regiment company      name  preTestScore  postTestScore
0   Nighthawks     1st    Miller             4             25
1   Nighthawks     1st  Jacobson            24             94
2   Nighthawks     2nd      Bali            31             57
3   Nighthawks     2nd    Milner             2             62
4     Dragoons     1st     Cooze             3             70
5     Dragoons     1st     Jacon             4             25
6     Dragoons     2nd    Ryaner            24             94
7     Dragoons     2nd      Sone            31             57
8       Scouts     1st     Sloan             2             62
9       Scouts     1st     Piger             3             70
10      Scouts     2nd     Riani             2             62
11      Scouts     2nd       Ali             3             70

      regiment company      name  preTestScore  postTestScore
0   Nighthawks     1st    Miller             4             25
1   Nighthawks     1st  Jacobson            24             94
2   Nighthawks     2nd      Bali            31             57
3   Nighthawks     2nd    Milner             2             62
4     Dragoons     1st     Cooze             3             70
5     Dragoons     1st     Jacon             4             25
6     Dragoons     2nd    Ryaner            24             94
7     Dragoons     2nd      Sone            31             57
8       Scouts     1st     Sloan             2             62
9       Scouts     1st     Piger             3             70
10      Scouts     2nd     Riani             2             62
11      Scouts     2nd       Ali             3             70

                        name  preTestScore  postTestScore
regiment   company
Nighthawks 1st        Miller             4             25
           1st      Jacobson            24             94
           2nd          Bali            31             57
           2nd        Milner             2             62
Dragoons   1st         Cooze             3             70
           1st         Jacon             4             25
           2nd        Ryaner            24             94
           2nd          Sone            31             57
Scouts     1st         Sloan             2             62
           1st         Piger             3             70
           2nd         Riani             2             62
           2nd           Ali             3             70

MultiIndex(levels=[['Dragoons', 'Nighthawks', 'Scouts'], ['1st', '2nd']],
           labels=[[1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2], [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]],
           names=['regiment', 'company'])

                        name  preTestScore  postTestScore
company regiment
1st     Nighthawks    Miller             4             25
        Nighthawks  Jacobson            24             94
2nd     Nighthawks      Bali            31             57
        Nighthawks    Milner             2             62
1st     Dragoons       Cooze             3             70
        Dragoons       Jacon             4             25
2nd     Dragoons      Ryaner            24             94
        Dragoons        Sone            31             57
1st     Scouts         Sloan             2             62
        Scouts         Piger             3             70
2nd     Scouts         Riani             2             62
        Scouts           Ali             3             70

            preTestScore  postTestScore
regiment
Nighthawks            61            238
Dragoons              62            246
Scouts                10            264

            name  preTestScore  postTestScore
regiment
Dragoons       4             4              4
Nighthawks     4             4              4
Scouts         4             4              4

            preTestScore  postTestScore
regiment
Nighthawks         15.25           59.5
Dragoons           15.50           61.5
Scouts              2.50           66.0

              name  preTestScore  postTestScore
regiment
Nighthawks  Milner            31             94
Dragoons      Sone            31             94
Scouts       Sloan             3             70

             name  preTestScore  postTestScore
regiment
Nighthawks   Bali             2             25
Dragoons    Cooze             3             25
Scouts        Ali             2             62

Relevant Projects

Sequence Classification with LSTM RNN in Python with Keras
In this project, we are going to work on Sequence to Sequence Prediction using IMDB Movie Review Dataset​ using Keras in Python.

Music Recommendation System Project using Python and R
Machine Learning Project - Work with KKBOX's Music Recommendation System dataset to build the best music recommendation engine.

Deep Learning with Keras in R to Predict Customer Churn
In this deep learning project, we will predict customer churn using Artificial Neural Networks and learn how to model an ANN in R with the keras deep learning package.

Predict Credit Default | Give Me Some Credit Kaggle
In this data science project, you will predict borrowers chance of defaulting on credit loans by building a credit score prediction model.

Zillow’s Home Value Prediction (Zestimate)
Data Science Project in R -Build a machine learning algorithm to predict the future sale prices of homes.

Machine Learning or Predictive Models in IoT - Energy Prediction Use Case
In this machine learning and IoT project, we are going to test out the experimental data using various predictive models and train the models and break the energy usage.

Credit Card Fraud Detection as a Classification Problem
In this data science project, we will predict the credit card fraud in the transactional dataset using some of the predictive models.

Solving Multiple Classification use cases Using H2O
In this project, we are going to talk about H2O and functionality in terms of building Machine Learning models.

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

Machine Learning project for Retail Price Optimization
In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.