How to present Hierarchical Data in Pandas?

This recipe helps you present Hierarchical Data in Pandas

Recipe Objective

Have you ever needed to present data whose index is organized by level, so that several features together form the index and define a hierarchy among them?

So this is the recipe on how we can present Hierarchical Data in Pandas.

Step 1 - Import the library

import pandas as pd

We have imported pandas, which provides the DataFrame used to hold the dataset.

Step 2 - Setting up the Data

We have created a dataframe with the features "regiment", "company", "Rating_Score" and "Public_Score".

raw_data = {"regiment": ["Nighthawks", "Nighthawks", "Nighthawks", "Nighthawks",
                         "Dragoons", "Dragoons", "Dragoons", "Dragoons",
                         "Scouts", "Scouts", "Scouts", "Scouts"],
            "company": ["1st", "1st", "2nd", "2nd", "1st", "1st",
                        "2nd", "2nd", "1st", "1st", "2nd", "2nd"],
            "Rating_Score": [4, 24, 94, 25, 4, 24, 24, 31, 2, 3, 2, 3],
            "Public_Score": [25, 94, 31, 2, 70, 25, 4, 24, 31, 2, 3, 4]}
df = pd.DataFrame(raw_data, columns=["regiment", "company", "Rating_Score", "Public_Score"])
print()
print(df)

Step 3 - Setting up the index

Here we set the index hierarchically, with regiment as the first (outer) level and company as the second (inner) level. We print the dataframe and its index and, for better understanding, swap the two levels, which changes the hierarchy.

df = df.set_index(["regiment", "company"])
print(df)
print(df.index)
print(df.swaplevel("regiment", "company"))
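Once the hierarchical index is in place, rows can be selected by the outer level, by a full (regiment, company) pair, or by the inner level alone. The following is a minimal sketch, assuming the same data as in Step 2. The sort_index() call and the variable names nighthawks, nighthawks_2nd and first_companies are additions for illustration and are not part of the original recipe.

```python
import pandas as pd

raw_data = {"regiment": ["Nighthawks"] * 4 + ["Dragoons"] * 4 + ["Scouts"] * 4,
            "company": ["1st", "1st", "2nd", "2nd"] * 3,
            "Rating_Score": [4, 24, 94, 25, 4, 24, 24, 31, 2, 3, 2, 3],
            "Public_Score": [25, 94, 31, 2, 70, 25, 4, 24, 31, 2, 3, 4]}
df = pd.DataFrame(raw_data).set_index(["regiment", "company"])

# Assumption: sorting the index first, which keeps label lookups efficient
df = df.sort_index()

# All rows of one regiment (outer level); the result is indexed by company
nighthawks = df.loc["Nighthawks"]
print(nighthawks)

# All rows of one (regiment, company) pair; the trailing ":" selects all columns
nighthawks_2nd = df.loc[("Nighthawks", "2nd"), :]
print(nighthawks_2nd)

# Cross-section on the inner level: the "1st" company of every regiment
first_companies = df.xs("1st", level="company")
print(first_companies)
```

Selecting on the inner level with xs avoids having to swap the levels first when only a quick look at one company is needed.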

Step 4 - Summarizing the results

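Here we use several summary statistics to summarize the data for each regiment. Older pandas versions accepted a level argument directly on these methods, for example df.sum(level="regiment"); that argument was removed in pandas 2.0, so the calls below group on the index level instead.

  • Finding the sum with respect to regiment
  • print(df.groupby(level="regiment").sum())
  • Counting with respect to regiment
  • print(df.groupby(level="regiment").count())
  • Calculating the mean with respect to regiment
  • print(df.groupby(level="regiment").mean())
  • Maximum value with respect to regiment
  • print(df.groupby(level="regiment").max())
  • Minimum value with respect to regiment
  • print(df.groupby(level="regiment").min())
So the output comes as follows (groupby sorts the regiments alphabetically by default, so the row order may differ from the listing):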

     regiment company  Rating_Score  Public_Score
0   Nighthawks     1st             4            25
1   Nighthawks     1st            24            94
2   Nighthawks     2nd            94            31
3   Nighthawks     2nd            25             2
4     Dragoons     1st             4            70
5     Dragoons     1st            24            25
6     Dragoons     2nd            24             4
7     Dragoons     2nd            31            24
8       Scouts     1st             2            31
9       Scouts     1st             3             2
10      Scouts     2nd             2             3
11      Scouts     2nd             3             4

                    Rating_Score  Public_Score
regiment   company                            
Nighthawks 1st                 4            25
           1st                24            94
           2nd                94            31
           2nd                25             2
Dragoons   1st                 4            70
           1st                24            25
           2nd                24             4
           2nd                31            24
Scouts     1st                 2            31
           1st                 3             2
           2nd                 2             3
           2nd                 3             4

MultiIndex(levels=[["Dragoons", "Nighthawks", "Scouts"], ["1st", "2nd"]],
           labels=[[1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2], [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]],
           names=["regiment", "company"])

                    Rating_Score  Public_Score
company regiment                              
1st     Nighthawks             4            25
        Nighthawks            24            94
2nd     Nighthawks            94            31
        Nighthawks            25             2
1st     Dragoons               4            70
        Dragoons              24            25
2nd     Dragoons              24             4
        Dragoons              31            24
1st     Scouts                 2            31
        Scouts                 3             2
2nd     Scouts                 2             3
        Scouts                 3             4

            Rating_Score  Public_Score
regiment                              
Nighthawks           147           152
Dragoons              83           123
Scouts                10            40

            Rating_Score  Public_Score
regiment                              
Dragoons               4             4
Nighthawks             4             4
Scouts                 4             4

            Rating_Score  Public_Score
regiment                              
Nighthawks         36.75         38.00
Dragoons           20.75         30.75
Scouts              2.50         10.00

            Rating_Score  Public_Score
regiment                              
Nighthawks            94            94
Dragoons              31            70
Scouts                 3            31

            Rating_Score  Public_Score
regiment                              
Nighthawks             4             2
Dragoons               4             4
Scouts                 2             2
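The per-regiment summaries from Step 4 can also be collected in a single table. The following is a minimal sketch, assuming the same data as in Step 2 and using groupby on the index level. The names by_regiment, summary and per_company are hypothetical, and sort=False is chosen here only to keep the regiments in their order of first appearance.

```python
import pandas as pd

raw_data = {"regiment": ["Nighthawks"] * 4 + ["Dragoons"] * 4 + ["Scouts"] * 4,
            "company": ["1st", "1st", "2nd", "2nd"] * 3,
            "Rating_Score": [4, 24, 94, 25, 4, 24, 24, 31, 2, 3, 2, 3],
            "Public_Score": [25, 94, 31, 2, 70, 25, 4, 24, 31, 2, 3, 4]}
df = pd.DataFrame(raw_data).set_index(["regiment", "company"])

# Group on the outer index level; sort=False keeps order of first appearance
by_regiment = df.groupby(level="regiment", sort=False)

# Several statistics for one column, one row per regiment
summary = by_regiment["Rating_Score"].agg(["sum", "count", "mean", "max", "min"])
print(summary)

# Grouping on both levels gives one row per (regiment, company) pair
per_company = df.groupby(level=["regiment", "company"], sort=False).mean()
print(per_company)
```

Grouping on both levels shows the benefit of the hierarchy: the same dataframe can be summarized per regiment or per company within a regiment without reshaping it.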
