How to present Hierarchical Data in Pandas?

This recipe helps you present Hierarchical Data in Pandas

Recipe Objective

Have you ever needed to present data so that its index is organized by level, with several features set as the index and a hierarchy defined among them?

So this is the recipe on how we can present Hierarchical Data in Pandas.

Step 1 - Import the library

import pandas as pd

We have imported pandas, which is needed to build and manipulate the dataset.

Step 2 - Setting up the Data

We have created a dataframe with the features "regiment", "company", "Rating_Score" and "Public_Score".

raw_data = {"regiment": ["Nighthawks", "Nighthawks", "Nighthawks", "Nighthawks", "Dragoons", "Dragoons", "Dragoons", "Dragoons", "Scouts", "Scouts", "Scouts", "Scouts"],
            "company": ["1st", "1st", "2nd", "2nd", "1st", "1st", "2nd", "2nd", "1st", "1st", "2nd", "2nd"],
            "Rating_Score": [4, 24, 94, 25, 4, 24, 24, 31, 2, 3, 2, 3],
            "Public_Score": [25, 94, 31, 2, 70, 25, 4, 24, 31, 2, 3, 4]}
df = pd.DataFrame(raw_data, columns = ["regiment", "company", "Rating_Score", "Public_Score"])
print()
print(df)

Step 3 - Setting up the index

Here, while setting the index, we set it hierarchically: regiment is the first level and company is the second. We then print the dataframe and its index and, for a better understanding, swap the two levels, which changes the hierarchy.

df = df.set_index(["regiment", "company"])
print(df)
print(df.index)
print(df.swaplevel("regiment", "company"))
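Once the index is hierarchical, rows can be selected by one level or by both. The sketch below is only an illustration: it uses a small made-up frame named demo (not the recipe's df) with the same column names, and it sorts the index first, which is an optional choice made here to keep lookups predictable.

```python
import pandas as pd

# Hypothetical four-row frame that reuses the recipe's column names
raw_data = {
    "regiment": ["Nighthawks", "Nighthawks", "Dragoons", "Dragoons"],
    "company": ["1st", "2nd", "1st", "2nd"],
    "Rating_Score": [4, 94, 24, 31],
}
demo = pd.DataFrame(raw_data).set_index(["regiment", "company"]).sort_index()

# Outer level only: every row of one regiment, indexed by company
nighthawks = demo.loc["Nighthawks"]

# Both levels: a tuple identifies a single row
score = demo.loc[("Dragoons", "2nd"), "Rating_Score"]

# Inner level only: a cross-section taken across all regiments
first_companies = demo.xs("1st", level="company")

print(nighthawks)
print(score)
print(first_companies)
```

Here loc reads the hierarchy from the outermost level inward, while xs is the usual way to pick a value from an inner level without naming the outer one.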

Step 4 - Summarizing the results

Here we will use different statistical methods to summarize the data.

  • Finding Sum with respect to regiment
  • print(df.sum(level="regiment"))
  • Counting with respect to regiment
  • print(df.count(level="regiment"))
  • Calculating mean with respect to regiment
  • print(df.mean(level="regiment"))
  • Maximum value with respect to regiment
  • print(df.max(level="regiment"))
  • Minimum value with respect to regiment
  • print(df.min(level="regiment"))
So the output comes as:

     regiment company  Rating_Score  Public_Score
0   Nighthawks     1st             4            25
1   Nighthawks     1st            24            94
2   Nighthawks     2nd            94            31
3   Nighthawks     2nd            25             2
4     Dragoons     1st             4            70
5     Dragoons     1st            24            25
6     Dragoons     2nd            24             4
7     Dragoons     2nd            31            24
8       Scouts     1st             2            31
9       Scouts     1st             3             2
10      Scouts     2nd             2             3
11      Scouts     2nd             3             4

                    Rating_Score  Public_Score
regiment   company                            
Nighthawks 1st                 4            25
           1st                24            94
           2nd                94            31
           2nd                25             2
Dragoons   1st                 4            70
           1st                24            25
           2nd                24             4
           2nd                31            24
Scouts     1st                 2            31
           1st                 3             2
           2nd                 2             3
           2nd                 3             4

MultiIndex(levels=[["Dragoons", "Nighthawks", "Scouts"], ["1st", "2nd"]],
           labels=[[1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2], [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]],
           names=["regiment", "company"])

                    Rating_Score  Public_Score
company regiment                              
1st     Nighthawks             4            25
        Nighthawks            24            94
2nd     Nighthawks            94            31
        Nighthawks            25             2
1st     Dragoons               4            70
        Dragoons              24            25
2nd     Dragoons              24             4
        Dragoons              31            24
1st     Scouts                 2            31
        Scouts                 3             2
2nd     Scouts                 2             3
        Scouts                 3             4

            Rating_Score  Public_Score
regiment                              
Nighthawks           147           152
Dragoons              83           123
Scouts                10            40

            Rating_Score  Public_Score
regiment                              
Dragoons               4             4
Nighthawks             4             4
Scouts                 4             4

            Rating_Score  Public_Score
regiment                              
Nighthawks         36.75         38.00
Dragoons           20.75         30.75
Scouts              2.50         10.00

            Rating_Score  Public_Score
regiment                              
Nighthawks            94            94
Dragoons              31            70
Scouts                 3            31

            Rating_Score  Public_Score
regiment                              
Nighthawks             4             2
Dragoons               4             4
Scouts                 2             2
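A note on versions: the level= argument used in Step 4 was deprecated in pandas 1.3 and removed in pandas 2.0, so those calls raise an error on current releases. A minimal sketch of an equivalent, assuming a recent pandas, is to group on the index level first. The name by_regiment is introduced here for illustration, and sort=False is chosen so the regiments stay in order of first appearance.

```python
import pandas as pd

raw_data = {
    "regiment": ["Nighthawks"] * 4 + ["Dragoons"] * 4 + ["Scouts"] * 4,
    "company": ["1st", "1st", "2nd", "2nd"] * 3,
    "Rating_Score": [4, 24, 94, 25, 4, 24, 24, 31, 2, 3, 2, 3],
    "Public_Score": [25, 94, 31, 2, 70, 25, 4, 24, 31, 2, 3, 4],
}
df = pd.DataFrame(raw_data).set_index(["regiment", "company"])

# Group on the "regiment" index level; sort=False keeps first-appearance order
by_regiment = df.groupby(level="regiment", sort=False)

totals = by_regiment.sum()
counts = by_regiment.count()
means = by_regiment.mean()
maxima = by_regiment.max()
minima = by_regiment.min()

for result in (totals, counts, means, maxima, minima):
    print(result)
```

The values should match the tables above, although the row order of the count table may differ, since the output shown there lists the regiments alphabetically.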
