How to group rows in a Pandas DataFrame?

This recipe helps you group rows in a Pandas DataFrame
Last Updated: 23 Dec 2022

Get access to Data Science projects View all Data Science projects

DATA MUNGING DATA CLEANING PYTHON MACHINE LEARNING RECIPES PANDAS CHEATSHEET ALL TAGS

Recipe Objective

Before making a model we need to preprocess the data and for that we may need to make group of rows of data.

This data science python source code does the following:
1. Creates your own data dictionary.
2. Conversion of dictionary into dataframe.
3. Groups dataframe based on desired rows.

So this is the recipe on how we can group rows in a Pandas DataFrame.

Get Closer To Your Dream of Becoming a Data Scientist with 70+ Solved End-to-End ML Projects

Recipe Objective

Step 1 - Import the library

import pandas as pd

We have imported pandas which will be need for the dataset.

Step 2 - Setting up the Data

We have created a dictionary of data and passed it in pd.DataFrame to make a dataframe with columns 'regiment', 'company', 'name', 'Rating_Score' and 'Comedy_Score'. raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'], 'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'], 'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'], 'Rating_Score': [4, 24, 31, 2, 3, 94, 57, 62, 70, 3, 2, 3], 'Comedy_Score': [25, 94, 57, 62, 70, 25, 24, 31, 2, 3, 62, 70]} df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'Rating_Score', 'Comedy_Score']) print(df)

Explore More Data Science and Machine Learning Projects for Practice. Fast-Track Your Career Transition with ProjectPro

Step 3 - Grouping Rows

So we have created an object which will group rows on the basis of 'regiment' and compute statical scores on the basis of 'Rating_Score' regiment_Rating_Score = df['Rating_Score'].groupby(df['regiment'])

Mean of regiment_Rating_Score

print(regiment_Rating_Score.mean())

Sum of regiment_Rating_Score

print(regiment_Rating_Score.sum())

Maximum value of regiment_Rating_Score

print(regiment_Rating_Score.max())

Minimum value of regiment_Rating_Score

print(regiment_Rating_Score.min())

regiment_Rating_Score count

print(regiment_Rating_Score.count())

So the output comes as:

      regiment company      name  Rating_Score  Comedy_Score
0   Nighthawks     1st    Miller             4            25
1   Nighthawks     1st  Jacobson            24            94
2   Nighthawks     2nd       Ali            31            57
3   Nighthawks     2nd    Milner             2            62
4     Dragoons     1st     Cooze             3            70
5     Dragoons     1st     Jacon            94            25
6     Dragoons     2nd    Ryaner            57            24
7     Dragoons     2nd      Sone            62            31
8       Scouts     1st     Sloan            70             2
9       Scouts     1st     Piger             3             3
10      Scouts     2nd     Riani             2            62
11      Scouts     2nd       Ali             3            70

regiment
Dragoons      54.00
Nighthawks    15.25
Scouts        19.50
Name: Rating_Score, dtype: float64

regiment
Dragoons      216
Nighthawks     61
Scouts         78
Name: Rating_Score, dtype: int64

regiment
Dragoons      94
Nighthawks    31
Scouts        70
Name: Rating_Score, dtype: int64

regiment
Dragoons      3
Nighthawks    2
Scouts        2
Name: Rating_Score, dtype: int64

regiment
Dragoons      4
Nighthawks    4
Scouts        4
Name: Rating_Score, dtype: int64

Download Materials

iPython Notebook

What Users are saying..

Ameeruddin Mohammed

ETL (Abintio) developer at IBM

I come from a background in Marketing and Analytics and when I developed an interest in Machine Learning algorithms, I did multiple in-class courses from reputed institutions though I got good... Read More