How to group rows in a Pandas DataFrame?

This recipe helps you group rows in a Pandas DataFrame

Recipe Objective

Before building a model we need to preprocess the data, and that often involves grouping rows of data.

This data science Python source code does the following:
1. Creates a data dictionary.
2. Converts the dictionary into a DataFrame.
3. Groups the DataFrame rows by a chosen column and computes summary statistics for each group.

So this is the recipe on how we can group rows in a Pandas DataFrame.


Step 1 - Import the library

import pandas as pd

We have imported pandas, which will be needed for the dataset.

Step 2 - Setting up the Data

We have created a dictionary of data and passed it to pd.DataFrame to make a dataframe with columns 'regiment', 'company', 'name', 'Rating_Score' and 'Comedy_Score'.

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks',
                         'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons',
                         'Scouts', 'Scouts', 'Scouts', 'Scouts'],
            'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd',
                        '1st', '1st', '2nd', '2nd'],
            'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
                     'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
            'Rating_Score': [4, 24, 31, 2, 3, 94, 57, 62, 70, 3, 2, 3],
            'Comedy_Score': [25, 94, 57, 62, 70, 25, 24, 31, 2, 3, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'Rating_Score', 'Comedy_Score'])
print(df)


Step 3 - Grouping Rows

We have created an object that groups the 'Rating_Score' values by 'regiment', so that statistical scores can be computed for each regiment.

regiment_Rating_Score = df['Rating_Score'].groupby(df['regiment'])
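The same grouping can also be written by grouping the DataFrame first and selecting the column afterwards. The following is a minimal sketch under stated assumptions: it uses a shortened copy of the Step 2 data (the first two rows of each regiment), and the variable names by_series and by_frame are hypothetical, chosen only for this illustration.

```python
import pandas as pd

# Shortened copy of the Step 2 data (first two rows per regiment), for illustration only.
df = pd.DataFrame({
    'regiment': ['Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts'],
    'Rating_Score': [4, 24, 3, 94, 70, 3],
})

# Form used in this recipe: select the column, then group it by another Series.
by_series = df['Rating_Score'].groupby(df['regiment'])

# Alternative form: group the DataFrame by column name, then select the column.
by_frame = df.groupby('regiment')['Rating_Score']

# Compare the per-regiment means produced by the two forms.
print(by_series.mean().equals(by_frame.mean()))
```

Both forms return a grouped object with the same aggregation methods, so either can be used with the steps that follow.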

    • Mean of regiment_Rating_Score

print(regiment_Rating_Score.mean())

    • Sum of regiment_Rating_Score

print(regiment_Rating_Score.sum())

    • Maximum value of regiment_Rating_Score

print(regiment_Rating_Score.max())

    • Minimum value of regiment_Rating_Score

print(regiment_Rating_Score.min())

    • Count of regiment_Rating_Score

print(regiment_Rating_Score.count())

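The output shows the DataFrame first, followed by the mean, sum, maximum, minimum and count of 'Rating_Score' for each regiment: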

      regiment company      name  Rating_Score  Comedy_Score
0   Nighthawks     1st    Miller             4            25
1   Nighthawks     1st  Jacobson            24            94
2   Nighthawks     2nd       Ali            31            57
3   Nighthawks     2nd    Milner             2            62
4     Dragoons     1st     Cooze             3            70
5     Dragoons     1st     Jacon            94            25
6     Dragoons     2nd    Ryaner            57            24
7     Dragoons     2nd      Sone            62            31
8       Scouts     1st     Sloan            70             2
9       Scouts     1st     Piger             3             3
10      Scouts     2nd     Riani             2            62
11      Scouts     2nd       Ali             3            70

regiment
Dragoons      54.00
Nighthawks    15.25
Scouts        19.50
Name: Rating_Score, dtype: float64

regiment
Dragoons      216
Nighthawks     61
Scouts         78
Name: Rating_Score, dtype: int64

regiment
Dragoons      94
Nighthawks    31
Scouts        70
Name: Rating_Score, dtype: int64

regiment
Dragoons      3
Nighthawks    2
Scouts        2
Name: Rating_Score, dtype: int64

regiment
Dragoons      4
Nighthawks    4
Scouts        4
Name: Rating_Score, dtype: int64
