How to group rows in a Pandas DataFrame?

This recipe shows how to group the rows of a Pandas DataFrame by the values in a column and compute summary statistics for each group.

Recipe Objective

Before building a model we usually need to preprocess the data, and that often means grouping rows so that statistics can be computed per group.

This data science Python source code does the following:
1. Creates a data dictionary.
2. Converts the dictionary into a DataFrame.
3. Groups the DataFrame's rows by the values in a chosen column.

So this is the recipe on how we can group rows in a Pandas DataFrame.

Step 1 - Import the library

import pandas as pd

We have imported pandas, which we need to build and group the dataset.

Step 2 - Setting up the Data

We have created a dictionary of data and passed it to pd.DataFrame to make a dataframe with the columns 'regiment', 'company', 'name', 'Rating_Score' and 'Comedy_Score'.

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
            'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd'],
            'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
            'Rating_Score': [4, 24, 31, 2, 3, 94, 57, 62, 70, 3, 2, 3],
            'Comedy_Score': [25, 94, 57, 62, 70, 25, 24, 31, 2, 3, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'Rating_Score', 'Comedy_Score'])
print(df)

Step 3 - Grouping Rows

We have created a groupby object that groups the rows by 'regiment' and holds the 'Rating_Score' values of each group, so that we can compute statistical scores per regiment.

regiment_Rating_Score = df['Rating_Score'].groupby(df['regiment'])
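The groupby object does not compute anything by itself; it only records how the rows are split until a method such as mean() is called on it. To see which scores land in each group, a minimal sketch could look like the following. It re-creates only the two columns it needs so that it runs on its own, and the variable name dragoons_scores is introduced here purely for illustration.

```python
import pandas as pd

# Only the two columns used for grouping, copied from the recipe's data.
df = pd.DataFrame({
    'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks',
                 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons',
                 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
    'Rating_Score': [4, 24, 31, 2, 3, 94, 57, 62, 70, 3, 2, 3],
})

regiment_Rating_Score = df['Rating_Score'].groupby(df['regiment'])

# Iterating yields (group key, Series of that group's values) pairs.
for regiment, scores in regiment_Rating_Score:
    print(regiment, scores.tolist())

# get_group pulls out the values of a single group (name is illustrative).
dragoons_scores = regiment_Rating_Score.get_group('Dragoons')
print(dragoons_scores)
```

Inspecting the groups this way is a quick check that the grouping key splits the rows as intended before any statistics are computed.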

    • Mean of regiment_Rating_Score

print(regiment_Rating_Score.mean())

    • Sum of regiment_Rating_Score

print(regiment_Rating_Score.sum())

    • Maximum value of regiment_Rating_Score

print(regiment_Rating_Score.max())

    • Minimum value of regiment_Rating_Score

print(regiment_Rating_Score.min())

    • regiment_Rating_Score count

print(regiment_Rating_Score.count())

The output is the dataframe, followed by the mean, sum, maximum, minimum and count of 'Rating_Score' for each regiment:

      regiment company      name  Rating_Score  Comedy_Score
0   Nighthawks     1st    Miller             4            25
1   Nighthawks     1st  Jacobson            24            94
2   Nighthawks     2nd       Ali            31            57
3   Nighthawks     2nd    Milner             2            62
4     Dragoons     1st     Cooze             3            70
5     Dragoons     1st     Jacon            94            25
6     Dragoons     2nd    Ryaner            57            24
7     Dragoons     2nd      Sone            62            31
8       Scouts     1st     Sloan            70             2
9       Scouts     1st     Piger             3             3
10      Scouts     2nd     Riani             2            62
11      Scouts     2nd       Ali             3            70

regiment
Dragoons      54.00
Nighthawks    15.25
Scouts        19.50
Name: Rating_Score, dtype: float64

regiment
Dragoons      216
Nighthawks     61
Scouts         78
Name: Rating_Score, dtype: int64

regiment
Dragoons      94
Nighthawks    31
Scouts        70
Name: Rating_Score, dtype: int64

regiment
Dragoons      3
Nighthawks    2
Scouts        2
Name: Rating_Score, dtype: int64

regiment
Dragoons      4
Nighthawks    4
Scouts        4
Name: Rating_Score, dtype: int64
