How to group rows in a Pandas DataFrame?

This recipe helps you group rows in a Pandas DataFrame

Recipe Objective

Before building a model we need to preprocess the data, and that often means grouping rows so that statistics can be computed for each group.

This data science Python source code does the following:
1. Creates a data dictionary.
2. Converts the dictionary into a dataframe.
3. Groups the rows of the dataframe by the values of a chosen column.

So this is the recipe on how we can group rows in a Pandas DataFrame.

Step 1 - Import the library

import pandas as pd

We have imported pandas, which is needed to build and group the dataset.

Step 2 - Setting up the Data

We have created a dictionary of data and passed it to pd.DataFrame to make a dataframe with the columns 'regiment', 'company', 'name', 'Rating_Score' and 'Comedy_Score'.

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
            'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd'],
            'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
            'Rating_Score': [4, 24, 31, 2, 3, 94, 57, 62, 70, 3, 2, 3],
            'Comedy_Score': [25, 94, 57, 62, 70, 25, 24, 31, 2, 3, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'Rating_Score', 'Comedy_Score'])
print(df)

Step 3 - Grouping Rows

We have created a groupby object that groups the 'Rating_Score' values by 'regiment', so that statistical measures of 'Rating_Score' can be computed for each regiment.

regiment_Rating_Score = df['Rating_Score'].groupby(df['regiment'])
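
The same grouping is often written by calling groupby on the dataframe and then selecting the column. The sketch below compares the two spellings on a small stand-in dataframe; the shortened data and the names by_series, by_column and same_means are illustrative assumptions, not part of the recipe.

```python
import pandas as pd

# Small stand-in data that reuses the recipe's column names (values shortened for illustration)
df = pd.DataFrame({
    'regiment': ['Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts'],
    'Rating_Score': [4, 24, 3, 94, 70, 3],
})

# Spelling used in the recipe: group a Series by another Series
by_series = df['Rating_Score'].groupby(df['regiment'])

# Common alternative: group the dataframe by column name, then select the column
by_column = df.groupby('regiment')['Rating_Score']

# Both objects should yield the same per-regiment statistics
same_means = by_series.mean().equals(by_column.mean())
print(same_means)
```

Either form can be used with the aggregation methods in this step; the second one is convenient when several columns of the same grouped dataframe are needed.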

  • Mean of regiment_Rating_Score:
    print(regiment_Rating_Score.mean())
  • Sum of regiment_Rating_Score:
    print(regiment_Rating_Score.sum())
  • Maximum value of regiment_Rating_Score:
    print(regiment_Rating_Score.max())
  • Minimum value of regiment_Rating_Score:
    print(regiment_Rating_Score.min())
  • Count of regiment_Rating_Score:
    print(regiment_Rating_Score.count())

The output shows the dataframe first, followed by the mean, sum, maximum, minimum and count of 'Rating_Score' for each regiment:

      regiment company      name  Rating_Score  Comedy_Score
0   Nighthawks     1st    Miller             4            25
1   Nighthawks     1st  Jacobson            24            94
2   Nighthawks     2nd       Ali            31            57
3   Nighthawks     2nd    Milner             2            62
4     Dragoons     1st     Cooze             3            70
5     Dragoons     1st     Jacon            94            25
6     Dragoons     2nd    Ryaner            57            24
7     Dragoons     2nd      Sone            62            31
8       Scouts     1st     Sloan            70             2
9       Scouts     1st     Piger             3             3
10      Scouts     2nd     Riani             2            62
11      Scouts     2nd       Ali             3            70

regiment
Dragoons      54.00
Nighthawks    15.25
Scouts        19.50
Name: Rating_Score, dtype: float64

regiment
Dragoons      216
Nighthawks     61
Scouts         78
Name: Rating_Score, dtype: int64

regiment
Dragoons      94
Nighthawks    31
Scouts        70
Name: Rating_Score, dtype: int64

regiment
Dragoons      3
Nighthawks    2
Scouts        2
Name: Rating_Score, dtype: int64

regiment
Dragoons      4
Nighthawks    4
Scouts        4
Name: Rating_Score, dtype: int64
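
The five statistics above can also be collected into one table with a single agg call. This is a minimal sketch that reuses the recipe's 'regiment' and 'Rating_Score' data; the variable name summary is an assumption introduced for illustration.

```python
import pandas as pd

# Only the two columns needed for this grouping, copied from the recipe's data
raw_data = {
    'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks',
                 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons',
                 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
    'Rating_Score': [4, 24, 31, 2, 3, 94, 57, 62, 70, 3, 2, 3],
}
df = pd.DataFrame(raw_data)

# One row per regiment, one column per statistic
summary = df.groupby('regiment')['Rating_Score'].agg(['mean', 'sum', 'max', 'min', 'count'])
print(summary)
```

The resulting dataframe has one row per regiment, which makes the statistics easier to compare side by side than five separate printouts.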
