How to group rows in a Pandas DataFrame?

How to group rows in a Pandas DataFrame?

How to group rows in a Pandas DataFrame?

This recipe helps you group rows in a Pandas DataFrame


Recipe Objective

Before making a model we need to preprocess the data and for that we may need to make group of rows of data.

This data science python source code does the following:
1. Creates your own data dictionary.
2. Conversion of dictionary into dataframe.
3. Groups dataframe based on desired rows.

So this is the recipe on how we can group rows in a Pandas DataFrame.

Step 1 - Import the library

import pandas as pd

We have imported pandas which will be need for the dataset.

Step 2 - Setting up the Data

We have created a dictionary of data and passed it in pd.DataFrame to make a dataframe with columns 'regiment', 'company', 'name', 'Rating_Score' and 'Comedy_Score'. raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'], 'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'], 'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'], 'Rating_Score': [4, 24, 31, 2, 3, 94, 57, 62, 70, 3, 2, 3], 'Comedy_Score': [25, 94, 57, 62, 70, 25, 24, 31, 2, 3, 62, 70]} df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'Rating_Score', 'Comedy_Score']) print(df)

Step 3 - Grouping Rows

So we have created an object which will group rows on the basis of 'regiment' and compute statical scores on the basis of 'Rating_Score' regiment_Rating_Score = df['Rating_Score'].groupby(df['regiment'])

  • Mean of regiment_Rating_Score
  • print(regiment_Rating_Score.mean())
  • Sum of regiment_Rating_Score
  • print(regiment_Rating_Score.sum())
  • Maximum value of regiment_Rating_Score
  • print(regiment_Rating_Score.max())
  • Minimum value of regiment_Rating_Score
  • print(regiment_Rating_Score.min())
  • regiment_Rating_Score count
  • print(regiment_Rating_Score.count())
So the output comes as:

      regiment company      name  Rating_Score  Comedy_Score
0   Nighthawks     1st    Miller             4            25
1   Nighthawks     1st  Jacobson            24            94
2   Nighthawks     2nd       Ali            31            57
3   Nighthawks     2nd    Milner             2            62
4     Dragoons     1st     Cooze             3            70
5     Dragoons     1st     Jacon            94            25
6     Dragoons     2nd    Ryaner            57            24
7     Dragoons     2nd      Sone            62            31
8       Scouts     1st     Sloan            70             2
9       Scouts     1st     Piger             3             3
10      Scouts     2nd     Riani             2            62
11      Scouts     2nd       Ali             3            70

Dragoons      54.00
Nighthawks    15.25
Scouts        19.50
Name: Rating_Score, dtype: float64

Dragoons      216
Nighthawks     61
Scouts         78
Name: Rating_Score, dtype: int64

Dragoons      94
Nighthawks    31
Scouts        70
Name: Rating_Score, dtype: int64

Dragoons      3
Nighthawks    2
Scouts        2
Name: Rating_Score, dtype: int64

Dragoons      4
Nighthawks    4
Scouts        4
Name: Rating_Score, dtype: int64

Relevant Projects

Ensemble Machine Learning Project - All State Insurance Claims Severity Prediction
In this ensemble machine learning project, we will predict what kind of claims an insurance company will get. This is implemented in python using ensemble machine learning algorithms.

Data Science Project in Python on BigMart Sales Prediction
The goal of this data science project is to build a predictive model and find out the sales of each product at a given Big Mart store.

Customer Market Basket Analysis using Apriori and Fpgrowth algorithms
In this data science project, you will learn how to perform market basket analysis with the application of Apriori and FP growth algorithms based on the concept of association rule learning.

Time Series Forecasting with LSTM Neural Network Python
Deep Learning Project- Learn to apply deep learning paradigm to forecast univariate time series data.

Zillow’s Home Value Prediction (Zestimate)
Data Science Project in R -Build a machine learning algorithm to predict the future sale prices of homes.

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

Machine Learning project for Retail Price Optimization
In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.

Predict Census Income using Deep Learning Models
In this project, we are going to work on Deep Learning using H2O to predict Census income.

Forecast Inventory demand using historical sales data in R
In this machine learning project, you will develop a machine learning model to accurately forecast inventory demand based on historical sales data.

Perform Time series modelling using Facebook Prophet
In this project, we are going to talk about Time Series Forecasting to predict the electricity requirement for a particular house using Prophet.