How to apply functions in a Group in a Pandas DataFrame?

How to apply functions in a Group in a Pandas DataFrame?

How to apply functions in a Group in a Pandas DataFrame?

This recipe helps you apply functions in a Group in a Pandas DataFrame


Recipe Objective

Have you tried to apply a function on any dataset. One of the easiest way is to use apply function.

So this is the recipe on how we can apply functions in a Group in a Pandas DataFrame.

Step 1 - Import the library

import pandas as pd

We have imported pandas which will be needed for the dataset.

Step 2 - Setting up the Data

We have made a dataframe by using a dictionary. We have passed a dictionary with different values to create a dataframe. data = {"EmployeeGroup": ["A","A","A","A","A","A","B","B","B","B","B","C","C","C","C","C"], "Points": [10,40,50,70,50,50,60,10,40,50,60,70,40,60,40,60]} df = pd.DataFrame(data) print(" The Original DataFrame") print(df)

Step 3 - Training and Saving the model

We have used apply function to find Rolling Mean, Average, Sum, Maximum and Minimum. For this we have used lambda function on each and every values of the feature. print(" Rolling Mean:"); print(df.groupby("EmployeeGroup")["Points"].apply(lambda x:x.rolling(center=False,window=2).mean())) print(" Average:"); print(df.groupby("EmployeeGroup")["Points"].apply(lambda x:x.mean())) print(" Sum:"); print(df.groupby("EmployeeGroup")["Points"].apply(lambda x:x.sum())) print(" Maximum:"); print(df.groupby("EmployeeGroup")["Points"].apply(lambda x:x.max())) print(" Minimum:"); print(df.groupby("EmployeeGroup")["Points"].apply(lambda x:x.min())) So the output comes as:

The Original DataFrame
   EmployeeGroup  Points
0              A      10
1              A      40
2              A      50
3              A      70
4              A      50
5              A      50
6              B      60
7              B      10
8              B      40
9              B      50
10             B      60
11             C      70
12             C      40
13             C      60
14             C      40
15             C      60

Rolling Mean:
0      NaN
1     25.0
2     45.0
3     60.0
4     60.0
5     50.0
6      NaN
7     35.0
8     25.0
9     45.0
10    55.0
11     NaN
12    55.0
13    50.0
14    50.0
15    50.0
Name: Points, dtype: float64

A    45.0
B    44.0
C    54.0
Name: Points, dtype: float64

A    270
B    220
C    270
Name: Points, dtype: int64

A    70
B    60
C    70
Name: Points, dtype: int64

A    10
B    10
C    40
Name: Points, dtype: int64

Relevant Projects

Mercari Price Suggestion Challenge Data Science Project
Data Science Project in Python- Build a machine learning algorithm that automatically suggests the right product prices.

Demand prediction of driver availability using multistep time series analysis
In this supervised learning machine learning project, you will predict the availability of a driver in a specific area by using multi step time series analysis.

Time Series Forecasting with LSTM Neural Network Python
Deep Learning Project- Learn to apply deep learning paradigm to forecast univariate time series data.

Customer Churn Prediction Analysis using Ensemble Techniques
In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.

Predict Employee Computer Access Needs in Python
Data Science Project in Python- Given his or her job role, predict employee access needs using amazon employee database.

Predict Census Income using Deep Learning Models
In this project, we are going to work on Deep Learning using H2O to predict Census income.

Sequence Classification with LSTM RNN in Python with Keras
In this project, we are going to work on Sequence to Sequence Prediction using IMDB Movie Review Dataset​ using Keras in Python.

Build a Collaborative Filtering Recommender System in Python
Use the Amazon Reviews/Ratings dataset of 2 Million records to build a recommender system using memory-based collaborative filtering in Python.

Human Activity Recognition Using Multiclass Classification in Python
In this human activity recognition project, we use multiclass classification machine learning techniques to analyse fitness dataset from a smartphone tracker.

Data Science Project on Wine Quality Prediction in R
In this R data science project, we will explore wine dataset to assess red wine quality. The objective of this data science project is to explore which chemical properties will influence the quality of red wines.