How to do Data Analysis in a Pandas DataFrame?

This recipe helps you do Data Analysis in a Pandas DataFrame

Recipe Objective

Data Analysis means analyzing the data and preprocess it for further use. For this we have to use many statical analysis.

So this is the recipe on how we can Data Analysis in a Pandas DataFrame.

Step 1 - Import the library

import pandas as pd

We have imported pandas which will be needed for the dataset.

Step 2 - Setting up the Data

We have created a dataframe with different features like "regiment", "company", "name", "preTestScore", "postTestScore". raw_data = {"regiment": ["Nighthawks", "Nighthawks", "Nighthawks", "Nighthawks", "Dragoons", "Dragoons", "Dragoons", "Dragoons", "Scouts", "Scouts", "Scouts", "Scouts"], "company": ["1st", "1st", "2nd", "2nd", "1st", "1st", "2nd", "2nd","1st", "1st", "2nd", "2nd"], "name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze", "Jacon", "Ryaner", "Sone", "Sloan", "Piger", "Riani", "Ali"], "preTestScore": [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3], "postTestScore": [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]} df = pd.DataFrame(raw_data, columns = ["regiment", "company", "name", "preTestScore", "postTestScore"]) print(df)

Step 3 - Applying Data Analysis

Here we will be using different methods to do find different statistics for Data Analysis.

  • We are making a descriptive table for features. We have made an object to group pretestscore on the basis of regiment and we will be using this object further.
  • groupby_regiment = df["preTestScore"].groupby(df["regiment"]) print(df["preTestScore"].groupby(df["regiment"]).describe())
  • Here we have calculated mean
  • print(groupby_regiment.mean())
  • Here we have calculated mean of preTestScore grouped on the basis of two other features.
  • print(df["preTestScore"].groupby([df["regiment"], df["company"]]).mean()) print(df["preTestScore"].groupby([df["regiment"], df["company"]]).mean().unstack())
  • Here we have calculated mean and size of grouped features
  • print(df.groupby(["regiment", "company"]).mean()) print(df.groupby(["regiment", "company"]).size())
  • Now we are creating bins of postTestScore
  • bins = [0, 25, 50, 75, 100] group_names = ["Low", "Okay", "Good", "Great"] df["categories"] = pd.cut(df["postTestScore"], bins, labels=group_names) print() print(df["categories"])
So the output comes as:

      regiment company      name  preTestScore  postTestScore
0   Nighthawks     1st    Miller             4             25
1   Nighthawks     1st  Jacobson            24             94
2   Nighthawks     2nd       Ali            31             57
3   Nighthawks     2nd    Milner             2             62
4     Dragoons     1st     Cooze             3             70
5     Dragoons     1st     Jacon             4             25
6     Dragoons     2nd    Ryaner            24             94
7     Dragoons     2nd      Sone            31             57
8       Scouts     1st     Sloan             2             62
9       Scouts     1st     Piger             3             70
10      Scouts     2nd     Riani             2             62
11      Scouts     2nd       Ali             3             70

            count   mean        std  min   25%   50%    75%   max
regiment                                                         
Dragoons      4.0  15.50  14.153916  3.0  3.75  14.0  25.75  31.0
Nighthawks    4.0  15.25  14.453950  2.0  3.50  14.0  25.75  31.0
Scouts        4.0   2.50   0.577350  2.0  2.00   2.5   3.00   3.0

regiment
Dragoons      15.50
Nighthawks    15.25
Scouts         2.50
Name: preTestScore, dtype: float64

regiment    company
Dragoons    1st         3.5
            2nd        27.5
Nighthawks  1st        14.0
            2nd        16.5
Scouts      1st         2.5
            2nd         2.5
Name: preTestScore, dtype: float64

company      1st   2nd
regiment              
Dragoons     3.5  27.5
Nighthawks  14.0  16.5
Scouts       2.5   2.5

                    preTestScore  postTestScore
regiment   company                             
Dragoons   1st               3.5           47.5
           2nd              27.5           75.5
Nighthawks 1st              14.0           59.5
           2nd              16.5           59.5
Scouts     1st               2.5           66.0
           2nd               2.5           66.0

regiment    company
Dragoons    1st        2
            2nd        2
Nighthawks  1st        2
            2nd        2
Scouts      1st        2
            2nd        2
dtype: int64

0       Low
1     Great
2      Good
3      Good
4      Good
5       Low
6     Great
7      Good
8      Good
9      Good
10     Good
11     Good
Name: categories, dtype: category
Categories (4, object): [Low < Okay < Good < Great]


Download Materials

What Users are saying..

profile image

Gautam Vermani

Data Consultant at Confidential
linkedin profile url

Having worked in the field of Data Science, I wanted to explore how I can implement projects in other domains, So I thought of connecting with ProjectPro. A project that helped me absorb this topic... Read More

Relevant Projects

NLP Project on LDA Topic Modelling Python using RACE Dataset
Use the RACE dataset to extract a dominant topic from each document and perform LDA topic modeling in python.

Build a Credit Default Risk Prediction Model with LightGBM
In this Machine Learning Project, you will build a classification model for default prediction with LightGBM.

Build a Multi Touch Attribution Machine Learning Model in Python
Identifying the ROI on marketing campaigns is an essential KPI for any business. In this ML project, you will learn to build a Multi Touch Attribution Model in Python to identify the ROI of various marketing efforts and their impact on conversions or sales..

Learn to Build Generative Models Using PyTorch Autoencoders
In this deep learning project, you will learn how to build a Generative Model using Autoencoders in PyTorch

Hands-On Approach to Causal Inference in Machine Learning
In this Machine Learning Project, you will learn to implement various causal inference techniques in Python to determine, how effective the sprinkler is in making the grass wet.

End-to-End Speech Emotion Recognition Project using ANN
Speech Emotion Recognition using RAVDESS Audio Dataset - Build an Artificial Neural Network Model to Classify Audio Data into various Emotions like Sad, Happy, Angry, and Neutral

Learn to Build a Polynomial Regression Model from Scratch
In this Machine Learning Regression project, you will learn to build a polynomial regression model to predict points scored by the sports team.

Deep Learning Project- Real-Time Fruit Detection using YOLOv4
In this deep learning project, you will learn to build an accurate, fast, and reliable real-time fruit detection system using the YOLOv4 object detection model for robotic harvesting platforms.

Multi-Class Text Classification with Deep Learning using BERT
In this deep learning project, you will implement one of the most popular state of the art Transformer models, BERT for Multi-Class Text Classification

Ecommerce product reviews - Pairwise ranking and sentiment analysis
This project analyzes a dataset containing ecommerce product reviews. The goal is to use machine learning models to perform sentiment analysis on product reviews and rank them based on relevance. Reviews play a key role in product recommendation systems.