How to do Data Analysis in a Pandas DataFrame?
DATA MUNGING DATA CLEANING PYTHON MACHINE LEARNING RECIPES PANDAS CHEATSHEET     ALL TAGS

How to do Data Analysis in a Pandas DataFrame?

How to do Data Analysis in a Pandas DataFrame?

This recipe helps you do Data Analysis in a Pandas DataFrame

Recipe Objective

Data Analysis means analyzing the data and preprocess it for further use. For this we have to use many statical analysis.

So this is the recipe on how we can Data Analysis in a Pandas DataFrame.

Step 1 - Import the library

import pandas as pd

We have imported pandas which will be needed for the dataset.

Step 2 - Setting up the Data

We have created a dataframe with different features like "regiment", "company", "name", "preTestScore", "postTestScore". raw_data = {"regiment": ["Nighthawks", "Nighthawks", "Nighthawks", "Nighthawks", "Dragoons", "Dragoons", "Dragoons", "Dragoons", "Scouts", "Scouts", "Scouts", "Scouts"], "company": ["1st", "1st", "2nd", "2nd", "1st", "1st", "2nd", "2nd","1st", "1st", "2nd", "2nd"], "name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze", "Jacon", "Ryaner", "Sone", "Sloan", "Piger", "Riani", "Ali"], "preTestScore": [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3], "postTestScore": [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]} df = pd.DataFrame(raw_data, columns = ["regiment", "company", "name", "preTestScore", "postTestScore"]) print(df)

Step 3 - Applying Data Analysis

Here we will be using different methods to do find different statistics for Data Analysis.

  • We are making a descriptive table for features. We have made an object to group pretestscore on the basis of regiment and we will be using this object further.
  • groupby_regiment = df["preTestScore"].groupby(df["regiment"]) print(df["preTestScore"].groupby(df["regiment"]).describe())
  • Here we have calculated mean
  • print(groupby_regiment.mean())
  • Here we have calculated mean of preTestScore grouped on the basis of two other features.
  • print(df["preTestScore"].groupby([df["regiment"], df["company"]]).mean()) print(df["preTestScore"].groupby([df["regiment"], df["company"]]).mean().unstack())
  • Here we have calculated mean and size of grouped features
  • print(df.groupby(["regiment", "company"]).mean()) print(df.groupby(["regiment", "company"]).size())
  • Now we are creating bins of postTestScore
  • bins = [0, 25, 50, 75, 100] group_names = ["Low", "Okay", "Good", "Great"] df["categories"] = pd.cut(df["postTestScore"], bins, labels=group_names) print() print(df["categories"])
So the output comes as:

      regiment company      name  preTestScore  postTestScore
0   Nighthawks     1st    Miller             4             25
1   Nighthawks     1st  Jacobson            24             94
2   Nighthawks     2nd       Ali            31             57
3   Nighthawks     2nd    Milner             2             62
4     Dragoons     1st     Cooze             3             70
5     Dragoons     1st     Jacon             4             25
6     Dragoons     2nd    Ryaner            24             94
7     Dragoons     2nd      Sone            31             57
8       Scouts     1st     Sloan             2             62
9       Scouts     1st     Piger             3             70
10      Scouts     2nd     Riani             2             62
11      Scouts     2nd       Ali             3             70

            count   mean        std  min   25%   50%    75%   max
regiment                                                         
Dragoons      4.0  15.50  14.153916  3.0  3.75  14.0  25.75  31.0
Nighthawks    4.0  15.25  14.453950  2.0  3.50  14.0  25.75  31.0
Scouts        4.0   2.50   0.577350  2.0  2.00   2.5   3.00   3.0

regiment
Dragoons      15.50
Nighthawks    15.25
Scouts         2.50
Name: preTestScore, dtype: float64

regiment    company
Dragoons    1st         3.5
            2nd        27.5
Nighthawks  1st        14.0
            2nd        16.5
Scouts      1st         2.5
            2nd         2.5
Name: preTestScore, dtype: float64

company      1st   2nd
regiment              
Dragoons     3.5  27.5
Nighthawks  14.0  16.5
Scouts       2.5   2.5

                    preTestScore  postTestScore
regiment   company                             
Dragoons   1st               3.5           47.5
           2nd              27.5           75.5
Nighthawks 1st              14.0           59.5
           2nd              16.5           59.5
Scouts     1st               2.5           66.0
           2nd               2.5           66.0

regiment    company
Dragoons    1st        2
            2nd        2
Nighthawks  1st        2
            2nd        2
Scouts      1st        2
            2nd        2
dtype: int64

0       Low
1     Great
2      Good
3      Good
4      Good
5       Low
6     Great
7      Good
8      Good
9      Good
10     Good
11     Good
Name: categories, dtype: category
Categories (4, object): [Low < Okay < Good < Great]


Download Materials

Relevant Projects

Learn to prepare data for your next machine learning project
Text data requires special preparation before you can start using it for any machine learning project.In this ML project, you will learn about applying Machine Learning models to create classifiers and learn how to make sense of textual data.

Time Series Python Project using Greykite and Neural Prophet
In this time series project, you will forecast Walmart sales over time using the powerful, fast, and flexible time series forecasting library Greykite that helps automate time series problems.

Data Science Project-TalkingData AdTracking Fraud Detection
Machine Learning Project in R-Detect fraudulent click traffic for mobile app ads using R data science programming language.

Ola Bike Rides Request Demand Forecast
Given big data at taxi service (ride-hailing) i.e. OLA, you will learn multi-step time series forecasting and clustering with Mini-Batch K-means Algorithm on geospatial data to predict future ride requests for a particular region at a given time.

Medical Image Segmentation Deep Learning Project
In this deep learning project, you will learn to implement Unet++ models for medical image segmentation to detect and classify colorectal polyps.

Churn Prediction in Telecom using Machine Learning in R
Estimating churners before they discontinue using a product or service is extremely important. In this ML project, you will develop a churn prediction model in telecom to predict customers who are most likely subject to churn.

Predict Employee Computer Access Needs in Python
Data Science Project in Python- Given his or her job role, predict employee access needs using amazon employee database.

Ensemble Machine Learning Project - All State Insurance Claims Severity Prediction
In this ensemble machine learning project, we will predict what kind of claims an insurance company will get. This is implemented in python using ensemble machine learning algorithms.

Natural language processing Chatbot application using NLTK for text classification
In this NLP AI application, we build the core conversational engine for a chatbot. We use the popular NLTK text classification library to achieve this.

Abstractive Text Summarization using Transformers-BART Model
Deep Learning Project to implement an Abstractive Text Summarizer using Google's Transformers-BART Model to generate news article headlines.