How to do Data Analysis in a Pandas DataFrame?

This recipe helps you do Data Analysis in a Pandas DataFrame.


Recipe Objective

Data Analysis means analyzing the data and preprocessing it for further use. To do this we apply several statistical methods, such as summary statistics, group-wise aggregation and binning.

So this is the recipe on how we can do Data Analysis in a Pandas DataFrame.

Step 1 - Import the library

import pandas as pd

We have imported pandas, which we will use to build and analyze the dataset.

Step 2 - Setting up the Data

We have created a dataframe with different features like "regiment", "company", "name", "preTestScore", "postTestScore".

raw_data = {"regiment": ["Nighthawks", "Nighthawks", "Nighthawks", "Nighthawks",
                         "Dragoons", "Dragoons", "Dragoons", "Dragoons",
                         "Scouts", "Scouts", "Scouts", "Scouts"],
            "company": ["1st", "1st", "2nd", "2nd", "1st", "1st", "2nd", "2nd",
                        "1st", "1st", "2nd", "2nd"],
            "name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze", "Jacon",
                     "Ryaner", "Sone", "Sloan", "Piger", "Riani", "Ali"],
            "preTestScore": [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
            "postTestScore": [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns=["regiment", "company", "name", "preTestScore", "postTestScore"])
print(df)
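
Before grouping anything, it can help to confirm that the DataFrame has the expected shape, column types and no missing values. The sketch below is a minimal check of that kind; it rebuilds the same data as above in a compact form (the list repetition is only shorthand for the same values) so it runs on its own.

```python
import pandas as pd

# Same values as Step 2, written compactly so this snippet is self-contained.
df = pd.DataFrame({
    "regiment": ["Nighthawks"] * 4 + ["Dragoons"] * 4 + ["Scouts"] * 4,
    "company": ["1st", "1st", "2nd", "2nd"] * 3,
    "name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze", "Jacon",
             "Ryaner", "Sone", "Sloan", "Piger", "Riani", "Ali"],
    "preTestScore": [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    "postTestScore": [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
})

# Number of rows and columns, and the type of each column.
print(df.shape)
print(df.dtypes)

# Missing values per column (all zero for this small dataset).
print(df.isna().sum())

# describe() on the whole DataFrame summarizes only the numeric columns by default.
summary = df.describe()
print(summary)
```

Checking this first makes it easier to spot a text column that was accidentally read as numbers (or the reverse) before it affects the grouped statistics.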

Step 3 - Applying Data Analysis

Here we will use different methods to find different statistics for Data Analysis.
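
Several statistics per group can also be computed in a single call with groupby().agg() and named aggregation, where each keyword becomes an output column given as (source column, function). This is an alternative to the separate describe() and mean() calls used in this recipe, not the recipe's own method; the output column names (pre_mean, pre_max, post_mean, n) are hypothetical labels chosen for illustration.

```python
import pandas as pd

# Same values as Step 2, written compactly so this snippet is self-contained.
df = pd.DataFrame({
    "regiment": ["Nighthawks"] * 4 + ["Dragoons"] * 4 + ["Scouts"] * 4,
    "company": ["1st", "1st", "2nd", "2nd"] * 3,
    "name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze", "Jacon",
             "Ryaner", "Sone", "Sloan", "Piger", "Riani", "Ali"],
    "preTestScore": [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    "postTestScore": [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
})

# Named aggregation: output_name=(input_column, aggregation_function).
# The output names below are illustrative, not part of the original recipe.
stats = df.groupby("regiment").agg(
    pre_mean=("preTestScore", "mean"),
    pre_max=("preTestScore", "max"),
    post_mean=("postTestScore", "mean"),
    n=("name", "count"),
)
print(stats)
```

Because the text column "name" is only used for counting, no numeric_only handling is needed here.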

  • We start with a descriptive statistics table. We create a groupby object that groups preTestScore by regiment, and we reuse this object in later steps.
  • groupby_regiment = df["preTestScore"].groupby(df["regiment"])
    print(groupby_regiment.describe())
  • Here we calculate the mean preTestScore of each regiment.
  • print(groupby_regiment.mean())
  • Here we calculate the mean preTestScore grouped by two features, regiment and company. unstack() then turns the company level of the index into columns.
  • print(df["preTestScore"].groupby([df["regiment"], df["company"]]).mean())
    print(df["preTestScore"].groupby([df["regiment"], df["company"]]).mean().unstack())
  • Here we group the whole DataFrame by regiment and company and calculate the mean of each numeric column and the size (number of rows) of each group. numeric_only=True skips the text column "name"; pandas 2.0 and later raise an error on it instead of dropping it silently.
  • print(df.groupby(["regiment", "company"]).mean(numeric_only=True))
    print(df.groupby(["regiment", "company"]).size())
  • Now we bin postTestScore into four labelled categories with pd.cut. Each bin includes its right edge, so a score of 25 falls in "Low".
  • bins = [0, 25, 50, 75, 100]
    group_names = ["Low", "Okay", "Good", "Great"]
    df["categories"] = pd.cut(df["postTestScore"], bins, labels=group_names)
    print()
    print(df["categories"])
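
The groupby-then-unstack step can also be written with pd.pivot_table, which groups by an index column and a columns column and aggregates the values in one call. The sketch below is an alternative formulation assuming the same data; it computes both versions so they can be compared side by side.

```python
import pandas as pd

# Same values as Step 2, written compactly so this snippet is self-contained.
df = pd.DataFrame({
    "regiment": ["Nighthawks"] * 4 + ["Dragoons"] * 4 + ["Scouts"] * 4,
    "company": ["1st", "1st", "2nd", "2nd"] * 3,
    "name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze", "Jacon",
             "Ryaner", "Sone", "Sloan", "Piger", "Riani", "Ali"],
    "preTestScore": [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    "postTestScore": [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
})

# Regiments become rows, companies become columns, cells hold the mean preTestScore.
pivot = pd.pivot_table(df, values="preTestScore", index="regiment",
                       columns="company", aggfunc="mean")
print(pivot)

# The groupby + unstack formulation used in Step 3, for comparison.
unstacked = df.groupby(["regiment", "company"])["preTestScore"].mean().unstack()
print(unstacked)
```

pivot_table reads more directly when the goal is a two-way table, while groupby keeps the intermediate long-format result available if it is needed.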
Running the code from Steps 1 to 3 prints the following output:

      regiment company      name  preTestScore  postTestScore
0   Nighthawks     1st    Miller             4             25
1   Nighthawks     1st  Jacobson            24             94
2   Nighthawks     2nd       Ali            31             57
3   Nighthawks     2nd    Milner             2             62
4     Dragoons     1st     Cooze             3             70
5     Dragoons     1st     Jacon             4             25
6     Dragoons     2nd    Ryaner            24             94
7     Dragoons     2nd      Sone            31             57
8       Scouts     1st     Sloan             2             62
9       Scouts     1st     Piger             3             70
10      Scouts     2nd     Riani             2             62
11      Scouts     2nd       Ali             3             70

            count   mean        std  min   25%   50%    75%   max
Dragoons      4.0  15.50  14.153916  3.0  3.75  14.0  25.75  31.0
Nighthawks    4.0  15.25  14.453950  2.0  3.50  14.0  25.75  31.0
Scouts        4.0   2.50   0.577350  2.0  2.00   2.5   3.00   3.0

Dragoons      15.50
Nighthawks    15.25
Scouts         2.50
Name: preTestScore, dtype: float64

regiment    company
Dragoons    1st         3.5
            2nd        27.5
Nighthawks  1st        14.0
            2nd        16.5
Scouts      1st         2.5
            2nd         2.5
Name: preTestScore, dtype: float64

company      1st   2nd
Dragoons     3.5  27.5
Nighthawks  14.0  16.5
Scouts       2.5   2.5

                    preTestScore  postTestScore
regiment   company                             
Dragoons   1st               3.5           47.5
           2nd              27.5           75.5
Nighthawks 1st              14.0           59.5
           2nd              16.5           59.5
Scouts     1st               2.5           66.0
           2nd               2.5           66.0

regiment    company
Dragoons    1st        2
            2nd        2
Nighthawks  1st        2
            2nd        2
Scouts      1st        2
            2nd        2
dtype: int64

0       Low
1     Great
2      Good
3      Good
4      Good
5       Low
6     Great
7      Good
8      Good
9      Good
10     Good
11     Good
Name: categories, dtype: category
Categories (4, object): [Low < Okay < Good < Great]
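
Once postTestScore has been binned, a natural follow-up is to count how many rows fall in each category and how the categories split across regiments. The sketch below does this with value_counts and pd.crosstab; it is an extension of the recipe rather than part of it, and it rebuilds the data so it runs on its own.

```python
import pandas as pd

# Same values as Step 2, written compactly so this snippet is self-contained.
df = pd.DataFrame({
    "regiment": ["Nighthawks"] * 4 + ["Dragoons"] * 4 + ["Scouts"] * 4,
    "company": ["1st", "1st", "2nd", "2nd"] * 3,
    "name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze", "Jacon",
             "Ryaner", "Sone", "Sloan", "Piger", "Riani", "Ali"],
    "preTestScore": [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    "postTestScore": [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
})

# Same binning as Step 3: right-closed bins (0, 25], (25, 50], (50, 75], (75, 100].
bins = [0, 25, 50, 75, 100]
group_names = ["Low", "Okay", "Good", "Great"]
df["categories"] = pd.cut(df["postTestScore"], bins, labels=group_names)

# Rows per category; sort=False keeps the categories in their defined order,
# and empty categories such as "Okay" are still listed with a count of 0.
counts = df["categories"].value_counts(sort=False)
print(counts)

# Two-way frequency table: regiments as rows, score categories as columns.
table = pd.crosstab(df["regiment"], df["categories"])
print(table)
```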
