How to do Data Analysis in a Pandas DataFrame?
DATA MUNGING DATA CLEANING PYTHON MACHINE LEARNING RECIPES PANDAS CHEATSHEET     ALL TAGS

How to do Data Analysis in a Pandas DataFrame?

How to do Data Analysis in a Pandas DataFrame?

This recipe helps you do Data Analysis in a Pandas DataFrame

0

Recipe Objective

Data Analysis means analyzing the data and preprocess it for further use. For this we have to use many statical analysis.

So this is the recipe on how we can Data Analysis in a Pandas DataFrame.

Step 1 - Import the library

import pandas as pd

We have imported pandas which will be needed for the dataset.

Step 2 - Setting up the Data

We have created a dataframe with different features like "regiment", "company", "name", "preTestScore", "postTestScore". raw_data = {"regiment": ["Nighthawks", "Nighthawks", "Nighthawks", "Nighthawks", "Dragoons", "Dragoons", "Dragoons", "Dragoons", "Scouts", "Scouts", "Scouts", "Scouts"], "company": ["1st", "1st", "2nd", "2nd", "1st", "1st", "2nd", "2nd","1st", "1st", "2nd", "2nd"], "name": ["Miller", "Jacobson", "Ali", "Milner", "Cooze", "Jacon", "Ryaner", "Sone", "Sloan", "Piger", "Riani", "Ali"], "preTestScore": [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3], "postTestScore": [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]} df = pd.DataFrame(raw_data, columns = ["regiment", "company", "name", "preTestScore", "postTestScore"]) print(df)

Step 3 - Applying Data Analysis

Here we will be using different methods to do find different statistics for Data Analysis.

  • We are making a descriptive table for features. We have made an object to group pretestscore on the basis of regiment and we will be using this object further.
  • groupby_regiment = df["preTestScore"].groupby(df["regiment"]) print(df["preTestScore"].groupby(df["regiment"]).describe())
  • Here we have calculated mean
  • print(groupby_regiment.mean())
  • Here we have calculated mean of preTestScore grouped on the basis of two other features.
  • print(df["preTestScore"].groupby([df["regiment"], df["company"]]).mean()) print(df["preTestScore"].groupby([df["regiment"], df["company"]]).mean().unstack())
  • Here we have calculated mean and size of grouped features
  • print(df.groupby(["regiment", "company"]).mean()) print(df.groupby(["regiment", "company"]).size())
  • Now we are creating bins of postTestScore
  • bins = [0, 25, 50, 75, 100] group_names = ["Low", "Okay", "Good", "Great"] df["categories"] = pd.cut(df["postTestScore"], bins, labels=group_names) print() print(df["categories"])
So the output comes as:

      regiment company      name  preTestScore  postTestScore
0   Nighthawks     1st    Miller             4             25
1   Nighthawks     1st  Jacobson            24             94
2   Nighthawks     2nd       Ali            31             57
3   Nighthawks     2nd    Milner             2             62
4     Dragoons     1st     Cooze             3             70
5     Dragoons     1st     Jacon             4             25
6     Dragoons     2nd    Ryaner            24             94
7     Dragoons     2nd      Sone            31             57
8       Scouts     1st     Sloan             2             62
9       Scouts     1st     Piger             3             70
10      Scouts     2nd     Riani             2             62
11      Scouts     2nd       Ali             3             70

            count   mean        std  min   25%   50%    75%   max
regiment                                                         
Dragoons      4.0  15.50  14.153916  3.0  3.75  14.0  25.75  31.0
Nighthawks    4.0  15.25  14.453950  2.0  3.50  14.0  25.75  31.0
Scouts        4.0   2.50   0.577350  2.0  2.00   2.5   3.00   3.0

regiment
Dragoons      15.50
Nighthawks    15.25
Scouts         2.50
Name: preTestScore, dtype: float64

regiment    company
Dragoons    1st         3.5
            2nd        27.5
Nighthawks  1st        14.0
            2nd        16.5
Scouts      1st         2.5
            2nd         2.5
Name: preTestScore, dtype: float64

company      1st   2nd
regiment              
Dragoons     3.5  27.5
Nighthawks  14.0  16.5
Scouts       2.5   2.5

                    preTestScore  postTestScore
regiment   company                             
Dragoons   1st               3.5           47.5
           2nd              27.5           75.5
Nighthawks 1st              14.0           59.5
           2nd              16.5           59.5
Scouts     1st               2.5           66.0
           2nd               2.5           66.0

regiment    company
Dragoons    1st        2
            2nd        2
Nighthawks  1st        2
            2nd        2
Scouts      1st        2
            2nd        2
dtype: int64

0       Low
1     Great
2      Good
3      Good
4      Good
5       Low
6     Great
7      Good
8      Good
9      Good
10     Good
11     Good
Name: categories, dtype: category
Categories (4, object): [Low < Okay < Good < Great]


Relevant Projects

Learn to prepare data for your next machine learning project
Text data requires special preparation before you can start using it for any machine learning project.In this ML project, you will learn about applying Machine Learning models to create classifiers and learn how to make sense of textual data.

Predict Credit Default | Give Me Some Credit Kaggle
In this data science project, you will predict borrowers chance of defaulting on credit loans by building a credit score prediction model.

Human Activity Recognition Using Multiclass Classification in Python
In this human activity recognition project, we use multiclass classification machine learning techniques to analyse fitness dataset from a smartphone tracker.

Topic modelling using Kmeans clustering to group customer reviews
In this Kmeans clustering machine learning project, you will perform topic modelling in order to group customer reviews based on recurring patterns.

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

Build a Collaborative Filtering Recommender System in Python
Use the Amazon Reviews/Ratings dataset of 2 Million records to build a recommender system using memory-based collaborative filtering in Python.

Deep Learning with Keras in R to Predict Customer Churn
In this deep learning project, we will predict customer churn using Artificial Neural Networks and learn how to model an ANN in R with the keras deep learning package.

Ecommerce product reviews - Pairwise ranking and sentiment analysis
This project analyzes a dataset containing ecommerce product reviews. The goal is to use machine learning models to perform sentiment analysis on product reviews and rank them based on relevance. Reviews play a key role in product recommendation systems.

Music Recommendation System Project using Python and R
Machine Learning Project - Work with KKBOX's Music Recommendation System dataset to build the best music recommendation engine.

Predict Census Income using Deep Learning Models
In this project, we are going to work on Deep Learning using H2O to predict Census income.