How to get descriptive statistics of a Pandas DataFrame?

This recipe helps you get descriptive statistics of a Pandas DataFrame

Recipe Objective

Before building a model, we need to analyse the data, and for that we need to calculate different statistics of the features.

This data science Python source code does the following:
1. Creates data dictionary and converts it into pandas dataframe
2. Uses describe function on dataframe
3. Performs statistical analysis on the dataset

So this is the recipe on how we can get descriptive statistics of a Pandas DataFrame.

Step 1 - Import the library

import pandas as pd

We have imported pandas, which will be needed for the dataset.

Step 2 - Setting up the Data

We have created a dictionary of data and passed it to pd.DataFrame to make a dataframe with columns 'first_name', 'last_name', 'age', 'Comedy_Score' and 'Rating_Score'.

raw_data = {'first_name': ['Sheldon', 'Raj', 'Leonard', 'Howard', 'Amy'],
            'last_name': ['Copper', 'Koothrappali', 'Hofstadter', 'Wolowitz', 'Fowler'],
            'age': [42, 38, 36, 41, 35],
            'Comedy_Score': [9, 7, 8, 8, 5],
            'Rating_Score': [25, 25, 49, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 'Comedy_Score', 'Rating_Score'])
print(df)
print(df.info())

Step 3 - Finding different statistics

We will now compute different statistics of the features.

    • First, sum of all the ages

print(df['age'].sum())

    • Mean of Rating_Score

print(df['Rating_Score'].mean())

    • Cumulative sum of Rating_Score

print(df['Rating_Score'].cumsum())

    • Summary statistics on Rating_Score

print(df['Rating_Score'].describe())

    • Counting the number of non-NA values

print(df['Rating_Score'].count())

    • Minimum value of Rating_Score

print(df['Rating_Score'].min())

    • Maximum value of Rating_Score

print(df['Rating_Score'].max())

    • Median value of Rating_Score

print(df['Rating_Score'].median())

    • Sample variance of Rating_Score values

print(df['Rating_Score'].var())

    • Sample standard deviation of Rating_Score values

print(df['Rating_Score'].std())

    • Skewness of Rating_Score values

print(df['Rating_Score'].skew())

    • Kurtosis of Rating_Score values

print(df['Rating_Score'].kurt())

    • Correlation Matrix Of Values

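# Select the numeric columns first; pandas 2.0+ raises an error on string columns
print(df.select_dtypes(include='number').corr())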

    • Finally, Covariance Matrix Of Values

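# Select the numeric columns first; pandas 2.0+ raises an error on string columns
print(df.select_dtypes(include='number').cov())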
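
The individual calls above can also be gathered in one step. The following is a minimal sketch, assuming the same sample data as in Step 2; the variable names `numeric` and `summary` are illustrative and not part of the original recipe. It calls `describe()` on the whole DataFrame and `agg()` with a list of statistic names on one column.

```python
import pandas as pd

# Same sample data as in Step 2
raw_data = {'first_name': ['Sheldon', 'Raj', 'Leonard', 'Howard', 'Amy'],
            'last_name': ['Copper', 'Koothrappali', 'Hofstadter', 'Wolowitz', 'Fowler'],
            'age': [42, 38, 36, 41, 35],
            'Comedy_Score': [9, 7, 8, 8, 5],
            'Rating_Score': [25, 25, 49, 62, 70]}
df = pd.DataFrame(raw_data)

# describe() on a mixed DataFrame summarises only the numeric columns by default
numeric = df.describe()
print(numeric)

# agg() accepts a list of method names and returns one labelled Series
summary = df['Rating_Score'].agg(['sum', 'mean', 'median', 'var', 'std', 'skew', 'kurt'])
print(summary)
```

Passing `include='all'` to `describe()` would also report counts and unique values for the text columns, if those are of interest.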

So the output comes as:

 first_name     last_name  age  Comedy_Score  Rating_Score
0    Sheldon        Copper   42             9            25
1        Raj  Koothrappali   38             7            25
2    Leonard    Hofstadter   36             8            49
3     Howard      Wolowitz   41             8            62
4        Amy        Fowler   35             5            70

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 5 columns):
first_name      5 non-null object
last_name       5 non-null object
age             5 non-null int64
Comedy_Score    5 non-null int64
Rating_Score    5 non-null int64
dtypes: int64(3), object(2)
memory usage: 280.0+ bytes
None

192

46.2

0     25
1     50
2     99
3    161
4    231
Name: Rating_Score, dtype: int64

count     5.000000
mean     46.200000
std      20.753313
min      25.000000
25%      25.000000
50%      49.000000
75%      62.000000
max      70.000000
Name: Rating_Score, dtype: float64

5

25

70

49.0

430.7

20.7533129885327

-0.07499061439128718

-2.6952969741807777

                   age  Comedy_Score  Rating_Score
age           1.000000      0.767579     -0.451895
Comedy_Score  0.767579      1.000000     -0.567136
Rating_Score -0.451895     -0.567136      1.000000

                age  Comedy_Score  Rating_Score
age            9.30          3.55        -28.60
Comedy_Score   3.55          2.30        -17.85
Rating_Score -28.60        -17.85        430.70
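
The printed values can be cross-checked by hand. The sketch below, which assumes the same age and Rating_Score values as above, recomputes the sample variance with the n - 1 denominator that `var()` uses by default (`ddof=1`) and rebuilds one correlation coefficient from the covariance matrix. Names such as `manual_var` and `manual_corr` are illustrative only.

```python
import pandas as pd

# Only the two numeric columns needed for the check
df = pd.DataFrame({'age': [42, 38, 36, 41, 35],
                   'Rating_Score': [25, 25, 49, 62, 70]})

scores = df['Rating_Score']
n = len(scores)
mean = scores.sum() / n

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var = ((scores - mean) ** 2).sum() / (n - 1)
print(manual_var, scores.var())

# Population variance divides by n instead; request it with ddof=0
population_var = scores.var(ddof=0)
print(population_var)

# Correlation is the covariance scaled by both standard deviations
cov = df.cov()
manual_corr = cov.loc['age', 'Rating_Score'] / (df['age'].std() * scores.std())
print(manual_corr, df.corr().loc['age', 'Rating_Score'])
```

The two numbers on each printed line should agree up to floating-point rounding, which ties the variance, standard deviation, covariance and correlation outputs together.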
