How to find quantiles and quartiles in R?

This recipe shows how to find quantiles and quartiles in R.


Recipe Objective

Exploratory Data Analysis is a crucial step before building any machine learning model on a dataset, and it includes drawing statistical inferences from the data. A few standard statistics summarize a numeric variable: the mean and median describe its center, while quantiles, quartiles and the interquartile range (IQR) describe its spread. Together they help us understand the distribution of a column and detect outliers in it.

This recipe focuses on finding the quantiles and quartiles of a column.

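Quantiles and quartiles measure how the data is spread out. Quantiles are cut points that divide the sorted values of a distribution into groups containing equal numbers of observations. Roughly, the q-th quantile (0 ≤ q ≤ 1) is the value below which a proportion q of the observations falls, and a percentile is the same quantity expressed as a percentage. Quartiles are the special case of three cut points (the 0.25, 0.50 and 0.75 quantiles) that divide the observations into four equal groups. The middle quartile is the median.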
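To make "dividing the sorted data" concrete, here is a minimal sketch of the interpolation rule that R's quantile() uses by default (type = 7). It sorts the values, finds the fractional position h = (n - 1)p + 1, and interpolates linearly between the two neighbouring values. The helper name type7_quantile() is hypothetical and exists only for illustration. In practice you would call quantile() itself.

```r
# Hypothetical helper: re-implements quantile()'s default (type = 7) rule
# for illustration only. Assumes x has no missing values.
type7_quantile <- function(x, p) {
  x <- sort(x)                        # quantiles are read off the sorted data
  n <- length(x)
  h <- (n - 1) * p + 1                # fractional position(s) in the sorted vector
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])  # linear interpolation between neighbours
}

x <- c(7, 1, 10, 4, 2, 9, 3, 8, 6, 5)    # the numbers 1 to 10, unsorted
type7_quantile(x, c(0.25, 0.50, 0.75))   # 3.25 5.50 7.75
quantile(x, probs = c(0.25, 0.50, 0.75)) # same values, labelled 25% 50% 75%
```

The three quartiles are simply the quantiles at p = 0.25, 0.50 and 0.75, which is why one function handles both.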

Step 1: Loading the necessary libraries and the dataset

# Data manipulation package (tidyverse loads dplyr, which provides glimpse())
library(tidyverse)

# Reading the dataset
customer_seg <- read.csv('Mall_Customers.csv')

glimpse(customer_seg)

Rows: 200
Columns: 5
$ CustomerID              1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...
$ Gender                  Male, Male, Female, Female, Female, Female, ...
$ Age                     19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, ...
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...
Dataset description: basic data about customers visiting a supermarket mall. The variable whose quantiles and quartiles we want to find is annual income, stored in the column Annual.Income..k...
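The unusual column name comes from read.csv(). Its default check.names = TRUE passes each header through make.names(), which turns every character that is not valid in an R name into a dot. A minimal sketch, assuming the raw header in Mall_Customers.csv reads "Annual Income (k$)". The file itself is not shown here, so the two-row CSV below is made up purely to reproduce the renaming.

```r
# A tiny made-up CSV whose header mimics the assumed original column name
csv_text <- "CustomerID,Annual Income (k$)\n1,15\n2,15"

df <- read.csv(text = csv_text)          # check.names = TRUE by default
names(df)                                # "CustomerID" "Annual.Income..k.."

raw <- read.csv(text = csv_text, check.names = FALSE)
names(raw)                               # "CustomerID" "Annual Income (k$)"
```

With check.names = FALSE the original header is kept, but it must then be quoted with backticks, e.g. raw$`Annual Income (k$)`.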

Step 2: Calculating Quantile/Percentile

We use the quantile() function for this. Its probs argument takes a proportion between 0 and 1, so the 30th percentile (the 0.30 quantile) of annual income is:

# probs is the quantile as a proportion; 0.30 gives the 30th percentile
quantile(customer_seg$Annual.Income..k.., probs = 0.30)
30% 
 46 
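Two further quantile() arguments are worth knowing. By default it stops with an error if the vector contains NA values, so a column with missing values needs na.rm = TRUE. The type argument (1 to 9, default 7) selects one of several standard quantile definitions, and these can give slightly different answers on small samples. A minimal sketch on a made-up vector, not the mall data:

```r
x <- c(1:10, NA)  # made-up data with one missing value

# quantile(x, probs = 0.30) would stop with an error because of the NA
quantile(x, probs = 0.30, na.rm = TRUE)            # 3.7: default type = 7 interpolates
quantile(x, probs = 0.30, na.rm = TRUE, type = 1)  # 3: type = 1 returns an observed value
```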

Step 3: Calculating Quartiles

Quartiles are the 0.25, 0.50 and 0.75 quantiles, so the same quantile() function computes them. This time we pass probs = c(0.25, 0.50, 0.75).

quantile(customer_seg$Annual.Income..k.., probs = c(0.25, 0.50, 0.75))
 25%  50%  75% 
41.5 61.5 78.0 
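The quartiles also give the interquartile range, IQR = Q3 - Q1, and with it a common rule for flagging the outliers mentioned in the objective. A minimal sketch on a made-up vector. The 1.5 multiplier is the usual Tukey convention and is an assumption here, not something this recipe prescribes.

```r
x <- c(1:10, 50)  # made-up data with one obvious outlier

q <- quantile(x, probs = c(0.25, 0.75))   # Q1 = 3.5, Q3 = 8.5
iqr <- IQR(x)                             # Q3 - Q1 = 5 (same default type = 7)

# Tukey's fences: values more than 1.5 * IQR beyond the quartiles are flagged
lower <- q[["25%"]] - 1.5 * iqr           # -4
upper <- q[["75%"]] + 1.5 * iqr           # 16
outliers <- x[x < lower | x > upper]
outliers                                  # 50
```

Applied to the quartiles printed above for annual income, the IQR is 78 - 41.5 = 36.5 and the upper fence is 78 + 1.5 × 36.5 = 132.75. So at least the maximum of 137 reported by summary() in Step 4 would be flagged.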

Step 4: Calculating median and mean together

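The summary() function reports the minimum, first quartile, median, mean, third quartile and maximum of the column in one call. Its 1st Qu., Median and 3rd Qu. values are the same quartiles computed in Step 3.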

summary(customer_seg$Annual.Income..k..)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  15.00   41.50   61.50   60.56   78.00  137.00 
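summary() returns a named numeric vector (of class summaryDefault), so individual statistics can be extracted by name, and its quartiles agree with quantile() under its default type = 7. Comparing the mean with the median is also a quick check for skew: one large value pulls the mean up but barely moves the median. A minimal sketch on a made-up vector, not the mall data:

```r
x <- c(1:10, 50)  # made-up data, not the mall dataset

s <- summary(x)
s[["Median"]]     # 6
s[["1st Qu."]]    # 3.5
s[["3rd Qu."]]    # 8.5
mean(x)           # 9.545455, pulled upward by the value 50
median(x)         # 6

# The quartiles reported by summary() match quantile() with its defaults
all.equal(c(s[["1st Qu."]], s[["Median"]], s[["3rd Qu."]]),
          unname(quantile(x, probs = c(0.25, 0.50, 0.75))))  # TRUE
```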
