How to find quantile and quartiles in R?

This recipe helps you find quantile and quartiles in R

Recipe Objective

Exploratory Data Analysis is a crucial step before building any machine learning model on a dataset. This also includes gathering statistical inferences from the data. There are a few main terms in stats which describes the variability of the numeric variable. These include IQR, quartiles, quantiles, mean and median. They help us to detect any outliers in the column and the distribution of the column.

This recipe focuses on finding quartile and quantile of the column.

Quantile and Quartile gives the measure of variabilty in the data. Quantiles provides a way to divide the numbers of a given distribution in equal subgroups after sorting the data. Quartiles are the three points in the dataset which divides the number of observations into four equal subgroups.

Explore the BERT Variants - ALBERT vs DistilBERT

Step 1: Loading necesary libraries and loading dataset

# Data manipulation package library(tidyverse) ​ # reading a dataset customer_seg = read.csv('Mall_Customers.csv') ​ glimpse(customer_seg)

Rows: 200
Columns: 5
$ CustomerID              1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...
$ Gender                  Male, Male, Female, Female, Female, Female, ...
$ Age                     19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, ...
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...
Dataset description: It is the basic data about the customers going to the supermarket mall. The variable that we interested in finding the Quantiles and Quartiles of is Annual.Income.

Step 2: Calculating Quantile/Percentile

We use the quantile() function to do the task. Let's get the 30th Quantile value of column Annual Income

# prob argument represent the nth percentile. In this case it's the 30th percentile. quantile(customer_seg$Annual.Income..k.., probs = 0.30)

30%: 46

Step 3: Calculating Quartile

We use the quantile() function to do the same task. This time the probs = 25%, 50%, 75%.

quantile(customer_seg$Annual.Income..k.., prob = c(0.25,0.50, 0.75))

25% 41.5 50% 61.5 75% 78

Step 4: Calculating median and mean together

We use the summary() function to calculate the mean, median and other statistical terms of the column

summary(customer_seg$Annual.Income..k..) Min. 1st Qu. Median Mean 3rd Qu. Max. 15.00 41.50 61.50 60.56 78.00 137.00

What Users are saying..

profile image

Gautam Vermani

Data Consultant at Confidential
linkedin profile url

Having worked in the field of Data Science, I wanted to explore how I can implement projects in other domains, So I thought of connecting with ProjectPro. A project that helped me absorb this topic... Read More

Relevant Projects

Many-to-One LSTM for Sentiment Analysis and Text Generation
In this LSTM Project , you will build develop a sentiment detection model using many-to-one LSTMs for accurate prediction of sentiment labels in airline text reviews. Additionally, we will also train many-to-one LSTMs on 'Alice's Adventures in Wonderland' to generate contextually relevant text.

Learn How to Build a Linear Regression Model in PyTorch
In this Machine Learning Project, you will learn how to build a simple linear regression model in PyTorch to predict the number of days subscribed.

Hands-On Approach to Master PyTorch Tensors with Examples
In this deep learning project, you will learn how to perform various operations on the building block of PyTorch : Tensors.

Expedia Hotel Recommendations Data Science Project
In this data science project, you will contextualize customer data and predict the likelihood a customer will stay at 100 different hotel groups.

Azure Text Analytics for Medical Search Engine Deployment
Microsoft Azure Project - Use Azure text analytics cognitive service to deploy a machine learning model into Azure Databricks

Personalized Medicine: Redefining Cancer Treatment
In this Personalized Medicine Machine Learning Project you will learn to classify genetic mutations on the basis of medical literature into 9 classes.

Build Portfolio Optimization Machine Learning Models in R
Machine Learning Project for Financial Risk Modelling and Portfolio Optimization with R- Build a machine learning model in R to develop a strategy for building a portfolio for maximized returns.

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.

Credit Card Fraud Detection as a Classification Problem
In this data science project, we will predict the credit card fraud in the transactional dataset using some of the predictive models.

Build Piecewise and Spline Regression Models in Python
In this Regression Project, you will learn how to build a piecewise and spline regression model from scratch in Python to predict the points scored by a sports team.