How to find quantile and quartiles in R?

This recipe helps you find quantiles and quartiles in R.

Recipe Objective

Exploratory Data Analysis is a crucial step before building any machine learning model on a dataset. It includes computing summary statistics of the data. A few key statistics describe the centre and variability of a numeric variable: the IQR, quartiles, quantiles, mean and median. They help us detect outliers in a column and understand its distribution.

This recipe focuses on finding the quartiles and quantiles of a column.

Quantiles and quartiles give a measure of variability in the data. Quantiles are cut points that divide the sorted values of a distribution into equal-sized subgroups. Quartiles are the three quantiles (the 25th, 50th and 75th percentiles) that divide the observations into four equal subgroups.
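To make the definition concrete, here is a minimal sketch on a small made-up vector (the values in `x` are hypothetical and are not taken from the recipe's dataset). It computes the three quartiles with quantile() and then counts how many observations fall into each of the four resulting groups.

```r
# Hypothetical toy data, chosen only for illustration
x <- c(7, 1, 5, 3, 9, 11, 13, 15)

# The three quartiles: 25th, 50th and 75th percentiles
quartiles <- quantile(x, probs = c(0.25, 0.50, 0.75))
print(quartiles)

# Split the observations at the quartiles and count each group
groups <- cut(x, breaks = c(-Inf, quartiles, Inf))
group_sizes <- table(groups)
print(group_sizes)
```

With eight observations, each of the four groups should contain two values.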

Step 1: Loading necessary libraries and loading the dataset

# Data manipulation package
library(tidyverse)

# reading a dataset
customer_seg = read.csv('Mall_Customers.csv')

glimpse(customer_seg)

Rows: 200
Columns: 5
$ CustomerID              1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...
$ Gender                  Male, Male, Female, Female, Female, Female, ...
$ Age                     19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, ...
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...

Dataset description: It is basic data about the customers visiting a supermarket mall. The variable whose quantiles and quartiles we are interested in finding is Annual.Income..k..

Step 2: Calculating Quantile/Percentile

We use the quantile() function for this task. Let's get the 30th percentile (the 0.30 quantile) of the Annual Income column.

# The probs argument takes the quantile as a probability between 0 and 1. Here 0.30 is the 30th percentile.
quantile(customer_seg$Annual.Income..k.., probs = 0.30)

30%: 46
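When the requested quantile falls between two observations, quantile() interpolates linearly between them by default (its `type = 7` method). The sketch below reproduces that calculation by hand on a small made-up vector and compares it with the built-in result. The vector `x` and the variable names are hypothetical, chosen only for illustration.

```r
# Hypothetical toy data, not from Mall_Customers.csv
x <- c(15, 16, 17, 18, 19, 20, 21, 23, 28, 30)
p <- 0.30

sorted_x <- sort(x)
n <- length(sorted_x)

# Position of the p-th quantile among the sorted values (type 7 rule)
h <- (n - 1) * p + 1
lower <- floor(h)
upper <- ceiling(h)

# Linear interpolation between the two neighbouring observations
manual <- sorted_x[lower] + (h - lower) * (sorted_x[upper] - sorted_x[lower])

# Built-in result for comparison
builtin <- quantile(x, probs = p, names = FALSE)

print(c(manual = manual, builtin = builtin))
```

The two values should agree up to floating-point rounding. Other interpolation rules are available through the `type` argument of quantile().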

Step 3: Calculating Quartile

We use the quantile() function again. This time probs = c(0.25, 0.50, 0.75), which returns the 25th, 50th and 75th percentiles, i.e. the three quartiles.

quantile(customer_seg$Annual.Income..k.., probs = c(0.25, 0.50, 0.75))

25%: 41.5
50%: 61.5
75%: 78
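Quartiles are often used to derive the IQR and flag possible outliers. The sketch below assumes the common 1.5 × IQR rule, which is a convention and not something this recipe prescribes, and applies it to a small made-up vector (the values are hypothetical).

```r
# Hypothetical toy data with one unusually large value
x <- c(15, 16, 17, 18, 19, 20, 21, 23, 28, 90)

# First and third quartiles
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)

# Interquartile range; base R's IQR(x) gives the same value
iqr <- q[2] - q[1]

# Fences under the assumed 1.5 * IQR convention
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr

# Values outside the fences are flagged as possible outliers
outliers <- x[x < lower_fence | x > upper_fence]
print(outliers)
```

The same steps could be applied to customer_seg$Annual.Income..k.. once the dataset is loaded.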

Step 4: Calculating median and mean together

We use the summary() function to calculate the mean, the median and other summary statistics of the column (minimum, maximum, and first and third quartiles).

summary(customer_seg$Annual.Income..k..)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  15.00   41.50   61.50   60.56   78.00  137.00
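As a quick check of what summary() reports, the following sketch rebuilds the same six numbers from min(), quantile(), median(), mean() and max() on a small made-up vector. The data and variable names are hypothetical.

```r
# Hypothetical toy data, not from Mall_Customers.csv
x <- c(15, 16, 17, 18, 19, 20, 21, 23, 28, 90)

# Six-number summary from the built-in function
s <- summary(x)
print(s)

# The same statistics computed one by one
by_hand <- c(
  min(x),
  quantile(x, probs = 0.25, names = FALSE),
  median(x),
  mean(x),
  quantile(x, probs = 0.75, names = FALSE),
  max(x)
)
print(by_hand)
```

The first and third quartiles shown by summary() are the same values that quantile() returns for probs = 0.25 and probs = 0.75.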
