How to plot scatter plot using ggplot in R?

This recipe helps you plot scatter plot using ggplot in R

Recipe Objective

Scatter plot is the simplest chart which uses cartesian coordinates to display the relation between two variables x and y. It is used to find any trend between the two variable. ​

In this recipe we are going to use ggplot2 package to plot the required scatter plot. ggplot2 package is based on the book "Grammar of Graphics" by Wilkinson. This package provides flexibility while incorporating different themes and plot specification with a high level of abstraction. The package mainly uses aesthetic mapping and geometric objects as arguments. Different types of geometric objects include: ​

  1. geom_point() - for plotting points
  2. geom_bar() - for plotting bar graph
  3. geom_line() - for plotting line chart
  4. geom_histogram() - for plotting histogram

The basic syntax of gggplot2 plots is: ​

ggplot(data, mapping = aes(x =, y=)) + geometric object ​

where: ​

  1. data : Dataframe that is used to plot the chart
  2. mapping = aes() : aesthetic mapping which deals with controlling axis (x and y indicates the different variables)
  3. geometric object : Indicates the code for typeof plot you need to visualise.

This recipe demonstrates how to make a scatter plot using ggplot2.

German Credit Card Dataset Analysis

STEP 1: Loading required library and dataset

Dataset description: It is the basic data about the customers going to the supermarket mall. The variable that we interested in is Annual.Income which is in 1000s and Spending Score

# Data manipulation package library(tidyverse) ​ # ggplot for data visualisation library(ggplot2) ​ # reading a dataset customer_seg = read.csv('Mall_Customers.csv') ​ glimpse(customer_seg)

Rows: 200
Columns: 5
$ CustomerID              1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...
$ Gender                  Male, Male, Female, Female, Female, Female, ...
$ Age                     19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, ...
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...

STEP 2: Plotting a scatter plot using ggplot

We use geometric object as geom_point() to plot a scatter plot between two variables

Note:

  1. The + sign in the syntax earlier makes the code more readable and enables R to read further code without breaking it.
  2. aes() arguement inside the geom_point() object is used to group the points w.r.t the factor variable Gender by using color.
  3. We also use labs() function to give a title to the graph

ggplot(customer_seg, aes(x = Annual.Income..k.., y = Age)) + geom_point(aes(color = Gender)) + labs(title = "Annual Income vs Age")

What Users are saying..

profile image

Ray han

Tech Leader | Stanford / Yale University
linkedin profile url

I think that they are fantastic. I attended Yale and Stanford and have worked at Honeywell,Oracle, and Arthur Andersen(Accenture) in the US. I have taken Big Data and Hadoop,NoSQL, Spark, Hadoop... Read More

Relevant Projects

Isolation Forest Model and LOF for Anomaly Detection in Python
Credit Card Fraud Detection Project - Build an Isolation Forest Model and Local Outlier Factor (LOF) in Python to identify fraudulent credit card transactions.

Personalized Medicine: Redefining Cancer Treatment
In this Personalized Medicine Machine Learning Project you will learn to classify genetic mutations on the basis of medical literature into 9 classes.

AWS MLOps Project for ARCH and GARCH Time Series Models
Build and deploy ARCH and GARCH time series forecasting models in Python on AWS .

Model Deployment on GCP using Streamlit for Resume Parsing
Perform model deployment on GCP for resume parsing model using Streamlit App.

Create Your First Chatbot with RASA NLU Model and Python
Learn the basic aspects of chatbot development and open source conversational AI RASA to create a simple AI powered chatbot on your own.

Build a Hybrid Recommender System in Python using LightFM
In this Recommender System project, you will build a hybrid recommender system in Python using LightFM .

Loan Default Prediction Project using Explainable AI ML Models
Loan Default Prediction Project that employs sophisticated machine learning models, such as XGBoost and Random Forest and delves deep into the realm of Explainable AI, ensuring every prediction is transparent and understandable.

Azure Deep Learning-Deploy RNN CNN models for TimeSeries
In this Azure MLOps Project, you will learn to perform docker-based deployment of RNN and CNN Models for Time Series Forecasting on Azure Cloud.

Learn to Build a Polynomial Regression Model from Scratch
In this Machine Learning Regression project, you will learn to build a polynomial regression model to predict points scored by the sports team.

Time Series Classification Project for Elevator Failure Prediction
In this Time Series Project, you will predict the failure of elevators using IoT sensor data as a time series classification machine learning problem.