How to plot scatter plot using ggplot in R?
MACHINE LEARNING RECIPES DATA CLEANING PYTHON DATA MUNGING PANDAS CHEATSHEET     ALL TAGS

How to plot scatter plot using ggplot in R?

How to plot scatter plot using ggplot in R?

This recipe helps you plot scatter plot using ggplot in R

0

Recipe Objective

Scatter plot is the simplest chart which uses cartesian coordinates to display the relation between two variables x and y. It is used to find any trend between the two variable. ​

In this recipe we are going to use ggplot2 package to plot the required scatter plot. ggplot2 package is based on the book "Grammar of Graphics" by Wilkinson. This package provides flexibility while incorporating different themes and plot specification with a high level of abstraction. The package mainly uses aesthetic mapping and geometric objects as arguments. Different types of geometric objects include: ​

  1. geom_point() - for plotting points
  2. geom_bar() - for plotting bar graph
  3. geom_line() - for plotting line chart
  4. geom_histogram() - for plotting histogram

The basic syntax of gggplot2 plots is: ​

ggplot(data, mapping = aes(x =, y=)) + geometric object ​

where: ​

  1. data : Dataframe that is used to plot the chart
  2. mapping = aes() : aesthetic mapping which deals with controlling axis (x and y indicates the different variables)
  3. geometric object : Indicates the code for typeof plot you need to visualise.

This recipe demonstrates how to make a scatter plot using ggplot2.

STEP 1: Loading required library and dataset

Dataset description: It is the basic data about the customers going to the supermarket mall. The variable that we interested in is Annual.Income which is in 1000s and Spending Score

# Data manipulation package library(tidyverse) ​ # ggplot for data visualisation library(ggplot2) ​ # reading a dataset customer_seg = read.csv('Mall_Customers.csv') ​ glimpse(customer_seg)
Rows: 200
Columns: 5
$ CustomerID              1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1...
$ Gender                  Male, Male, Female, Female, Female, Female, ...
$ Age                     19, 21, 20, 23, 31, 22, 35, 23, 64, 30, 67, ...
$ Annual.Income..k..      15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, ...
$ Spending.Score..1.100.  39, 81, 6, 77, 40, 76, 6, 94, 3, 72, 14, 99,...

STEP 2: Plotting a scatter plot using ggplot

We use geometric object as geom_point() to plot a scatter plot between two variables

Note:

  1. The + sign in the syntax earlier makes the code more readable and enables R to read further code without breaking it.
  2. aes() arguement inside the geom_point() object is used to group the points w.r.t the factor variable Gender by using color.
  3. We also use labs() function to give a title to the graph
ggplot(customer_seg, aes(x = Annual.Income..k.., y = Age)) + geom_point(aes(color = Gender)) + labs(title = "Annual Income vs Age")

Relevant Projects

Forecast Inventory demand using historical sales data in R
In this machine learning project, you will develop a machine learning model to accurately forecast inventory demand based on historical sales data.

Ecommerce product reviews - Pairwise ranking and sentiment analysis
This project analyzes a dataset containing ecommerce product reviews. The goal is to use machine learning models to perform sentiment analysis on product reviews and rank them based on relevance. Reviews play a key role in product recommendation systems.

Zillow’s Home Value Prediction (Zestimate)
Data Science Project in R -Build a machine learning algorithm to predict the future sale prices of homes.

Perform Time series modelling using Facebook Prophet
In this project, we are going to talk about Time Series Forecasting to predict the electricity requirement for a particular house using Prophet.

Machine Learning or Predictive Models in IoT - Energy Prediction Use Case
In this machine learning and IoT project, we are going to test out the experimental data using various predictive models and train the models and break the energy usage.

Build a Collaborative Filtering Recommender System in Python
Use the Amazon Reviews/Ratings dataset of 2 Million records to build a recommender system using memory-based collaborative filtering in Python.

Build an Image Classifier for Plant Species Identification
In this machine learning project, we will use binary leaf images and extracted features, including shape, margin, and texture to accurately identify plant species using different benchmark classification techniques.

Machine Learning project for Retail Price Optimization
In this machine learning pricing project, we implement a retail price optimization algorithm using regression trees. This is one of the first steps to building a dynamic pricing model.

Ensemble Machine Learning Project - All State Insurance Claims Severity Prediction
In this ensemble machine learning project, we will predict what kind of claims an insurance company will get. This is implemented in python using ensemble machine learning algorithms.

PySpark Tutorial - Learn to use Apache Spark with Python
PySpark Project-Get a handle on using Python with Spark through this hands-on data processing spark python tutorial.