How to create a random distribution in R?

This recipe helps you create a random distribution in R.

Recipe Objective

Random numbers are generated in many areas of statistics, for example to carry out sampling and simulation. Most often, a data scientist needs a set of random numbers drawn from one of two types of distribution:

  1. Uniform distribution
  2. Normal distribution

The generated numbers mimic the properties of the chosen distribution: uniform random numbers are spread evenly over a given interval, while normal random numbers cluster symmetrically around a mean.
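A minimal sketch contrasting the two distributions, using base R's runif() for uniform numbers and rnorm() for normal numbers. The seed (1), the sample size (10000), the interval [1, 30] and the mean/sd values are illustrative assumptions, not values required by either function.

```r
# Illustrative settings (assumptions): seed 1, n = 10000,
# uniform on [1, 30], normal with mean 0 and sd 1
set.seed(1)

uniform_nums <- runif(10000, min = 1, max = 30)  # spread evenly over [1, 30]
normal_nums  <- rnorm(10000, mean = 0, sd = 1)   # bell-shaped around 0

# Uniform numbers stay inside the interval
range(uniform_nums)

# Normal numbers have no hard bounds, but about 95% fall within
# 2 standard deviations of the mean
mean(abs(normal_nums) < 2)
```

The first result stays inside [1, 30]; the second should be close to 0.95, reflecting the bell shape of the normal distribution.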

In this recipe, you will learn how to create a random distribution using the rnorm() function.

Note: Whenever you generate random numbers, R uses an algorithm (a pseudorandom number generator) that starts from an initial value called a seed. The numbers are therefore pseudorandom: anyone who knows the seed and the generator can reproduce them exactly. Setting a seed means initialising the pseudorandom generator with a fixed value. Set a seed when you need the same numbers every time the code runs; if you don't, the generated numbers differ on each execution.
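To illustrate that pseudorandom numbers are fully determined by the generator and its state, the sketch below saves R's internal generator state (.Random.seed), draws some numbers, restores the state and draws again. The seed value 20 and the draw size of 5 are arbitrary choices for the example.

```r
# Name of the generator in use; "Mersenne-Twister" is R's default
RNGkind()[1]

set.seed(20)                   # initialise the generator (arbitrary seed)
saved_state <- .Random.seed    # snapshot of the generator's internal state
first_draw  <- rnorm(5)

# Restore the snapshot: the generator now produces the same numbers again
assign(".Random.seed", saved_state, envir = .GlobalEnv)
second_draw <- rnorm(5)

identical(first_draw, second_draw)  # TRUE
```

Knowing the state (and the generator) is enough to predict every subsequent number, which is exactly why these numbers are called pseudorandom.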

Example: Creating a random distribution by generating 10000 random numbers from a normal distribution after setting a seed

We use the rnorm() function to carry out this task.

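Syntax: rnorm(n, mean = 0, sd = 1)

where:

  1. n = number of random numbers to generate
  2. mean = mean of the normal distribution (default 0)
  3. sd = standard deviation of the normal distribution (default 1)

Note that min and max are arguments of runif(), which draws from a uniform distribution on [min, max]; they are not arguments of rnorm().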
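A quick way to see what rnorm()'s mean and sd arguments control is to draw a large sample and compare its summary statistics with the requested values. The sketch assumes an illustrative mean of 50, sd of 10, sample size of 10000 and seed of 42.

```r
set.seed(42)  # arbitrary seed for reproducibility

# 10000 draws centred on 50 with standard deviation 10 (illustrative values)
draws <- rnorm(10000, mean = 50, sd = 10)

mean(draws)  # close to 50
sd(draws)    # close to 10

# Omitting mean and sd gives the standard normal (mean 0, sd 1)
std_normal <- rnorm(5)
```

The sample statistics will not match 50 and 10 exactly; they get closer as n grows.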

Additionally, use the set.seed() function to set a seed. Any integer can be passed to the function as the seed.
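The effect of set.seed() can be checked directly: resetting the same seed restarts the sequence, while a different seed starts a different one. The seeds 20 and 21 are arbitrary example values.

```r
set.seed(20)
a <- rnorm(3)

set.seed(20)        # reset to the same seed
b <- rnorm(3)

set.seed(21)        # a different integer seed
c_draw <- rnorm(3)

identical(a, b)       # TRUE: the sequence restarts from the same point
identical(a, c_draw)  # FALSE: a different seed gives different numbers
```

Note that set.seed() only fixes the starting point: calling rnorm() twice after a single set.seed() still gives two different batches, but the same two batches on every run.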

# setting a seed
set.seed(20)
# generating 10000 random numbers from a standard normal distribution (mean 0, sd 1)
random_dist <- rnorm(10000, mean = 0, sd = 1)
# plotting a histogram of the generated numbers using the hist() function
hist(random_dist, breaks = 100)
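One way to judge visually whether the generated numbers follow a normal distribution is to draw the histogram on the density scale and overlay the theoretical normal curve with base R's dnorm() and curve(). The plot title, axis label and colours are illustrative choices.

```r
set.seed(20)
random_dist <- rnorm(10000, mean = 0, sd = 1)

# freq = FALSE plots densities instead of counts, so the bars
# are on the same scale as dnorm()
h <- hist(random_dist, breaks = 100, freq = FALSE,
          main = "10000 draws from N(0, 1)", xlab = "value")

# Overlay the theoretical standard normal density
curve(dnorm(x, mean = 0, sd = 1), add = TRUE, col = "red", lwd = 2)
```

If the numbers are normal, the red curve should trace the tops of the bars closely.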

Note:

  1. Because the seed is fixed, the same numbers, and therefore the same histogram, are produced on every execution.
  2. In the plot, the mean, median and mode all lie close to 0 and the histogram is symmetric and bell-shaped, as expected for a normal distribution.
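The claim about the mean, median and mode can also be checked numerically. Base R has no built-in function for the mode of continuous data, so this sketch approximates it as the midpoint of the most populated histogram bin (an assumption; the estimate is rough and depends on the bin width).

```r
set.seed(20)
random_dist <- rnorm(10000, mean = 0, sd = 1)

sample_mean   <- mean(random_dist)
sample_median <- median(random_dist)

# Approximate mode: midpoint of the tallest bin (plot = FALSE skips drawing)
h <- hist(random_dist, breaks = 100, plot = FALSE)
approx_mode <- h$mids[which.max(h$counts)]

c(mean = sample_mean, median = sample_median, mode = approx_mode)
```

The mean and median should both be very close to 0; the histogram-based mode is noisier but should also land near 0.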

Relevant Projects

Machine Learning Project to Forecast Rossmann Store Sales
In this machine learning project you will work on creating a robust prediction model of Rossmann's daily sales using store, promotion, and competitor data.

Identifying Product Bundles from Sales Data Using R Language
In this data science project in R, we are going to talk about subjective segmentation which is a clustering technique to find out product bundles in sales data.

Customer Market Basket Analysis using Apriori and Fpgrowth algorithms
In this data science project, you will learn how to perform market basket analysis with the application of Apriori and FP growth algorithms based on the concept of association rule learning.

Natural language processing Chatbot application using NLTK for text classification
In this NLP AI application, we build the core conversational engine for a chatbot. We use the popular NLTK text classification library to achieve this.

Forecast Inventory demand using historical sales data in R
In this machine learning project, you will develop a machine learning model to accurately forecast inventory demand based on historical sales data.

Mercari Price Suggestion Challenge Data Science Project
Data Science Project in Python- Build a machine learning algorithm that automatically suggests the right product prices.

Build a Similar Images Finder with Python, Keras, and Tensorflow
Build your own image similarity application using Python to search and find images of products that are similar to any given product. You will implement the K-Nearest Neighbor algorithm to find products with maximum similarity.

Demand prediction of driver availability using multistep time series analysis
In this supervised learning machine learning project, you will predict the availability of a driver in a specific area by using multi step time series analysis.

Human Activity Recognition Using Multiclass Classification in Python
In this human activity recognition project, we use multiclass classification machine learning techniques to analyse fitness dataset from a smartphone tracker.

Predict Churn for a Telecom company using Logistic Regression
Machine Learning Project in R- Predict the customer churn of telecom sector and find out the key drivers that lead to churn. Learn how the logistic regression model using R can be used to identify the customer churn in telecom dataset.