How to find optimal parameters for Random Forest in R

This recipe helps you find optimal parameters for Random Forest in R.
Last Updated: 22 Dec 2022


Recipe Objective

How to find optimal parameters for Random Forest in R?

Random forest is a supervised learning algorithm that grows multiple decision trees and combines their results into a single prediction. It is an ensemble technique: instead of relying on one model, it aggregates many decision trees to obtain better predictive performance. Random forest also adds randomness while growing the trees: at each split it searches for the best feature among a random subset of the features rather than among all of them, which generally results in a better model.

Hyperparameter tuning is the process of searching for the parameter values that give the best model. It matters because hyperparameters directly affect how the model is trained, which in turn affects its performance on the test dataset. Many tuning methods are available, such as manual search, grid search, random search, and Bayesian optimization. In this example we use the tuneRF() function from the randomForest package to find the optimal number of predictors tried at each split (mtry). This recipe demonstrates how to find optimal parameters for Random Forest in R.
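As a point of comparison for the tuning methods listed above, here is a minimal sketch of a manual grid search over mtry, scored by out-of-bag (OOB) error. It is an illustration rather than the method this recipe uses: the built-in iris data stands in for the heart dataset so the code runs offline, and the names grid, oob and results are made up for the example.

```r
library(randomForest)

set.seed(1)

# Stand-in data (assumption): iris replaces the heart dataset
x <- iris[, names(iris) != "Species"]  # predictors only
y <- iris$Species                      # factor target, so this is classification

# Try every possible mtry, from 1 up to the number of predictors
grid <- seq_len(ncol(x))

# For each mtry, fit a forest and read the OOB error after the last tree
oob <- sapply(grid, function(m) {
  fit <- randomForest(x = x, y = y, mtry = m, ntree = 300)
  fit$err.rate[fit$ntree, "OOB"]
})

results <- data.frame(mtry = grid, oob_error = oob)
print(results)

# The candidate with the lowest OOB error
best <- results$mtry[which.min(results$oob_error)]
best
```

A full grid is cheap here because there are only four predictors; with many predictors, a stepwise search such as tuneRF() fits far fewer forests.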


Step 1 - Install required packages

install.packages("dplyr")         # data manipulation
library(dplyr)
install.packages("caret")         # classification and regression training utilities
library(caret)
install.packages("e1071", dependencies = TRUE)  # needed by some caret functions
install.packages("randomForest")  # provides randomForest() and tuneRF()
library(randomForest)
install.packages("caTools")       # provides sample.split()
library(caTools)

Step 2 - Read the dataset

A heart disease dataset is used (a classification problem), where the goal is to predict whether a patient has heart disease. The target variable is y: 'target'.
class 0: patient does not have heart disease
class 1: patient has heart disease

Dataset Description

age: age in years
sex: sex (1 = male; 0 = female)
cp: chest pain type

Value 1: typical angina
Value 2: atypical angina
Value 3: non-anginal pain
Value 4: asymptomatic

trestbps: resting blood pressure (in mm Hg on admission to the hospital)
chol: serum cholesterol in mg/dl
fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
restecg: resting electrocardiographic results

Value 0: normal
Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria

thalach: maximum heart rate achieved
exang: exercise induced angina (1 = yes; 0 = no)
oldpeak: ST depression induced by exercise relative to rest
slope: the slope of the peak exercise ST segment
ca: number of major vessels (0-3) colored by fluoroscopy
thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
target: diagnosis of heart disease (angiographic disease status)

Value 0: < 50% diameter narrowing
Value 1: > 50% diameter narrowing

data <- read.csv("http://storage.googleapis.com/dimensionless/ML_with_Python/Chapter%205/heart.csv")
print(head(data))
dim(data)      # returns the number of rows and columns in the dataset
summary(data)  # generates a statistical summary of each column

Step 3 - Split the data into train and test data sets

The training data is used to build the model, while the test data is used to evaluate its predictions. After the model has been fitted on the training set and its errors minimized, it is used to make predictions on unseen data, which is the test set.

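library(caTools)   # provides sample.split()
set.seed(123)      # make the random split reproducible
# Split on the target vector (not the whole data frame) so that rows are sampled
split <- sample.split(data$target, SplitRatio = 0.8)
data_train <- subset(data, split == TRUE)
data_test <- subset(data, split == FALSE)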
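To check that a split behaves as intended, it helps to look at the subset sizes and class proportions. The sketch below does a stratified 80/20 split by hand in base R on a hypothetical toy data frame (toy, with made-up age and target columns), which approximates what a label-based split is meant to achieve.

```r
set.seed(123)

# Hypothetical toy data standing in for the heart dataset
toy <- data.frame(
  age    = sample(30:75, 200, replace = TRUE),
  target = rep(c(0, 1), times = c(90, 110))
)

# Draw 80% of the row indices within each class (a stratified split)
rows_by_class <- split(seq_len(nrow(toy)), toy$target)
train_idx <- unlist(
  lapply(rows_by_class, function(rows) sample(rows, size = round(0.8 * length(rows)))),
  use.names = FALSE
)

toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

# Subset sizes
nrow(toy_train)
nrow(toy_test)

# Class proportions should be similar in both subsets
prop.table(table(toy_train$target))
prop.table(table(toy_test$target))
```

If the proportions differ a lot between the two subsets, the test error will be a less reliable estimate of real-world performance.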

Step 4 - Convert target variable to a factor form

Since our target variable is a yes/no variable stored as 0/1, we convert it to a factor. randomForest() and tuneRF() treat a factor response as a classification problem, whereas a numeric response would be modeled as regression.

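data$target <- as.factor(data$target)
data_train$target <- as.factor(data_train$target)
data_test$target <- as.factor(data_test$target)   # keep the test set consistent with the training set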
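When train and test targets are converted separately, their factor levels can disagree if one subset happens to miss a class. A minimal sketch of one way to guard against that, using hypothetical label vectors and made-up level labels ("no_disease", "disease"):

```r
# Hypothetical 0/1 labels standing in for data_train$target and data_test$target
train_target <- c(0, 1, 1, 0, 1)
test_target  <- c(1, 1)  # happens to contain only one class

# Fix the level set explicitly so both factors share the same levels
lvls <- c(0, 1)
labs <- c("no_disease", "disease")
train_f <- factor(train_target, levels = lvls, labels = labs)
test_f  <- factor(test_target,  levels = lvls, labels = labs)

levels(train_f)
levels(test_f)

# The missing class is still a level in the test factor, with a count of zero
table(test_f)

identical(levels(train_f), levels(test_f))
```

Matching levels matter later on, for example when tabulating predictions against the actual test labels.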

Step 5 - Finding optimized parameters

We can use the tuneRF() function to find the optimal parameter. By default, randomForest() grows 500 trees and, for classification, tries floor(sqrt(p)) randomly selected predictors as candidates at each split, where p is the number of predictors. The number of candidates is the mtry parameter, and tuneRF() searches for the mtry value with the lowest out-of-bag (OOB) error.

Syntax: tuneRF(x, y, stepFactor, improve, trace, plot)

where,
x: the predictor columns of the training data (without the target)
y: the target (dependent) variable
stepFactor: the factor by which mtry is inflated or deflated at each iteration; because mtry is rounded to an integer, a factor too close to 1 can leave mtry unchanged and end the search immediately
improve: the relative improvement in OOB error required for the search to continue
trace: whether to print the progress of the search
plot: whether to plot the OOB error against mtry
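To see how stepFactor shapes the search, the sketch below lists the mtry candidates visited in one direction. It is an illustration, not the package's code: mtry_path is a hypothetical helper, the rounding rules (floor when growing, ceiling when shrinking) are an assumption about how tuneRF() rounds, and the path assumes every step improves the OOB error enough to continue.

```r
# Hypothetical helper: candidate mtry values in one search direction,
# assuming each step improves the OOB error (the real search can stop earlier)
mtry_path <- function(start, step_factor, n_predictors, direction = c("right", "left")) {
  direction <- match.arg(direction)
  path <- start
  current <- start
  repeat {
    nxt <- if (direction == "right") {
      min(n_predictors, floor(current * step_factor))  # grow, capped at p
    } else {
      max(1, ceiling(current / step_factor))           # shrink, never below 1
    }
    if (nxt == current) break  # rounding left mtry unchanged: nothing more to try
    path <- c(path, nxt)
    current <- nxt
  }
  path
}

# 13 predictors, starting at floor(sqrt(13)) = 3
mtry_path(3, 1.2, 13, "right")  # 3         (3 * 1.2 rounds back down to 3)
mtry_path(3, 2,   13, "right")  # 3 6 12 13
mtry_path(3, 2,   13, "left")   # 3 2 1
```

Under these assumptions, a stepFactor of 1.2 starting from mtry = 3 never moves, so only a single value is evaluated; a larger factor such as 1.5 or 2 gives the search room to explore.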

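library(randomForest)   # provides tuneRF()
# Pass only the predictors as x; including the target column would leak the answer into the model
# stepFactor = 1.5 lets mtry move away from its starting value (with 1.2, a start of 3 rounds back to 3)
bestmtry <- tuneRF(data_train[, names(data_train) != "target"], data_train$target, stepFactor = 1.5, improve = 0.01, trace = TRUE, plot = TRUE)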

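tuneRF() returns a matrix of the mtry values it tried and their OOB errors (the OOB error is the prediction error estimated on the out-of-bag samples); the best mtry is the one with the lowest OOB error. The original run reported a best value of 3 with an OOB error of 0%. An OOB error of exactly 0% usually indicates that the target column was included among the predictors, so expect a non-zero error when only the predictor columns are passed.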
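Once a best mtry has been found, a natural next step is to fit a forest with it and check the result on the held-out test set. The sketch below shows that pattern under a few assumptions: the built-in iris data stands in for the heart dataset so the code runs offline, and the names train, test, tuned and best_mtry are made up for the example.

```r
library(randomForest)

set.seed(42)

# Stand-in data (assumption): iris replaces the heart dataset
idx   <- sample(nrow(iris), size = floor(0.8 * nrow(iris)))
train <- iris[idx, ]
test  <- iris[-idx, ]

x_train <- train[, names(train) != "Species"]  # predictors only, no target
y_train <- train$Species                       # factor target

# Search for mtry; the result is a matrix with columns "mtry" and "OOBError"
tuned <- tuneRF(x_train, y_train, stepFactor = 1.5, improve = 0.01,
                ntreeTry = 200, trace = FALSE, plot = FALSE)

# Pick the mtry with the lowest OOB error
best_mtry <- unname(tuned[which.min(tuned[, "OOBError"]), "mtry"])

# Fit the final forest with the chosen mtry
fit <- randomForest(x = x_train, y = y_train, mtry = best_mtry, ntree = 500)

# Evaluate on the held-out test set
pred     <- predict(fit, newdata = test)
conf_mat <- table(predicted = pred, actual = test$Species)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

conf_mat
accuracy
```

Alternatively, calling tuneRF() with doBest = TRUE returns a forest fitted with the best mtry directly, at the cost of not seeing the table of tried values.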
