How to perform ANOVA in R?

This recipe helps you perform ANOVA in R

Recipe Objective

ANOVA, short for ANalysis Of VAriance, determines whether the means of two or more sample groups differ from each other. It uses an F-test to statistically test the equality of means.

ANOVA uses both between-group variability and within-group variability to test whether the population means are significantly different from each other.

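The F-statistic is the ratio of between-group variability to within-group variability. A large F means the group means differ by more than the variation within groups would explain, which is evidence against the null hypothesis.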
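
To make the ratio concrete, here is a minimal sketch that computes the F-statistic by hand on a small made-up dataset and compares it with the value aov() reports. The data and the names toy, ss_between, and ss_within are illustrative assumptions, not part of this recipe's data.

```r
# Toy data (made up for illustration): three groups of four observations
toy <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 4)),
  value = c(4, 5, 6, 5,  7, 8, 6, 7,  9, 10, 8, 9)
)

grand_mean  <- mean(toy$value)
group_means <- tapply(toy$value, toy$group, mean)
group_sizes <- tapply(toy$value, toy$group, length)

k <- nlevels(toy$group)  # number of groups
n <- nrow(toy)           # total number of observations

# Between-group variability: spread of the group means around the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ms_between <- ss_between / (k - 1)

# Within-group variability: spread of observations around their own group mean
ss_within <- sum((toy$value - group_means[toy$group])^2)
ms_within <- ss_within / (n - k)

f_manual <- ms_between / ms_within

# The same statistic as reported by aov()
f_aov <- summary(aov(value ~ group, data = toy))[[1]][["F value"]][1]

print(c(manual = f_manual, aov = f_aov))
```

The two values should agree, since aov() computes the same ratio of mean squares for a one-way design.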

Hypothesis testing with ANOVA includes the following:

  1. Null Hypothesis: There is no difference between the group means
  2. Alternate Hypothesis: At least one group mean differs from the others

In this recipe, we learn how to perform a one-way ANOVA test in R.

STEP 1: Reading the sample and hypothesis testing

Example: A study to test the effects of 3 types of fertilizer on crop yield.

  1. Null Hypothesis: The type of fertilizer has no significant effect on crop yield
  2. Alternate Hypothesis: At least one type of fertilizer produces a mean crop yield that differs from the others

# data manipulation
library(tidyverse)
sample = read.csv("R_205_crop_sample.csv", colClasses = c("factor", "factor", "factor", "numeric"), header = TRUE)
glimpse(sample)
Observations: 96
Variables: 4
$ density    <fct> 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1,...
$ block      <fct> 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3,...
$ fertilizer <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
$ yield      <dbl> 177.2287, 177.5500, 176.4085, 177.7036, 177.1255, 176.77...
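
If the CSV file is not at hand, the remaining steps can still be followed with a simulated stand-in that has the same four columns. The sketch below is only a substitute built on assumptions: the name crop_sim, the balanced design (2 densities x 4 blocks x 3 fertilizers x 4 replicates = 96 rows), and all yield values are invented, so results obtained from it will not match the output shown in this recipe.

```r
set.seed(42)  # make the simulated yields reproducible

# Hypothetical design: every combination of the three factors, repeated 4 times
crop_sim <- expand.grid(
  density    = factor(1:2),
  block      = factor(1:4),
  fertilizer = factor(1:3),
  replicate  = 1:4
)
crop_sim$replicate <- NULL  # helper column, only needed to repeat the rows

# Invented yields: baseline of 177, a made-up shift per fertilizer, plus noise
shift <- c(0, 0.2, 0.6)
crop_sim$yield <- 177 + shift[crop_sim$fertilizer] +
  rnorm(nrow(crop_sim), sd = 0.6)

str(crop_sim)  # base R alternative to glimpse()

# Mean simulated yield per fertilizer type
print(tapply(crop_sim$yield, crop_sim$fertilizer, mean))
```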

STEP 2: Carrying out ANOVA test

We use the aov() function to fit the model and summary() to print its results.

Syntax: aov(y ~ X1+X2+X3+..., data = )

where:

  1. y = dependent variable
  2. X1,X2,X3 = independent variables

anova_one_way = aov(yield ~ fertilizer, data = sample)
summary(anova_one_way)
            Df Sum Sq Mean Sq F value Pr(>F)    
fertilizer   2   6.07  3.0340   7.863  7e-04 ***
Residuals   93  35.89  0.3859                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Result: The Pr(>F) column gives the p-value of the F-statistic. Here it is 7e-04, which is below the 0.05 significance level, so we reject the null hypothesis: at least one type of fertilizer produces a mean crop yield that differs from the others.
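
The p-value can also be extracted from the summary programmatically. Because a significant F-test does not say which groups differ, a pairwise comparison is a common follow-up. The sketch below uses R's built-in PlantGrowth dataset (columns weight and group) instead of the recipe's CSV. TukeyHSD() is one possible follow-up rather than something this recipe prescribes, and the 0.05 threshold and the names fit, anova_table, and posthoc are assumptions for illustration.

```r
# Built-in example data: plant weight under a control and two treatments
fit <- aov(weight ~ group, data = PlantGrowth)

# summary() returns a list; its first element is the ANOVA table
anova_table <- summary(fit)[[1]]

# Row 1 is the factor (group), row 2 is Residuals
p_value <- anova_table[["Pr(>F)"]][1]

if (p_value < 0.05) {
  message("Reject the null hypothesis: at least one group mean differs.")
} else {
  message("Fail to reject the null hypothesis.")
}

# Tukey's Honest Significant Differences: pairwise comparisons between groups
posthoc <- TukeyHSD(fit)
print(posthoc)
```

Each row of the TukeyHSD() output compares one pair of groups. Pairs whose adjusted p-value falls below the chosen threshold are the ones driving the overall result.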
