How to perform ANOVA in R?

This recipe helps you perform ANOVA in R

Recipe Objective

ANOVA, short for ANalysis Of VAriance, tests whether the means of two or more groups differ from one another. It uses an F-test to test the equality of means.

ANOVA compares between-group variability with within-group variability to test whether the population means are significantly different from each other.

The F-statistic is the ratio of between-group variability to within-group variability. A large F means the group means are spread out more than the variation within the groups would explain, which is evidence against equal means.
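To make the ratio concrete, here is a minimal sketch that computes the F-statistic by hand and cross-checks it against aov(). The data frame `toy` and its values are invented for illustration and are not taken from the recipe's dataset.

```r
# Hypothetical toy data: three groups of four observations each
toy <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 4)),
  value = c(5.1, 4.9, 5.3, 5.0,
            5.8, 6.1, 5.9, 6.2,
            5.0, 5.2, 4.8, 5.1)
)

grand_mean  <- mean(toy$value)
group_means <- tapply(toy$value, toy$group, mean)
group_sizes <- tapply(toy$value, toy$group, length)

k <- nlevels(toy$group)   # number of groups
n <- nrow(toy)            # total number of observations

# Between-group variability: spread of the group means around the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ms_between <- ss_between / (k - 1)

# Within-group variability: spread of observations around their own group mean
ss_within <- sum((toy$value - group_means[as.character(toy$group)])^2)
ms_within <- ss_within / (n - k)

# F-statistic: ratio of the two mean squares
f_manual <- ms_between / ms_within

# Cross-check against the value reported by aov()
f_aov <- summary(aov(value ~ group, data = toy))[[1]][["F value"]][1]

print(c(manual = f_manual, aov = f_aov))
```

The two numbers should agree, since aov() reports the same ratio of mean squares in its summary table.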

Hypothesis testing with ANOVA includes the following:

  1. Null Hypothesis: All group means are equal
  2. Alternate Hypothesis: At least one group mean differs from the others

In this recipe, we learn how to perform a one-way ANOVA test in R.

STEP 1: Reading the sample and hypothesis testing

Example: A study to test the effects of 3 types of fertilizer on crop yield.

  1. Null Hypothesis: The type of fertilizer has no significant effect on crop yield
  2. Alternate Hypothesis: At least one fertilizer produces a mean crop yield that differs from the others
# data manipulation
library(tidyverse)

# read the first three columns as factors and yield as numeric
sample = read.csv("R_205_crop_sample.csv", colClasses = c("factor", "factor", "factor", "numeric"), header = TRUE)
glimpse(sample)
Observations: 96
Variables: 4
$ density    <fct> 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1,...
$ block      <fct> 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3,...
$ fertilizer <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
$ yield      <dbl> 177.2287, 177.5500, 176.4085, 177.7036, 177.1255, 176.77...
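Before running the test, it can help to look at the size, mean, and spread of each fertilizer group. The sketch below builds a stand-in data frame called `crop` (a hypothetical name) with the same columns and layout as the output above. Its yield values are made up so that the block runs without the CSV file. With the real data, the same summary could be applied to the data frame read in this step.

```r
# Hypothetical stand-in for R_205_crop_sample.csv: same column names and
# layout as the glimpse() output, but the yield values are made up
noise <- rep(c(-0.4, -0.2, 0.0, 0.2, 0.4, 0.1, -0.1, 0.0), times = 12)
crop <- data.frame(
  density    = factor(rep(1:2, times = 48)),
  block      = factor(rep(1:4, times = 24)),
  fertilizer = factor(rep(1:3, each = 32)),
  yield      = 177 + noise + rep(c(0, 0.2, 0.6), each = 32)
)

# Per-fertilizer sample size, mean, and standard deviation
group_summary <- data.frame(
  fertilizer = levels(crop$fertilizer),
  n    = as.vector(tapply(crop$yield, crop$fertilizer, length)),
  mean = as.vector(tapply(crop$yield, crop$fertilizer, mean)),
  sd   = as.vector(tapply(crop$yield, crop$fertilizer, sd))
)
print(group_summary)
```

Roughly equal group sizes and similar standard deviations are a quick sanity check that a one-way ANOVA is a reasonable choice for the data.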

STEP 2: Carrying out ANOVA test

We use the aov() function to fit the ANOVA model and summary() to print the results.

Syntax: aov(y ~ X1+X2+X3+..., data = )

where:

  1. y = dependent variable
  2. X1,X2,X3 = independent variables
anova_one_way = aov(yield ~ fertilizer, data = sample)
summary(anova_one_way)
            Df Sum Sq Mean Sq F value Pr(>F)
fertilizer   2   6.07  3.0340   7.863  7e-04 ***
Residuals   93  35.89  0.3859
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
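
The columns of this table are tied together by simple arithmetic, which the following sketch reproduces from the printed values. The variable names are illustrative. Because the table shows rounded numbers, the recomputed results match only up to rounding.

```r
# Values copied from the summary() table above
df_between <- 2
ss_between <- 6.07
df_within  <- 93
ss_within  <- 35.89

# Mean Sq = Sum Sq / Df
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F value = ratio of the two mean squares
f_value <- ms_between / ms_within

# Pr(>F) = upper-tail probability of the F distribution with (2, 93) degrees of freedom
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

print(c(F = f_value, p = p_value))
```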

Result: Pr(>F) is the p-value of the F-statistic. Here it is 7e-04, well below the 0.05 significance level, so we reject the null hypothesis: at least one fertilizer produces a mean crop yield that differs from the others. ANOVA alone does not tell us which fertilizers differ.
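
One common way to find out which pairs differ is a post-hoc test such as Tukey's Honest Significant Differences, available in base R as TukeyHSD(). This step is not part of the original recipe. The sketch below uses a made-up data frame `demo`, in which fertilizer 3 is shifted upward on purpose, so the numbers it prints say nothing about the real crop data.

```r
# Hypothetical stand-in data (made-up numbers, not the recipe's CSV):
# three fertilizers, 8 plots each, with fertilizer 3 shifted upward on purpose
noise <- c(-0.3, -0.1, 0.1, 0.3, -0.2, 0.2, 0.0, 0.0)
demo <- data.frame(
  fertilizer = factor(rep(1:3, each = 8)),
  yield      = 177 + rep(noise, times = 3) + rep(c(0, 0.1, 1.0), each = 8)
)

fit <- aov(yield ~ fertilizer, data = demo)

# Pull the overall p-value out of the summary table
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]

# Tukey HSD: one row per pair of fertilizers, with the difference in means,
# a confidence interval, and a p-value adjusted for multiple comparisons
tukey <- TukeyHSD(fit)
print(tukey$fertilizer)
```

Rows whose adjusted p-value is below the chosen significance level point to the pairs of fertilizers whose mean yields differ.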
