How to perform ANOVA in R?

This recipe helps you perform ANOVA in R

Recipe Objective

ANOVA, short for ANalysis Of VAriance, determines whether the means of two or more sample groups differ from each other. It uses an F-test to statistically test the equality of means.

ANOVA uses both between-group variability and within-group variability to test whether the population means are significantly different from each other.

The F-statistic is the ratio of between-group variability to within-group variability. A large F means the group means are spread out more than the variation within the groups would explain, which is evidence against equal means.
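
To make the ratio concrete, here is a minimal sketch that computes the F-statistic by hand and compares it with the value reported by aov(). The toy values and the names yield, group, f_manual and f_aov are made up for illustration and are not taken from the recipe's dataset.

```r
# Toy data (hypothetical values, not from the recipe's dataset)
yield <- c(10, 12, 11, 14, 15, 16, 20, 19, 21)
group <- factor(rep(c("A", "B", "C"), each = 3))

k <- nlevels(group)   # number of groups
n <- length(yield)    # total number of observations

grand_mean  <- mean(yield)
group_means <- tapply(yield, group, mean)
group_sizes <- tapply(yield, group, length)

# Between-group variability: spread of the group means around the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ms_between <- ss_between / (k - 1)

# Within-group variability: spread of observations around their own group mean
ss_within <- sum((yield - group_means[as.character(group)])^2)
ms_within <- ss_within / (n - k)

# F is the ratio of the two mean squares
f_manual <- ms_between / ms_within

# The same statistic as reported by aov()
f_aov <- summary(aov(yield ~ group))[[1]][["F value"]][1]

print(c(manual = f_manual, aov = f_aov))
```

Both numbers should agree, since aov() performs the same decomposition internally for a one-way design.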

Hypothesis testing with ANOVA involves the following:

  1. Null Hypothesis: All group means are equal
  2. Alternate Hypothesis: At least one group mean differs significantly from the others
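
For k groups and N observations in total, the hypotheses and the F-statistic described above can be written compactly. The symbols below (the population means, k, N and the sums of squares SS) are notation introduced here for illustration:

```latex
\begin{aligned}
H_0 &: \mu_1 = \mu_2 = \dots = \mu_k \\
H_1 &: \mu_i \neq \mu_j \ \text{for at least one pair } i \neq j \\
F &= \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}
\end{aligned}
```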

In this recipe, we learn how to perform a one-way ANOVA test in R.

STEP 1: Reading the sample and hypothesis testing

Example: A study to test the effects of 3 types of fertilizer on crop yield.

  1. Null Hypothesis: The type of fertilizer has no significant effect on the crop yield
  2. Alternate Hypothesis: At least one fertilizer has a significantly different effect on the crop yield
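# Data manipulation
library(tidyverse)

# Read the sample; the first three columns are factors, the fourth is numeric
sample <- read.csv("R_205_crop_sample.csv",
                   colClasses = c("factor", "factor", "factor", "numeric"),
                   header = TRUE)

# Inspect the structure of the data
glimpse(sample)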
Observations: 96
Variables: 4
$ density    <fct> 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1,...
$ block      <fct> 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3,...
$ fertilizer <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
$ yield      <dbl> 177.2287, 177.5500, 176.4085, 177.7036, 177.1255, 176.77...
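
Before running the test, it can help to check how many observations each fertilizer group has and what the group means look like. The sketch below assumes the CSV is not at hand, so it builds a simulated stand-in called crop with the same column names and types. The yield values are random, and the helper column rep exists only to reach 96 rows.

```r
set.seed(1)  # make the simulated values reproducible

# Stand-in for the CSV: 96 rows, three factor columns and a numeric yield
crop <- expand.grid(
  density    = factor(1:2),
  block      = factor(1:4),
  fertilizer = factor(1:3),
  rep        = 1:4
)
crop$rep   <- NULL  # drop the helper column used only to replicate rows
crop$yield <- rnorm(nrow(crop), mean = 177, sd = 0.6)

str(crop)                                               # column types, similar to glimpse()
table(crop$fertilizer)                                  # observations per fertilizer
aggregate(yield ~ fertilizer, data = crop, FUN = mean)  # mean yield per fertilizer
```

With the real file, the same three inspection calls can be applied to the data frame returned by read.csv().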

STEP 2: Carrying out ANOVA test

We use the aov() function to fit the ANOVA model and summary() to print its results.

Syntax: aov(y ~ X1+X2+X3+..., data = )

where:

  1. y = dependent variable
  2. X1,X2,X3 = independent variables
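
As a rough illustration of this syntax, the sketch below fits a model with one independent variable and another with two. The data frame df, its columns y, X1 and X2, and the simulated values are all hypothetical and unrelated to the crop data.

```r
set.seed(42)  # reproducible simulated data

# Hypothetical data frame: y is numeric, X1 and X2 are factors
df <- data.frame(
  X1 = factor(rep(c("a", "b", "c"), each = 8)),
  X2 = factor(rep(c("low", "high"), times = 12))
)
df$y <- rnorm(nrow(df), mean = 50, sd = 2)

# One independent variable (one-way ANOVA)
fit_one <- aov(y ~ X1, data = df)

# Two independent variables, additive model
fit_two <- aov(y ~ X1 + X2, data = df)

summary(fit_one)  # one row for X1 plus the Residuals row
summary(fit_two)  # one row per independent variable plus the Residuals row
```

Each term on the right-hand side of the formula gets its own row in the summary table, followed by a Residuals row.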
anova_one_way <- aov(yield ~ fertilizer, data = sample)
summary(anova_one_way)
            Df Sum Sq Mean Sq F value Pr(>F)    
fertilizer   2   6.07  3.0340   7.863  7e-04 ***
Residuals   93  35.89  0.3859                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

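Result: The Pr(>F) column holds the p-value of the F-statistic. Here it is 7e-04 (0.0007), which is lower than the 0.05 significance level, so we reject the null hypothesis. This means that at least one of the fertilizers produces a mean crop yield that differs significantly from the others. The test alone does not tell us which fertilizers differ.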
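
The p-value can also be read programmatically rather than by eye, and a post-hoc comparison can indicate which groups differ. The following is a minimal sketch on simulated data, not the recipe's file: the data frame crop, the group means used to generate it, and the 0.05 threshold stored in alpha are assumptions for illustration. TukeyHSD() is base R's implementation of Tukey's honest significant difference test, one common choice of post-hoc procedure.

```r
set.seed(123)  # reproducible simulated data

# Simulated stand-in for the crop data: fertilizer 3 is given a higher mean yield
crop <- data.frame(
  fertilizer = factor(rep(1:3, each = 32)),
  yield      = rnorm(96, mean = rep(c(176.8, 176.9, 177.4), each = 32), sd = 0.6)
)

fit <- aov(yield ~ fertilizer, data = crop)

# Pull the p-value out of the summary table
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]

alpha <- 0.05  # significance level (assumed)
if (p_value < alpha) {
  message("Reject the null hypothesis: at least one group mean differs")
} else {
  message("Fail to reject the null hypothesis")
}

# Tukey's HSD compares every pair of fertilizers, with adjusted p-values
tukey <- TukeyHSD(fit)
print(tukey)
```

Each row of the Tukey table corresponds to one pair of fertilizer levels, with the estimated difference in means, a confidence interval, and an adjusted p-value.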
