MACHINE LEARNING RECIPES
DATA CLEANING PYTHON
DATA MUNGING
PANDAS CHEATSHEET
ALL TAGS
# How to perform ANOVA in R?

# How to perform ANOVA in R?

This recipe helps you perform ANOVA in R

ANOVA which is short for ANalysis Of VAriance can determine whether the means of two or more sample groups are different from each other or not. It uses F-test to statistically test equality of means.

ANOVA uses both between group variability and within group variability to test whether the population mens are significantly different from each other or not.

F-statistic is the ratio of between group variability to within group variability. Large F signifies greater dispersion.

Hypothesis testing with ANOVA includes the following:

- Null Hypothesis: There is no difference in the means
- Alternate Hypothesis: At least one pair of samples is significantly different

In this recipe, we learn how to perform one-way ANOVA test in R.

Example: A study to test the effects of 3 types of fertilizer on crop yield.

- Null Hypothesis: No significantly effect on the crop yield
- Alternate Hypothesis: At least one pair fertilizers has a significant effect on crop yield

```
# data manipulation
library(tidyverse)
sample = read.csv("R_205_crop_sample.csv", colClasses = c("factor", "factor", "factor", "numeric"), header = TRUE)
glimpse(sample)
```

Observations: 96 Variables: 4 $ density1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1,... $ block 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3,... $ fertilizer 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,... $ yield 177.2287, 177.5500, 176.4085, 177.7036, 177.1255, 176.77...

We use aov() function to run the test and summary() to print the results of the model.

Syntax: aov(y ~ X1+X2+X3+..., data = )

where:

- y = dependent variable
- X1,X2,X3 = independent variables

```
anova_one_way = aov(yield ~ fertilizer, data = sample)
summary(anova_one_way)
```

Df Sum Sq Mean Sq F value Pr(>F) fertilizer 2 6.07 3.0340 7.863 7e-04 *** Residuals 93 35.89 0.3859 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Result: After checking the Pr(>F) which is the p-value of the F-statistic, we see that it's lower than 0.05. This means that atleast one pair of fertilizers used has a real impact on the final crop yield.

In this machine learning project you will work on creating a robust prediction model of Rossmann's daily sales using store, promotion, and competitor data.

In this data science project, you will predict borrowers chance of defaulting on credit loans by building a credit score prediction model.

In this data science project, you will work with German credit dataset using classification techniques like Decision Tree, Neural Networks etc to classify loan applications using R.

Data Science Project - Build a recommendation engine which will predict the products to be purchased by an Instacart consumer again.

In this data science project, you will learn how to perform market basket analysis with the application of Apriori and FP growth algorithms based on the concept of association rule learning.

In this project, we are going to talk about Time Series Forecasting to predict the electricity requirement for a particular house using Prophet.

PySpark Project-Get a handle on using Python with Spark through this hands-on data processing spark python tutorial.

In this loan prediction project you will build predictive models in Python using H2O.ai to predict if an applicant is able to repay the loan or not.

In this Kmeans clustering machine learning project, you will perform topic modelling in order to group customer reviews based on recurring patterns.

In this machine learning churn project, we implement a churn prediction model in python using ensemble techniques.