The **F-value** is a test statistic used in ANOVA and regression models to assess whether a model explains a significant portion of the variance in the outcome variable.
|
To calculate the F-value, you need to partition the total variance in the outcome variable. This total variance is known as the **Total Sum of Squares (SST)**, and it is calculated by taking the squared differences between the observed data points and the overall mean of the outcome:

$$
SST = \sum (y_i - \bar{y})^2
$$

Where:

- $y_i$: The observed value for each data point
- $\bar{y}$: The mean of the outcome variable
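The SST computation above can be sketched in a few lines of Python (NumPy assumed; the data values are made up purely for illustration):

```python
import numpy as np

# Hypothetical outcome values (y_i); any numeric sample works
y = np.array([3.0, 5.0, 7.0, 9.0])

# SST: squared deviations of each observation from the overall mean
sst = np.sum((y - y.mean()) ** 2)
print(sst)  # 20.0
```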
|
|
|
|
|
|
The total variance is then split into two components:
|
|
|
|
|
1. **Explained Variance (Regression Sum of Squares - SSR)**: This is the portion of variance explained by the model. It is the sum of squared differences between the predicted values from the model and the overall mean of the outcome variable:

$$
SSR = \sum (\hat{y}_i - \bar{y})^2
$$

Where:

- $\hat{y}_i$: The predicted value from the model for each data point
- $\bar{y}$: The mean of the outcome variable
|
|
|
|
|
|
2. **Unexplained Variance (Residual Sum of Squares - SSE)**: This is the variance that remains unexplained by the model, which measures how far the observed values differ from the predicted values:

$$
SSE = \sum (y_i - \hat{y}_i)^2
$$

Where:

- $y_i$: The observed value for each data point
- $\hat{y}_i$: The predicted value from the model
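The SSR/SSE decomposition can be illustrated with a small least-squares fit (NumPy assumed; `x` and `y` are made-up values, and `np.polyfit` stands in for whatever fitting routine you actually use). Note that for an ordinary least-squares fit, SST = SSR + SSE:

```python
import numpy as np

# Hypothetical data: one predictor x and outcome y
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.5, 6.0, 8.5])

# Ordinary least-squares fit of a straight line (one predictor, p = 1)
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept  # predicted values \hat{y}_i

ssr = np.sum((y_hat - y.mean()) ** 2)  # explained variance
sse = np.sum((y - y_hat) ** 2)         # unexplained variance
sst = np.sum((y - y.mean()) ** 2)      # total variance

# For least squares, the total variance decomposes exactly
print(np.isclose(sst, ssr + sse))  # True
```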
|
|
|
|
|
|
Next, you calculate the **Mean Squares** for both the explained and unexplained variances. These values account for the number of predictors and observations in the model. The **Mean Square Between (MSB)**, representing the explained variance, is calculated by dividing the **SSR** by the number of predictors ($p$):

$$
MSB = \frac{SSR}{p}
$$

Where:

- $SSR$: The regression sum of squares (explained variance)
- $p$: The number of predictors in the model
|
|
|
|
The **Mean Square Within (MSW)**, representing the unexplained variance, is calculated by dividing the **SSE** by the residual degrees of freedom (the number of observations $n$, minus the number of predictors $p$, minus one):

$$
MSW = \frac{SSE}{n - p - 1}
$$

Where:

- $SSE$: The residual sum of squares (unexplained variance)
- $n$: The number of observations
- $p$: The number of predictors in the model
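The mean-square step is plain arithmetic; here is a minimal sketch (the SSR, SSE, `n`, and `p` values are illustrative, not from any real dataset):

```python
# Illustrative sums of squares from a hypothetical fit
ssr, sse = 22.05, 0.2
n, p = 4, 1  # n observations, p predictors

msb = ssr / p            # Mean Square Between (explained variance)
msw = sse / (n - p - 1)  # Mean Square Within (unexplained variance)
print(msb, msw)  # 22.05 0.1
```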
|
|
|
|
|
|
Finally, the **F-value** is calculated by taking the ratio of the explained variance (MSB) to the unexplained variance (MSW):

$$
F = \frac{MSB}{MSW}
$$
|
|
|
|
|
|
This ratio indicates how much more variance the model explains than it leaves unexplained. A large F-value suggests that the model explains far more variance than remains unexplained; whether it is statistically significant is judged by comparing the F-value against an F-distribution with $p$ and $n - p - 1$ degrees of freedom. Conversely, a small F-value indicates that the model is not meaningfully better than simply using the mean to predict the outcome.
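The final ratio, again with illustrative numbers carried through from the earlier formulas:

```python
# Illustrative mean squares (hypothetical values, not real data)
msb, msw = 22.05, 0.1

f_value = msb / msw  # F = MSB / MSW
print(round(f_value, 2))  # 220.5
```

In practice you would convert this F-value to a p-value against the F-distribution with $p$ and $n - p - 1$ degrees of freedom, e.g. via `scipy.stats.f.sf(f_value, p, n - p - 1)`.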
|
|
|
|
|