gillesc92 · 47118e2d
--- a/2.-Statistics/F-value.md
+++ b/2.-Statistics/F-value.md
+## F-value: Definition, Calculation, and Use in Models
+
+### What is the F-value?
+
+The **F-value** is a test statistic used in ANOVA and regression models to assess whether the model is statistically significant. It is the ratio of the variance explained by the model (signal) to the unexplained variance (noise). If the model has no real effect, this ratio is expected to be close to 1. A high F-value indicates that the model explains more variance than would be expected by chance, while a low F-value suggests that the model does not improve much over simply using the mean of the outcome.
+
+The F-value is calculated as:
+
+$F = \frac{\text{MSB}}{\text{MSW}}$
+
+Where:
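+- **MSB (Mean Square Between)**: The variance explained by the model, computed as the between-group sum of squares divided by its degrees of freedom ($k - 1$ for $k$ groups).
+- **MSW (Mean Square Within)**: The residual variance, or the variance that remains unexplained, computed as the within-group sum of squares divided by its degrees of freedom ($N - k$ for $N$ observations).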
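
The ratio above can be worked through by hand for a one-way ANOVA. The following is a minimal sketch using only the Python standard library; the group values are hypothetical and chosen purely for illustration.

```python
from statistics import mean

# Hypothetical plant heights (cm) for three fertilizer groups; illustrative data only.
groups = [
    [20.0, 22.0, 19.0, 21.0],
    [25.0, 27.0, 26.0, 24.0],
    [30.0, 29.0, 31.0, 32.0],
]

k = len(groups)                          # number of groups
n_total = sum(len(g) for g in groups)    # total number of observations
grand_mean = mean([x for g in groups for x in g])

# Between-group sum of squares: how far each group mean is from the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Within-group sum of squares: how far each observation is from its own group mean.
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

msb = ss_between / (k - 1)        # Mean Square Between
msw = ss_within / (n_total - k)   # Mean Square Within
f_value = msb / msw

print(f"MSB = {msb:.3f}, MSW = {msw:.3f}, F = {f_value:.3f}")
```

With these made-up numbers the group means are far apart relative to the spread inside each group, so the ratio is large.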
+
+### Interpretation
+
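+The F-value tests whether the model provides a better fit than using just the mean. Visually, a high F-value tends to mean that the data points lie close to the regression line relative to their overall spread around the mean. A low F-value suggests that the points are scattered about as widely around the line as around the mean, indicating that the model adds little. Because the F-value also grows with sample size, it measures the evidence for an effect rather than the quality of the fit itself.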
+
+In **simple regression** (one predictor), the F-value is related to the t-test of the slope, with the F-value being the square of the t-value ($F = t^2$). In this case, both tests give the same p-value and therefore the same conclusion about model significance. In **multiple regression**, the F-value tests the overall model significance, while t-tests assess the individual predictors.
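
The $F = t^2$ relationship can be checked numerically. This is a small sketch with hypothetical data (the variable names and values are assumptions, not part of the original note), fitting a one-predictor least-squares line with the standard library only.

```python
import math
from statistics import mean

# Hypothetical data: hours of sunlight (x) and plant growth in cm (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]

n = len(x)
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Ordinary least-squares slope and intercept.
slope = sxy / sxx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

# F-value: explained mean square (1 df) over residual mean square (n - 2 df).
ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)
ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ms_residual = ss_residual / (n - 2)
f_value = (ss_regression / 1) / ms_residual

# t-value of the slope: estimate divided by its standard error.
t_value = slope / math.sqrt(ms_residual / sxx)

print(f"F = {f_value:.4f}, t^2 = {t_value ** 2:.4f}")
```

The two printed numbers agree up to floating-point rounding, which is why the overall F-test and the slope's t-test are interchangeable when there is a single predictor.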
+
+### When to Use the F-value
+
+- **Multiple Regression**: When testing whether the predictors, as a group, significantly explain the variance in the outcome.
+- **ANOVA**: When comparing group means to check for statistically significant differences.
+
+### Example (Good Practice)
+
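+Suppose you are modeling plant growth based on factors like sunlight, water, and fertilizer. The F-value tests whether these predictors, collectively, explain the variation in plant growth. A large F-value indicates that the predictors jointly explain more variation than would be expected by chance. It does not by itself show how well the model predicts, so it should be read alongside the p-value and a measure of fit such as $R^2$.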
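
As a rough sketch of this example, the overall F-value of a multiple regression can be computed from the model and residual sums of squares. The data below are simulated, and the coefficients, ranges, and sample size are all assumptions made for illustration; NumPy is assumed to be available.

```python
import numpy as np

# Simulated, hypothetical data: the "true" coefficients below are invented.
rng = np.random.default_rng(0)
n = 50
sunlight = rng.uniform(2, 10, n)
water = rng.uniform(0.5, 3, n)
fertilizer = rng.uniform(0, 5, n)
growth = 1.0 + 0.8 * sunlight + 1.5 * water + 0.3 * fertilizer + rng.normal(0, 1.0, n)

# Design matrix with an intercept column, fitted by least squares.
X = np.column_stack([np.ones(n), sunlight, water, fertilizer])
coef, *_ = np.linalg.lstsq(X, growth, rcond=None)
fitted = X @ coef

p = X.shape[1] - 1  # number of predictors, excluding the intercept
ss_model = np.sum((fitted - growth.mean()) ** 2)
ss_resid = np.sum((growth - fitted) ** 2)

# Overall F-value: explained mean square over residual mean square.
f_value = (ss_model / p) / (ss_resid / (n - p - 1))
r_squared = ss_model / (ss_model + ss_resid)

print(f"F = {f_value:.2f} on {p} and {n - p - 1} degrees of freedom, R^2 = {r_squared:.3f}")
```

The same F-value can be written in terms of $R^2$ as $F = \frac{R^2 / p}{(1 - R^2) / (n - p - 1)}$, which makes the link between the overall test and the share of variance explained explicit.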
+
+### Example (Bad Practice)
+
+- **Texas Sharpshooter Fallacy**: Occurs when a researcher looks for patterns after the data have been collected and then reports only the F-values that happened to come out significant. This is a form of **p-hacking**, where multiple tests are conducted but only the significant results are presented.
+  
+- **Incorrect Use of ANOVA**: Applying ANOVA when the residuals are strongly non-normal or when group variances are unequal can lead to misleading results. For example, using ANOVA without checking assumptions like homogeneity of variance may produce an F-test whose false-positive rate differs from the stated significance level.
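
The danger of reporting only significant results can be illustrated with a small simulation. This sketch assumes NumPy and SciPy are available and uses `scipy.stats.f_oneway`; the number of tests, group sizes, and seed are arbitrary choices. Every dataset is pure noise, so any "significant" F-value is a false positive.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(42)
n_tests = 1000   # number of independent "studies", all with no real effect
alpha = 0.05

false_positives = 0
for _ in range(n_tests):
    # Three groups of 10 drawn from the same distribution: the null hypothesis is true.
    a, b, c = rng.normal(0, 1, (3, 10))
    result = f_oneway(a, b, c)
    if result.pvalue < alpha:
        false_positives += 1

false_positive_rate = false_positives / n_tests
print(f"{false_positives} of {n_tests} null tests were 'significant' "
      f"({false_positive_rate:.1%})")
```

Roughly one test in twenty is expected to cross the threshold by chance alone, so selectively presenting those results gives a misleading picture.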
+
+### Common Pitfalls
+
+- **Overfitting**: Adding predictors never lowers $R^2$, and choosing predictors after looking at the data can inflate the F-value. The model then appears more significant than it really is and generalizes poorly to new data.
+- **Assumption Violations**: The F-test assumes that:
+  - **Observations are independent**
+  - **Residuals are normally distributed**
+  - **Homogeneity of variance** (equal variances across groups)
+
+  Violating these assumptions doesn't always invalidate your results, but it can affect the accuracy of the F-test. For example:
+  - **Mild violations of normality**: The F-test is fairly robust to slight deviations from normality, especially with large samples.
+  - **Homogeneity of variance**: Unequal variances between groups (heteroscedasticity) can distort the F-value, particularly when group sizes are also unequal. If the smaller groups have the larger variances, the F-value tends to be inflated, increasing the chance of a Type I error (false positive). In such cases, transformations of the data or alternative tests like Welch's ANOVA can be applied.
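
One way to check these assumptions before trusting an F-value is sketched below, assuming SciPy is available. It uses Levene's test (`scipy.stats.levene`) for equal variances and the Shapiro-Wilk test (`scipy.stats.shapiro`) on the residuals; these particular tests are a choice made for this sketch, and the simulated groups are hypothetical.

```python
import numpy as np
from scipy.stats import levene, shapiro

# Hypothetical groups: the third is simulated with a much larger spread.
rng = np.random.default_rng(1)
groups = [
    rng.normal(10, 1, 30),
    rng.normal(12, 1, 30),
    rng.normal(11, 4, 30),
]

# Levene's test: the null hypothesis is that all groups share the same variance.
levene_stat, levene_p = levene(*groups)

# Shapiro-Wilk test on the residuals (each value minus its own group mean):
# the null hypothesis is that the residuals are normally distributed.
residuals = np.concatenate([g - g.mean() for g in groups])
shapiro_stat, shapiro_p = shapiro(residuals)

print(f"Levene p-value:  {levene_p:.4f}")
print(f"Shapiro p-value: {shapiro_p:.4f}")
```

A small p-value from either test is a signal to look more closely, for example with residual plots, before relying on the standard F-test.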
+
+### Interpreting the F-value
+
+- **High F-value**: Indicates that the model explains a significant amount of variance.
+- **Low F-value**: Suggests that the model doesn't explain much variance.
+
+The F-value is compared to a critical value from the F-distribution, which depends on the chosen significance level and on the numerator and denominator degrees of freedom. If the F-value is greater than the critical value, the model is considered statistically significant.
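
Instead of a printed table, the critical value can be looked up programmatically. This is a brief sketch assuming SciPy is available; the F-value and degrees of freedom are hypothetical inputs, as would arise from a one-way ANOVA with 3 groups and 12 observations.

```python
from scipy.stats import f

# Hypothetical inputs for illustration.
f_value = 60.0
df_between, df_within = 2, 9
alpha = 0.05

# Critical value: the point that the F-distribution exceeds with probability alpha.
critical_value = f.ppf(1 - alpha, df_between, df_within)

# p-value: probability of an F at least this large if the model has no effect.
p_value = f.sf(f_value, df_between, df_within)

significant = f_value > critical_value
print(f"critical value = {critical_value:.3f}, p-value = {p_value:.2e}, "
      f"significant = {significant}")
```

Comparing the F-value to the critical value and comparing the p-value to the significance level always lead to the same decision.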
+
+### Related Measures
+
+- **p-value**: The probability of observing an F-value at least as large as the one obtained if the model had no real effect. A small p-value (typically < 0.05) suggests that the F-value is statistically significant.
\ No newline at end of file