## 2.1.12 Fixed vs. Random Effects
In statistical modeling, **fixed effects** and **random effects** are used to account for different types of variability in data. Understanding the distinction between these two types of effects is critical when building mixed-effects models, where both fixed and random effects can be included.
### What are Fixed Effects?
**Fixed effects** represent variables or factors whose levels are constant or repeatable across observations. They are used to model systematic, predictable relationships between the predictors and the response variable.
- **Example**: In a study examining the effect of different fertilizers on plant growth, the type of fertilizer would be considered a fixed effect if the goal is to measure the effect of these specific fertilizers on growth. The levels of this factor (e.g., Fertilizer A, Fertilizer B) are the primary interest of the study.
- **When to Use**: Use fixed effects when you are interested in estimating the effect of specific predictor levels and when those levels represent all possible outcomes of interest.
#### Formula for Fixed Effects:

For a linear model with fixed effects:

$$
y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_p x_{pi} + \epsilon_i
$$

Where:

- $y_i$ is the response for observation $i$.
- $\beta_0$ is the intercept.
- $\beta_1, \dots, \beta_p$ are the fixed-effect coefficients for predictors $x_1, \dots, x_p$.
- $\epsilon_i$ is the error term (residual) for observation $i$.
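To make this concrete, here is a minimal sketch of fitting a fixed-effects-only model in Python with `statsmodels`. The data frame, its column names (`growth`, `fertilizer`), and the values are hypothetical and exist only for illustration.

```python
# Minimal sketch: plant growth modeled with fertilizer type as a fixed effect.
# The data frame and its values are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

plants = pd.DataFrame({
    "growth":     [12.1, 13.4, 11.8, 15.2, 14.9, 15.8, 10.5, 11.1, 10.9],
    "fertilizer": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
})

# C(fertilizer) treats fertilizer as a categorical factor: each level gets its
# own fixed-effect coefficient relative to the reference level (here "A").
fixed_fit = smf.ols("growth ~ C(fertilizer)", data=plants).fit()
print(fixed_fit.params)  # intercept plus one coefficient per non-reference level
```

Every coefficient here is a fixed effect: the levels A, B, and C are exactly the levels the study is about.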
### What are Random Effects?
**Random effects** capture variability due to factors that are randomly sampled from a larger population. These effects are not of primary interest but are included to account for variation across different groups or clusters.
- **Example**: In a study where plant growth is measured across different fields, the "field" variable could be treated as a random effect if the fields represent a random sample from a larger population of fields. The goal is not to study these specific fields but to account for the variability they introduce.
- **When to Use**: Use random effects when your data has a hierarchical or nested structure, and the levels of the factor are randomly sampled from a larger population.
#### Formula for Random Effects:

For a linear model with random effects:

$$
y_{ij} = \beta_0 + \beta_1 x_{1ij} + u_{1j} + \dots + u_{qj} + \epsilon_{ij}
$$

Where:

- $y_{ij}$ is the response for observation $i$ in group $j$.
- $\beta_0$ is the fixed intercept and $\beta_1$ is the fixed-effect coefficient for predictor $x_1$.
- $u_{1j}, \dots, u_{qj}$ are the $q$ random-effect terms associated with group $j$ (for example, a random intercept and random slopes).
- $\epsilon_{ij}$ is the residual error.

Random effects such as $u_j$ are usually assumed to follow a normal distribution with mean 0 and variance $\sigma^2_u$.
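To make that distributional assumption concrete, the short simulation below draws one random intercept per field from $N(0, \sigma_u^2)$ and adds it to every observation in that field. The number of fields and the values of $\sigma_u$ and $\sigma_\epsilon$ are hypothetical, chosen only for illustration.

```python
# Sketch of the random-effects assumption u_j ~ N(0, sigma_u^2), with
# hypothetical variance components.
import numpy as np

rng = np.random.default_rng(42)

n_fields, n_per_field = 8, 20   # fields are a random sample from a larger population
beta_0 = 10.0                   # overall (fixed) intercept
sigma_u, sigma_eps = 2.0, 1.0   # between-field and residual standard deviations

u = rng.normal(0.0, sigma_u, size=n_fields)           # one random intercept per field
field = np.repeat(np.arange(n_fields), n_per_field)   # group label for each observation
eps = rng.normal(0.0, sigma_eps, size=field.size)     # observation-level error

y = beta_0 + u[field] + eps   # all observations in field j share the deviation u_j

# The spread of the field means should be roughly sigma_u.
print(y.reshape(n_fields, n_per_field).mean(axis=1).std(ddof=1))
```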
### Fixed vs. Random Effects: Key Differences
- **Focus**: Fixed effects focus on the levels of the factor being studied, while random effects focus on the variability among levels sampled from a population.
- **Interpretation**: Fixed effects are interpreted as the effect of specific levels of a factor on the response variable. Random effects are interpreted as representing random deviations from the overall mean for each group.
- **Modeling Goals**: Use fixed effects when you are interested in the effect of specific factor levels. Use random effects when you want to account for variability due to factors that are not the main focus of the study.
### Mixed-Effects Models
Mixed-effects models include both fixed and random effects. These models are commonly used in hierarchical or nested data structures, where the goal is to account for both fixed relationships and random variability.
#### Formula for a Mixed-Effects Model:

$$
y_{ij} = \beta_0 + \beta_1 x_{1ij} + \dots + \beta_p x_{pij} + u_{j} + \epsilon_{ij}
$$

Where:

- $y_{ij}$ is the response for observation $i$ in group $j$.
- $\beta_0$ is the fixed-effect intercept.
- $\beta_1, \dots, \beta_p$ are the fixed-effect coefficients.
- $u_j$ is the random effect for group $j$.
- $\epsilon_{ij}$ is the residual error.
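As a sketch of fitting such a model in Python, `statsmodels`' `MixedLM` (through the formula interface) estimates the fixed effects together with the variance of the group-level random intercept. The plant-growth setting and all column names below are hypothetical; the data are simulated in the same way as in the previous sketch.

```python
# Sketch of a random-intercept mixed model: fertilizer as a fixed effect,
# a random intercept u_j for each field. Data and column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_fields, n_per_field = 10, 12
field = np.repeat(np.arange(n_fields), n_per_field)
fertilizer = rng.choice(["A", "B"], size=field.size)
u = rng.normal(0.0, 2.0, size=n_fields)               # random field intercepts
growth = (10.0 + 1.5 * (fertilizer == "B")            # fixed effects
          + u[field]                                  # random effect of field
          + rng.normal(0.0, 1.0, size=field.size))    # residual error

data = pd.DataFrame({"growth": growth, "fertilizer": fertilizer, "field": field})

# groups= identifies the clusters that receive their own random intercept.
mixed_fit = smf.mixedlm("growth ~ C(fertilizer)", data=data,
                        groups=data["field"]).fit()
print(mixed_fit.summary())   # fixed-effect estimates plus the group variance
```

The fixed-effect coefficient for fertilizer answers the question of interest, while the estimated group variance describes how much fields differ from one another.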
### Common Use Cases
- **Hierarchical Data**: In ecological studies, mixed-effects models are often used to account for variability across different regions, sites, or time points, where these levels introduce random variation but are not of direct interest.
- **Repeated Measures**: Mixed-effects models are frequently used in repeated-measures studies, where data are collected from the same subjects or units at multiple time points. Random effects account for subject-level variability and the resulting correlation among repeated measurements, while fixed effects capture the treatment effects (a sketch follows this list).
- **Multilevel Modeling**: In multilevel data, where observations are nested within larger groups (e.g., students within schools, patients within hospitals), random effects help capture the variation across groups.
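The sketch below illustrates the repeated-measures case with a random intercept and a random slope for time within each subject. The data frame and its column names (`score`, `time`, `subject`) are hypothetical; `re_formula` is how `MixedLM` requests random slopes in addition to the random intercept.

```python
# Sketch of a repeated-measures mixed model: fixed effect of time, random
# intercept and random slope for time within each subject (hypothetical data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_subjects, n_times = 15, 4
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subjects), n_times),
    "time": np.tile(np.arange(n_times), n_subjects),
})
intercept_dev = rng.normal(0.0, 1.5, size=n_subjects)   # subject-level intercept shifts
slope_dev = rng.normal(0.0, 0.5, size=n_subjects)       # subject-level slope shifts
df["score"] = (50.0 + 2.0 * df["time"]
               + intercept_dev[df["subject"]]
               + slope_dev[df["subject"]] * df["time"]
               + rng.normal(0.0, 1.0, size=len(df)))

# re_formula="~time" adds a random slope for time on top of the random intercept.
rm_fit = smf.mixedlm("score ~ time", data=df, groups=df["subject"],
                     re_formula="~time").fit()
print(rm_fit.summary())
```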
### Common Issues
- **Model Complexity**: Including random effects can complicate the model, making interpretation more difficult and increasing computation time, especially with large datasets.
- **Overfitting**: Including too many random effects can lead to overfitting, where the model captures noise rather than the true underlying pattern.
- **Correlation with Fixed Effects**: If the group-level effects are correlated with the predictors (the fixed-effects covariates), the estimates of the fixed effects can be biased. In such cases, a fixed-effects specification may be more appropriate.
### Solutions to Common Issues
1. **Model Comparison**: Use information criteria such as AIC or BIC to compare models with and without random effects, fitting by maximum likelihood so the criteria are comparable. This helps assess whether adding random effects improves the model fit without overcomplicating it (a sketch follows this list).
2. **Regularization**: When dealing with overfitting, consider regularization techniques such as Ridge or Lasso for fixed effects, and limit the number of random effects included in the model.
3. **Testing for Correlation**: If you suspect the group-level effects are correlated with the predictors, use diagnostic plots or compare the coefficient estimates from fixed-effects and random-effects specifications (a Hausman-type check) to assess the potential impact.
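A minimal sketch of the model-comparison idea from point 1, continuing with the hypothetical `data` frame (`growth`, `fertilizer`, `field`) simulated in the mixed-model sketch above: fit the model with and without the random intercept and compare information criteria, refitting the mixed model by maximum likelihood (`reml=False`) so the two fits are on the same footing.

```python
# Sketch: does the random field intercept improve the model? Uses the
# hypothetical `data` frame from the earlier mixed-model sketch.
import statsmodels.formula.api as smf

# Fixed effects only (ordinary least squares).
ols_fit = smf.ols("growth ~ C(fertilizer)", data=data).fit()

# Mixed model refitted by maximum likelihood so its log-likelihood and AIC
# are comparable with the ordinary least-squares fit.
mixed_ml = smf.mixedlm("growth ~ C(fertilizer)", data=data,
                       groups=data["field"]).fit(reml=False)

print("AIC without random effect:", ols_fit.aic)
print("AIC with random effect:   ", mixed_ml.aic)
# A clearly lower AIC for the mixed model suggests the random intercept is
# worth the extra variance parameter.
```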
### Best Practices for Using Fixed and Random Effects
- **Specify Fixed and Random Effects Clearly**: Clearly differentiate between variables you want to measure (fixed effects) and those that introduce variability but are not of primary interest (random effects).
- **Check Assumptions**: Ensure that the assumptions of normality and independence of the random effects and residuals hold; a quick graphical check is sketched after this list. If the response itself is not well described by a normal model, consider alternatives such as generalized linear mixed models (GLMMs).
- **Use Mixed-Effects Models for Hierarchical Data**: When your data has a hierarchical structure (e.g., nested data), use mixed-effects models to account for both fixed relationships and random variability.
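As a quick graphical check of the normality assumption mentioned above, the estimated random effects can be pulled out of a fitted model and inspected with a normal Q-Q plot. The sketch below continues with the hypothetical `mixed_fit` object from the earlier mixed-model sketch.

```python
# Sketch: Q-Q plot of the estimated random intercepts from the earlier
# hypothetical mixed-model fit (`mixed_fit`).
import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats

# random_effects maps each group label to its estimated random effect(s).
re_df = pd.DataFrame(mixed_fit.random_effects).T
random_intercepts = re_df.iloc[:, 0]       # first column: the random intercepts

stats.probplot(random_intercepts, dist="norm", plot=plt)
plt.title("Q-Q plot of estimated random intercepts")
plt.show()
# Strong curvature or outliers here would call the normality assumption
# into question.
```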
### Common Pitfalls
- **Misclassifying Effects**: Treating random effects as fixed, or vice versa, can lead to incorrect conclusions. Be clear about whether the levels of a factor are fixed or represent a random sample from a larger population.
- **Overfitting with Random Effects**: Including too many random effects can lead to overfitting. Always assess model fit using criteria like AIC or BIC to ensure the model remains generalizable.
- **Ignoring Multicollinearity**: Predictors can be collinear with one another, and group-level effects can be correlated with the predictors; both can bias or destabilize the estimates. Check for multicollinearity among the fixed-effect predictors and for correlation between the group-level effects and the predictors.