## Residual Standard Error (RSE): Definition, Calculation, and Use in Models
### What is Residual Standard Error (RSE)?
**Residual Standard Error (RSE)** is a measure of the typical size of the residuals in a regression model: roughly, the average amount by which the observed values deviate from the model’s predictions, expressed in the same units as the response variable.

RSE is closely related to the **Root Mean Squared Error (RMSE)**: both are square roots of the residual sum of squares, but RSE divides by the residual degrees of freedom ($n - p - 1$) rather than by $n$. This adjustment accounts for the parameters the model estimates, which matters most when there are many predictors relative to the number of observations.
### How is RSE Calculated?
The formula for RSE is:

$$
\text{RSE} = \sqrt{\frac{1}{n - p - 1} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
$$

Where:

- **$y_i$** are the observed values,
- **$\hat{y}_i$** are the predicted values from the model,
- **$n$** is the number of observations,
- **$p$** is the number of predictors in the model.

Dividing by **$n - p - 1$** (the residual degrees of freedom) rather than $n$ corrects for the parameters estimated from the data, so RSE does not understate the error simply because the model has many predictors.
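
The calculation above can be sketched in a few lines of NumPy. The data here is synthetic and purely illustrative (an assumption, not part of the text): two predictors, known coefficients, and Gaussian noise with standard deviation 0.5, which the RSE should roughly recover.

```python
import numpy as np

# Synthetic data (illustrative assumption): n = 50 observations, p = 2 predictors.
rng = np.random.default_rng(0)
n, p = 50, 2
X = rng.normal(size=(n, p))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Ordinary least squares fit with an intercept column.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ beta

# RSE: residual sum of squares over n - p - 1 degrees of freedom, square-rooted.
rss = np.sum((y - y_hat) ** 2)
rse = np.sqrt(rss / (n - p - 1))
print(rse)  # should land near the true noise SD of 0.5
```

Because the noise was generated with a standard deviation of 0.5, the printed RSE should be close to that value, illustrating that RSE estimates the standard deviation of the error term.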
### Interpreting RSE
- **Lower RSE**: Indicates that the model’s predictions are close to the observed data, meaning a better fit.
- **Higher RSE**: Suggests that the model’s predictions deviate substantially from the actual values, indicating a poorer fit.

RSE is reported in the same units as the dependent variable, making it easy to interpret in practical terms.
### Common Use Cases: RSE

#### 1. **Assessing Model Fit in Regression**

RSE is a key measure for evaluating the overall fit of a regression model. It indicates how well the model explains the data by summarizing the typical size of the residuals.

##### Example: Predicting Tree Growth

In a model predicting tree growth (measured in centimeters), RSE gives the typical error, in centimeters, of the model’s height predictions. A lower RSE means the model’s predictions are more accurate.
#### 2. **Comparing Models with Different Numbers of Predictors**

Because RSE adjusts for the number of predictors, it can be used to compare models of different complexity. This helps ensure that adding more variables does not artificially reduce the residual error without actually improving the model’s accuracy.

##### Example: Adding More Predictors to a Species Distribution Model

When building a species distribution model, you might start with basic environmental predictors (e.g., temperature, rainfall) and later add more complex ones (e.g., land use, elevation). RSE can help determine whether the added predictors genuinely improve the fit.
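
A minimal sketch of this comparison (the data, predictor names, and effect sizes are illustrative assumptions): fit one model with the genuine predictors and one with an extra, irrelevant predictor, then compare RSE values.

```python
import numpy as np

def rse(y, y_hat, p):
    """Residual standard error for a model with p predictors."""
    n = len(y)
    return np.sqrt(np.sum((y - y_hat) ** 2) / (n - p - 1))

# Synthetic data (illustrative assumption): two genuine predictors, one junk predictor.
rng = np.random.default_rng(1)
n = 80
temp = rng.normal(size=n)
rain = rng.normal(size=n)
junk = rng.normal(size=n)  # unrelated to the response
y = 1.0 + 0.8 * temp + 0.5 * rain + rng.normal(scale=0.4, size=n)

def fitted(predictors):
    # Ordinary least squares with an intercept column.
    A = np.column_stack([np.ones(n)] + predictors)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ beta

rse_basic = rse(y, fitted([temp, rain]), p=2)
rse_extra = rse(y, fitted([temp, rain, junk]), p=3)
print(rse_basic, rse_extra)  # the junk predictor barely moves RSE
```

Because the degrees-of-freedom divisor shrinks along with the residual sum of squares, an uninformative predictor leaves RSE roughly unchanged, whereas an unadjusted error measure would always look slightly better after adding it.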
### Issues with RSE

#### 1. **Influence of Outliers**

Like MSE, RSE is sensitive to outliers because it is based on squared differences between observed and predicted values. A single outlier can disproportionately inflate the residual error, making the model appear to fit worse than it actually does.

- **Fix**: Check for outliers using diagnostic plots, and consider **robust regression techniques** if outliers are present. Alternatively, transform the data or investigate why the outliers occur.
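
A quick synthetic demonstration (the data and the size of the corruption are assumptions for illustration): corrupting a single response value sharply inflates RSE even though the remaining observations are unchanged.

```python
import numpy as np

def fit_rse(x, y):
    # Simple linear regression (p = 1 predictor), so df = n - 2.
    A = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return np.sqrt(np.sum(resid ** 2) / (len(x) - 2))

# Clean synthetic data with noise SD of 1.0 (illustrative assumption).
rng = np.random.default_rng(2)
n = 40
x = rng.uniform(0, 10, size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=n)
rse_clean = fit_rse(x, y)

# Corrupt one observation and refit: RSE inflates several-fold.
y_bad = y.copy()
y_bad[0] += 30.0
rse_bad = fit_rse(x, y_bad)
print(rse_clean, rse_bad)
```

A residuals-vs-fitted plot or a QQ plot would flag the corrupted point immediately, which is why checking diagnostics before trusting RSE is worthwhile.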
#### 2. **Overfitting with Too Many Predictors**

Adding too many predictors can lead to **overfitting**: the model fits the training data very well (producing a low in-sample RSE) but performs poorly on new data, because it captures noise rather than the true relationship between variables.

- **Fix**: Use cross-validation to test performance on unseen data. Simplify the model by removing unnecessary predictors, or apply regularization techniques such as **Lasso** or **Ridge Regression** to control overfitting.
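
The effect is easy to see with a held-out split, a simpler cousin of full cross-validation. In this sketch (synthetic data; the fifteen junk predictors are an illustrative assumption), adding irrelevant predictors always shrinks the in-sample error, which is exactly why training-set error alone can mislead:

```python
import numpy as np

# Synthetic data (illustrative assumption): one real predictor, 15 junk predictors.
rng = np.random.default_rng(3)
n = 60
x = rng.normal(size=n)
junk = rng.normal(size=(n, 15))
y = 2.0 * x + rng.normal(scale=1.0, size=n)

train, test = slice(0, 40), slice(40, None)

def holdout_errors(A):
    # Fit on the training rows only; report error on both splits.
    beta, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
    err = lambda s: np.sqrt(np.mean((y[s] - A[s] @ beta) ** 2))
    return err(train), err(test)

A_small = np.column_stack([np.ones(n), x])
A_big = np.column_stack([np.ones(n), x, junk])

in_small, out_small = holdout_errors(A_small)
in_big, out_big = holdout_errors(A_big)
# Adding columns can never increase the in-sample error,
# but the held-out error typically gets worse.
print(in_small, out_small)
print(in_big, out_big)
```

The in-sample error of the larger model is guaranteed to be no worse than the smaller model's, so only the held-out numbers reveal whether the extra predictors help.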
#### 3. **RSE is Not a Standalone Measure of Model Quality**

While RSE quantifies the size of the prediction error, it does not indicate whether the model explains the data well relative to a null model. Measures like **R²** or **Adjusted R²** should also be considered to get a complete picture of model quality.

- **Fix**: Use RSE in conjunction with **R²** and **Adjusted R²** to evaluate model performance comprehensively.
### Related Measures

#### 1. **Root Mean Squared Error (RMSE)**

**RMSE** is similar to RSE but divides by $n$ rather than by the residual degrees of freedom, so it does not adjust for the number of predictors. It measures the average prediction error in the same units as the response variable and is widely used to evaluate machine learning models.

$$
\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
$$
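
The two formulas differ only in the divisor. The sketch below (synthetic data and coefficients are illustrative assumptions) computes both from the same fit; since $n - p - 1 < n$, RSE is always at least as large as RMSE:

```python
import numpy as np

# Synthetic data (illustrative assumption): n = 30 observations, p = 3 predictors.
rng = np.random.default_rng(4)
n, p = 30, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.7, size=n)

# One least-squares fit, two error summaries.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
rss = np.sum((y - A @ beta) ** 2)

rmse = np.sqrt(rss / n)           # divides by n
rse = np.sqrt(rss / (n - p - 1))  # divides by the residual degrees of freedom
print(rmse, rse)  # RSE is the larger of the two
```

With large $n$ and few predictors the two values are nearly identical; the distinction matters mainly for small samples or predictor-heavy models.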
#### 2. **R² (Coefficient of Determination)**

**R²** measures the proportion of variance in the dependent variable that is explained by the predictors. It complements RSE: R² indicates how much of the data’s variability the model captures, while RSE measures the absolute size of the prediction error.
#### 3. **Mean Squared Error (MSE)**

**MSE** is the average squared difference between observed and predicted values, with no adjustment for the number of predictors; RSE is essentially the square root of a degrees-of-freedom-corrected MSE. Because MSE is expressed in squared units and ignores model complexity, RSE is often the more interpretable choice when comparing regression models.

---

### How to Use RSE Effectively

- **Assess Model Fit**: Use RSE to evaluate how well your regression model fits the data, especially when comparing models with different numbers of predictors.
- **Monitor for Overfitting**: Be cautious of models with low RSE values that may be overfitting, especially when the number of predictors is high. Use cross-validation and regularization to guard against this.
- **Complement with Other Measures**: Use RSE alongside metrics like **R²** and **Adjusted R²** for a comprehensive assessment of model performance.