<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
<script type="text/javascript" id="MathJax-script" async
  src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js">
</script>

## R² (R-squared): Definition, Calculation, and Use in Models

### What is R²?
R² (the coefficient of determination) is a useful metric for assessing how well a model explains the relationship between the predictor variables and the response variable.
R² is calculated by comparing the total variation in the response variable to the variation explained by the model. The formula is:

$$
R^2 = 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}}
$$
Where:

- **SS_residual** is the sum of squared differences between the observed values \(y_i\) and the values predicted by the model \( \hat{y}_i \) (i.e., the residuals):

$$
SS_{\text{residual}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

- **SS_total** is the total sum of squared differences between the observed values \(y_i\) and the mean of the response variable \( \bar{y} \):

$$
SS_{\text{total}} = \sum_{i=1}^{n} (y_i - \bar{y})^2
$$

In short, \(R^2\) tells you how much of the variability in the response variable is captured by the model.
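The definitions above translate directly into code. Below is a minimal sketch, using a small made-up dataset and a closed-form ordinary least-squares line fit (both are illustrative assumptions, not part of the original text):

```python
# Minimal sketch: R² computed from its definition on hypothetical data.

def r_squared(y, y_hat):
    """R² = 1 - SS_residual / SS_total."""
    y_bar = sum(y) / len(y)
    ss_residual = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_residual / ss_total

# Hypothetical observations and an ordinary least-squares line fit to them.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

print(r_squared(y, y_hat))  # ≈ 0.6: the line explains 60% of the variation
```

A perfect fit (`y_hat == y`) drives SS_residual to zero and R² to exactly 1, which is a quick sanity check for any implementation.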
### Interpreting R²
An R² close to 1 means the model explains most of the variation in the response; an R² near 0 means it explains very little. A high R² is not proof of a good model, however, and one pitfall to watch for is the Texas Sharpshooter Fallacy. In data analysis, this fallacy occurs when researchers fit multiple models or test many hypotheses and then report only the one with the highest R², making a chance fit look like a genuine discovery.
### Adjusted R²

Adjusted R² is an alternative to R² that adjusts for the number of predictors in the model. Unlike R², which increases whenever a new predictor is added (even if it doesn’t improve the model), adjusted R² only increases if the new predictor improves the model more than would be expected by chance.

$$
R^2_{\text{adj}} = 1 - \left( \frac{(1 - R^2)(n - 1)}{n - p - 1} \right)
$$

Where:

- **n** is the number of observations.
- **p** is the number of predictors in the model.
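The adjustment is a one-line function of R², n, and p. A minimal sketch follows; the numbers (R² = 0.85, 50 observations, 5 predictors) are hypothetical:

```python
# Minimal sketch: adjusted R² from R², n observations, and p predictors.

def adjusted_r_squared(r2, n, p):
    """R²_adj = 1 - (1 - R²)(n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical model: R² = 0.85 from 50 observations and 5 predictors.
print(adjusted_r_squared(0.85, n=50, p=5))   # ≈ 0.833

# Same R² with 20 predictors is penalized harder.
print(adjusted_r_squared(0.85, n=50, p=20))  # ≈ 0.747
```

Note how holding R² fixed while raising p lowers the adjusted value: the penalty grows with the number of predictors, which is exactly what discourages adding predictors that contribute nothing.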
To avoid falling into the Texas Sharpshooter Fallacy with R²:

- **Use R² in context**: Remember that R² only measures how well the model fits the data used in the analysis. Always check other metrics like adjusted R² and p-values to evaluate the significance and generalizability of the model.
- **Report all findings**: Don’t focus solely on high R² models. Even models with lower R² values may provide useful insights, particularly if they are based on a sound hypothesis and are generalizable to new data.
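The danger of reporting only the best-fitting model can be made concrete with a small simulation (entirely hypothetical data): fit fifty unrelated noise predictors to a pure-noise response and keep only the best R². The winner looks respectable even though every predictor is meaningless.

```python
import random

random.seed(42)  # reproducible hypothetical data

def r_squared(y, y_hat):
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def fit_line(x, y):
    # Ordinary least squares for a single predictor with intercept.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [intercept + slope * xi for xi in x]

n = 20
y = [random.gauss(0, 1) for _ in range(n)]  # response is pure noise

# Fit 50 random, unrelated predictors and record each fit's R².
scores = [r_squared(y, fit_line([random.gauss(0, 1) for _ in range(n)], y))
          for _ in range(50)]

print(f"best R² among 50 noise predictors: {max(scores):.3f}")
print(f"median R²: {sorted(scores)[len(scores) // 2]:.3f}")
```

The best score is selected after the fact, so it systematically overstates the fit, which is the sharpshooter fallacy in miniature; out-of-sample validation would expose it.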
By carefully interpreting R² and using it alongside other metrics, researchers can avoid overfitting and misleading conclusions, ensuring their models provide meaningful insights into the data.
### Pseudo-R² for Generalized Linear Models (GLMs)

In some models, such as logistic regression or other Generalized Linear Models (GLMs), the traditional R² does not apply. Instead, pseudo-R² measures are used. Here are three common types:
#### McFadden's Pseudo-R²

McFadden’s pseudo-R² is commonly used for logistic regression models. It is defined as:

$$
R^2_{\text{McFadden}} = 1 - \frac{\ln(L_{\text{full model}})}{\ln(L_{\text{null model}})}
$$

Where:

- \(L_{\text{full model}}\) is the likelihood of the fitted model.
- \(L_{\text{null model}}\) is the likelihood of the null model (a model with only an intercept).
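In practice software reports log-likelihoods rather than raw likelihoods, so the formula is applied to them directly. A minimal sketch, with hypothetical log-likelihood values standing in for fitted-model output:

```python
# Minimal sketch: McFadden's pseudo-R² from the two log-likelihoods.

def mcfadden_r2(ll_full, ll_null):
    """1 - ln(L_full) / ln(L_null), taking log-likelihoods as inputs."""
    return 1 - ll_full / ll_null

# Hypothetical logistic model: log-likelihood -420.5 for the fitted model,
# -680.2 for the intercept-only null model.
print(mcfadden_r2(ll_full=-420.5, ll_null=-680.2))  # ≈ 0.382
```

When the fitted model is no better than the null model the two log-likelihoods coincide and the measure is 0; values in the 0.2–0.4 range are often already considered a good fit for this metric.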
#### Cox & Snell's Pseudo-R²

Cox & Snell’s pseudo-R² is another likelihood-based measure:

$$
R^2_{\text{Cox-Snell}} = 1 - \left( \frac{L_{\text{null model}}}{L_{\text{full model}}} \right)^{2/n}
$$

Where \(n\) is the number of observations.
#### Nagelkerke's Pseudo-R²

Nagelkerke’s pseudo-R² is a modification of Cox & Snell’s pseudo-R² that adjusts for the fact that Cox & Snell’s pseudo-R² cannot reach a maximum value of 1. The formula is:

$$
R^2_{\text{Nagelkerke}} = \frac{R^2_{\text{Cox-Snell}}}{1 - \left( L_{\text{null model}} \right)^{2/n}}
$$
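Both of these measures are also computed from log-likelihoods in practice, since the raw likelihoods are usually far too small to represent as floating-point numbers. A minimal sketch with hypothetical values, using the identity \( (L)^{2/n} = \exp\!\big(\tfrac{2}{n}\ln L\big) \):

```python
import math

# Minimal sketch: Cox & Snell's and Nagelkerke's pseudo-R² computed from
# log-likelihoods to avoid numerical underflow of the raw likelihoods.

def cox_snell_r2(ll_full, ll_null, n):
    # (L_null / L_full)^(2/n) = exp((2/n) * (ll_null - ll_full))
    return 1 - math.exp((2 / n) * (ll_null - ll_full))

def nagelkerke_r2(ll_full, ll_null, n):
    # Rescale Cox & Snell by its maximum attainable value,
    # 1 - L_null^(2/n) = 1 - exp((2/n) * ll_null).
    max_cs = 1 - math.exp((2 / n) * ll_null)
    return cox_snell_r2(ll_full, ll_null, n) / max_cs

# Hypothetical logistic model: n = 1000 observations, log-likelihoods
# -420.5 (full model) and -680.2 (intercept-only null model).
print(cox_snell_r2(-420.5, -680.2, n=1000))   # ≈ 0.405
print(nagelkerke_r2(-420.5, -680.2, n=1000))  # ≈ 0.545
```

Note that Nagelkerke's value is always at least as large as Cox & Snell's on the same model, and it reaches exactly 1 for a perfectly fitting model (log-likelihood 0), which is the point of the rescaling.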
Each pseudo-R² provides an indication of the model fit, with values closer to 1 indicating a better fit. However, unlike traditional R², pseudo-R² values can vary depending on the model and should be interpreted with caution.

By carefully interpreting R², adjusted R², and pseudo-R² values, you can assess how well your models explain the variability in your data while avoiding overfitting and other common pitfalls.