## 2.1.11 Model Evaluation Metrics

Model evaluation metrics are used to assess the performance of a model and determine how well it generalizes to new data. These metrics provide insight into how accurately a model predicts outcomes, whether it suffers from overfitting or underfitting, and its ability to handle various types of data.
### Why Use Model Evaluation Metrics?

- **Assess Model Performance**: Metrics help determine how well a model fits the data and how accurately it predicts new observations.
- **Compare Models**: Different models can be evaluated and compared using common metrics, guiding the selection of the best-performing model.
- **Detect Overfitting/Underfitting**: Evaluation metrics can indicate whether the model is too complex (overfitting) or too simple (underfitting).
### Common Model Evaluation Metrics
#### 1. **Mean Squared Error (MSE)**

- **How it works**: MSE measures the average squared difference between the observed and predicted values. Lower MSE indicates a better fit.

$$
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

Where:

- $y_i$ is the observed value.
- $\hat{y}_i$ is the predicted value.

- **Use case**: MSE is widely used for regression models. It penalizes larger errors more than smaller ones due to squaring the differences.
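As a quick illustration, the sketch below computes MSE directly from the formula; it assumes NumPy is available and uses made-up values for `y` and `y_hat`.

```python
import numpy as np

# Illustrative observed and predicted values (made-up numbers).
y = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.8, 5.4, 2.9, 6.1])

# MSE: mean of the squared residuals, in squared units of y.
mse = np.mean((y - y_hat) ** 2)
print(mse)
```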
#### 2. **Root Mean Squared Error (RMSE)**

- **How it works**: RMSE is the square root of MSE, providing a metric in the same units as the response variable. It is easier to interpret than MSE when comparing predictions.

$$
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
$$

- **Use case**: RMSE is preferred when interpreting prediction error in the same units as the data.
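A minimal sketch, assuming scikit-learn and the same illustrative arrays as above: compute MSE with `mean_squared_error` and take its square root.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y = np.array([3.0, 5.0, 2.5, 7.0])       # illustrative observed values
y_hat = np.array([2.8, 5.4, 2.9, 6.1])   # illustrative predictions

# RMSE is the square root of MSE, so it is back in the units of y.
rmse = np.sqrt(mean_squared_error(y, y_hat))
print(rmse)
```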
#### 3. **Mean Absolute Error (MAE)**

- **How it works**: MAE measures the average absolute difference between observed and predicted values. Unlike MSE, it weights errors in proportion to their magnitude rather than their square, so large errors are not penalized more heavily.

$$
MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
$$

- **Use case**: MAE is less sensitive to outliers than MSE, making it suitable for datasets with extreme values.
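The corresponding sketch, again assuming NumPy and illustrative values:

```python
import numpy as np

y = np.array([3.0, 5.0, 2.5, 7.0])       # illustrative observed values
y_hat = np.array([2.8, 5.4, 2.9, 6.1])   # illustrative predictions

# MAE: mean of the absolute residuals, in the same units as y.
mae = np.mean(np.abs(y - y_hat))
print(mae)
```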
#### 4. **R² (R-squared)**

- **How it works**: R² measures the proportion of variance in the dependent variable explained by the independent variables in the model. For a model with an intercept it typically ranges from 0 to 1, where values closer to 1 indicate a better fit (it can be negative when a model fits worse than simply predicting the mean).

$$
R^2 = 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}}
$$

Where:

- $SS_{\text{residual}} = \sum (y_i - \hat{y}_i)^2$ is the sum of squared residuals.
- $SS_{\text{total}} = \sum (y_i - \bar{y})^2$ is the total sum of squares.

- **Use case**: R² is used to evaluate the goodness of fit for regression models. It indicates how much of the variance in the response variable is captured by the model.
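A direct translation of the formula, assuming NumPy and the same illustrative arrays (scikit-learn's `r2_score` computes the same quantity):

```python
import numpy as np

y = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.8, 5.4, 2.9, 6.1])

ss_res = np.sum((y - y_hat) ** 2)         # sum of squared residuals
ss_tot = np.sum((y - np.mean(y)) ** 2)    # total sum of squares
r2 = 1 - ss_res / ss_tot
print(r2)
```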
#### 5. **Adjusted R²**

- **How it works**: Adjusted R² modifies R² to account for the number of predictors, providing a more accurate measure when comparing models with different numbers of predictors.

$$
R^2_{\text{adj}} = 1 - \left( \frac{(1 - R^2)(n - 1)}{n - p - 1} \right)
$$

Where:

- $n$ is the number of observations.
- $p$ is the number of predictors.

- **Use case**: Use Adjusted R² to compare models when the number of predictors differs, as it penalizes unnecessary predictors.
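A small helper that applies the formula; the numbers in the example are hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared from R-squared, n observations, and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical values: R-squared of 0.91 from 50 observations and 4 predictors.
print(adjusted_r2(0.91, n=50, p=4))  # slightly below the raw R-squared
```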
#### 6. **Akaike Information Criterion (AIC)**

- **How it works**: AIC balances goodness of fit with model complexity. Lower AIC values indicate better models, because the criterion penalizes those with more parameters.

$$
AIC = 2k - 2\ln(L)
$$

Where:

- $k$ is the number of estimated parameters.
- $L$ is the maximized likelihood of the model.

- **Use case**: AIC is commonly used in model selection, helping avoid overfitting by penalizing models with more parameters.
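A minimal sketch of the formula; the log-likelihood values below are hypothetical, as if they came from two fitted models.

```python
def aic(log_likelihood, k):
    """AIC = 2k - 2 ln(L), where k is the number of estimated parameters."""
    return 2 * k - 2 * log_likelihood

# Hypothetical maximized log-likelihoods from two competing fitted models.
aic_small = aic(log_likelihood=-120.3, k=3)   # fewer parameters
aic_large = aic(log_likelihood=-118.9, k=6)   # better fit, more parameters
print(aic_small, aic_large)  # the model with the lower AIC is preferred
```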
#### 7. **Bayesian Information Criterion (BIC)**

- **How it works**: Like AIC, BIC penalizes complex models, but its penalty per parameter grows with the sample size ($\ln(n)$ instead of 2), so it is stricter than AIC for all but very small samples. Lower BIC values indicate a better model.

$$
BIC = \ln(n)k - 2\ln(L)
$$

Where:

- $n$ is the number of observations.
- $k$ is the number of estimated parameters.
- $L$ is the maximized likelihood of the model.

- **Use case**: BIC is often used when model simplicity is preferred, as it imposes stricter penalties on complexity than AIC.
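Both criteria are reported by most statistical packages; a sketch assuming statsmodels and synthetic data:

```python
import numpy as np
import statsmodels.api as sm

# Synthetic data for illustration: y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.5, size=100)

X = sm.add_constant(x)             # add an intercept column
results = sm.OLS(y, X).fit()

# Fitted OLS results expose both criteria; lower values are preferred.
print(results.aic, results.bic)
```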
### Classification-Specific Metrics
#### 1. **Accuracy**

- **How it works**: Accuracy measures the proportion of correctly classified instances in classification models.

$$
\text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{Total Observations}}
$$

- **Use case**: Accuracy is the most common metric for classification models, but it can be misleading for imbalanced datasets.
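A minimal sketch, assuming scikit-learn and illustrative binary labels:

```python
from sklearn.metrics import accuracy_score

# Illustrative binary labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy_score(y_true, y_pred))  # fraction of correct predictions
```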
#### 2. **Precision, Recall, and F1-Score**

- **Precision**: Measures the proportion of true positives among all predicted positives.

$$
\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}
$$

- **Recall (Sensitivity)**: Measures the proportion of true positives among all actual positives.

$$
\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}
$$

- **F1-Score**: The harmonic mean of precision and recall, used when you want a balance between the two metrics.

$$
F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
$$

- **Use case**: Precision, recall, and F1-score are essential for classification tasks, especially when dealing with imbalanced datasets.
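Using the same illustrative labels as in the accuracy example, a sketch assuming scikit-learn:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Same illustrative binary labels as in the accuracy example.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(precision_score(y_true, y_pred))  # TP / (TP + FP)
print(recall_score(y_true, y_pred))     # TP / (TP + FN)
print(f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```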
#### 3. **Confusion Matrix**

- **How it works**: A confusion matrix provides a detailed breakdown of model performance by showing true positives, false positives, true negatives, and false negatives.

- **Use case**: Useful for visualizing the performance of classification models, particularly when precision, recall, or misclassification rates need to be examined.
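A sketch assuming scikit-learn and the same illustrative labels; for binary labels ordered 0, 1, the rows are actual classes and the columns are predicted classes:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Layout for binary labels ordered 0, 1:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```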
### Common Issues

- **Overfitting**: Overfitting occurs when a model performs well on training data but poorly on unseen data. Use cross-validation and metrics like AIC or BIC to assess whether the model is too complex.

- **Imbalanced Datasets**: In classification tasks, imbalanced datasets can lead to misleading accuracy scores. Precision, recall, and F1-score are better suited for such cases.

- **Ignoring Assumptions**: Interpreting metrics such as R² and MSE at face value relies on the usual regression assumptions, for example homoscedastic and approximately normal residuals. Ignoring these assumptions can lead to incorrect conclusions about model performance.
### Best Practices for Model Evaluation

- **Use Cross-Validation**: Cross-validation gives a more reliable estimate of how well your model generalizes to new data and reduces the risk of overfitting to a single train/test split (see the sketch after this list).

- **Evaluate Multiple Metrics**: Always assess model performance using a variety of metrics (e.g., RMSE, MAE, AIC) to get a complete picture of how well your model fits the data.

- **Check for Assumptions**: Make sure that the assumptions underlying your model (e.g., normality, independence) hold before interpreting the results.

- **Use Domain-Specific Metrics**: When working with classification or time-series models, use domain-specific metrics like precision, recall, or AIC/BIC as appropriate.
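As a brief illustration of the first two practices, the following sketch (assuming scikit-learn and synthetic data) scores a linear model with 5-fold cross-validated RMSE; other metrics can be requested through the `scoring` argument.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data, for illustration only.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# 5-fold cross-validated RMSE; scikit-learn returns it as a negative score,
# so flip the sign to read it as an error.
scores = cross_val_score(LinearRegression(), X, y,
                         cv=5, scoring="neg_root_mean_squared_error")
print(-scores.mean())
```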
### Common Pitfalls

- **Relying Solely on Accuracy**: Accuracy alone can be misleading, especially for imbalanced datasets. Always consider metrics like precision, recall, and F1-score in such cases.

- **Ignoring Overfitting**: Overfitting leads to poor model generalization. Regularization and cross-validation are key techniques to avoid this issue.

- **Overemphasis on R²**: A high R² does not always imply a good model. Always check for overfitting, and use metrics like Adjusted R², AIC, and BIC to balance complexity and performance.