## Log-Likelihood: Definition, Calculation, and Use in Models
### What is Log-Likelihood?
**Log-Likelihood** is a measure of how well a statistical model fits a set of observations. It is based on the likelihood function, which represents the probability of the observed data given the parameters of the model. The Log-Likelihood is the natural logarithm of the likelihood function and is used in **Maximum Likelihood Estimation (MLE)** to find the model parameters that best fit the data.

A higher Log-Likelihood value indicates a better fit, while a lower Log-Likelihood suggests the model does not explain the data well.
### How is Log-Likelihood Calculated?
The Log-Likelihood is calculated as:

$$
\ell(\theta) = \ln(L(\theta)) = \sum_{i=1}^{n} \ln(f(y_i \mid \theta))
$$

Where:

- **$\ell(\theta)$** is the Log-Likelihood function,
- **$L(\theta)$** is the likelihood function,
- **$f(y_i \mid \theta)$** is the probability density or mass function for the observed data point $y_i$ given model parameters $\theta$,
- **$n$** is the number of observations.
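
As a sketch of this sum, the Log-Likelihood of a small sample under an assumed normal model can be computed by adding per-observation log-densities; the data and parameters below are purely illustrative:

```python
import numpy as np
from scipy.stats import norm

# Illustrative sample and assumed normal parameters (not from the text)
y = np.array([4.9, 5.1, 5.3, 4.7, 5.0])
mu, sigma = 5.0, 0.2

# ell(theta) = sum_i ln f(y_i | theta), via the library log-density
log_lik = norm.logpdf(y, loc=mu, scale=sigma).sum()

# The same sum written out from the normal density formula
manual = np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                - (y - mu) ** 2 / (2 * sigma**2))
```

`log_lik` and `manual` agree, confirming that the Log-Likelihood is just the sum of per-observation log-densities; note it can be positive when individual densities exceed 1.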
### Interpreting Log-Likelihood
- **Higher Log-Likelihood**: Indicates a better model fit.
- **Lower Log-Likelihood**: Suggests the model does not explain the data well.

However, raw Log-Likelihood values should not be compared across different datasets, and Log-Likelihood alone cannot fairly compare models with different numbers of parameters. For such comparisons, complexity-penalizing criteria like **AIC** and **BIC** are necessary.
### Common Use Cases: Log-Likelihood
#### 1. **Model Fitting in Maximum Likelihood Estimation (MLE)**
Log-Likelihood is the cornerstone of **Maximum Likelihood Estimation (MLE)**, where the goal is to find the parameter values that maximize the Log-Likelihood, yielding the best-fitting model.
##### Example: Logistic Regression Model
You might use Log-Likelihood to fit a logistic regression model that predicts the probability of species survival based on environmental factors. The regression coefficients that maximize the Log-Likelihood provide the best-fitting model for this data.
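
A minimal sketch of that idea, using synthetic data in place of real environmental measurements (all names, coefficients, and values here are illustrative). MLE is carried out numerically by minimizing the negative Bernoulli Log-Likelihood:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic data: one environmental predictor, binary survival outcome
n = 200
x = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-(-0.5 + 1.5 * x)))  # illustrative true model
y = rng.binomial(1, p_true)

def neg_log_likelihood(beta):
    """Negative Bernoulli log-likelihood for logistic regression."""
    eta = beta[0] + beta[1] * x
    log_p = -np.log1p(np.exp(-eta))   # log(sigmoid(eta)), numerically stable
    log_1mp = -eta + log_p            # log(1 - sigmoid(eta))
    return -(y * log_p + (1 - y) * log_1mp).sum()

# Maximizing the log-likelihood = minimizing its negative
result = minimize(neg_log_likelihood, x0=np.zeros(2))
beta_hat = result.x  # fitted intercept and slope
```

Minimizing the negative Log-Likelihood is the standard numerical route to MLE; `beta_hat` holds the coefficients that make the observed outcomes most probable under the model.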
#### 2. **Generalized Linear Models (GLMs)**
In GLMs, the Log-Likelihood helps estimate model parameters that best describe the relationship between independent variables and a dependent variable.
##### Example: Modeling Species Abundance
When using Poisson regression to model species abundance based on environmental factors, maximizing the Log-Likelihood ensures the model fits the data as well as possible.
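
One way to sketch this (again with synthetic data; the covariate name and coefficients are illustrative) is to maximize the Poisson Log-Likelihood with a log link directly:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Synthetic counts: abundance driven by one environmental covariate
n = 300
temp = rng.uniform(0, 1, size=n)            # hypothetical covariate
y = rng.poisson(np.exp(0.2 + 1.0 * temp))   # illustrative true model

def neg_log_likelihood(beta):
    """Negative Poisson log-likelihood with lambda_i = exp(b0 + b1 * x_i)."""
    eta = beta[0] + beta[1] * temp
    # The log(y_i!) term is omitted: it does not depend on beta
    return -(y * eta - np.exp(eta)).sum()

fit = minimize(neg_log_likelihood, x0=np.zeros(2))
```

In practice a GLM library (e.g. `statsmodels`) performs this maximization internally; the explicit version shows exactly where the Log-Likelihood enters the fit.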
#### 3. **Model Comparison**
Log-Likelihood values are used to compare nested models (where one model is a simpler version of another). A **likelihood ratio test** evaluates whether the more complex model significantly improves the fit to the data.
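
As a sketch with hypothetical maximized Log-Likelihood values for two nested models (the numbers are made up for illustration):

```python
from scipy.stats import chi2

# Hypothetical maximized log-likelihoods (illustrative numbers)
ll_reduced = -420.7   # simpler, nested model
ll_full = -415.2      # full model with 2 extra parameters

lr_stat = 2 * (ll_full - ll_reduced)  # likelihood ratio statistic
df = 2                                # difference in parameter counts
p_value = chi2.sf(lr_stat, df)        # asymptotic chi-squared p-value
```

Twice the gap in Log-Likelihood is compared against a chi-squared distribution with degrees of freedom equal to the number of extra parameters; a small p-value indicates the added complexity significantly improves the fit.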
### Issues with Log-Likelihood
#### 1. **Overfitting**
Maximizing Log-Likelihood alone can lead to **overfitting**, where the model fits the training data well but generalizes poorly to new data. This can happen if too many parameters are added to the model.

- **Fix**: Use model selection criteria like **AIC** or **BIC**, which penalize models with excessive parameters.
#### 2. **Sensitivity to Outliers**
Outliers can distort the Log-Likelihood, leading to poor model performance if they are not properly accounted for.

- **Fix**: Use robust methods or transformations to handle outliers effectively.
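
To illustrate the sensitivity (the numbers are illustrative, and the t-distribution used here is just one common robust choice), a single outlier collapses a normal Log-Likelihood far more than a heavy-tailed alternative:

```python
import numpy as np
from scipy.stats import norm, t

y = np.array([5.0, 5.1, 4.9, 5.2, 12.0])  # last point is an outlier
mu, sigma = 5.05, 0.15                     # illustrative location and scale

ll_normal = norm.logpdf(y, loc=mu, scale=sigma).sum()
ll_t = t.logpdf(y, df=3, loc=mu, scale=sigma).sum()  # heavier tails
```

The heavy-tailed model assigns the outlier a far less extreme log-density, so a single aberrant point no longer dominates the fit.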
### Related Measures
#### 1. **Akaike Information Criterion (AIC)**
**AIC** is a widely used metric for model selection that balances goodness-of-fit and model complexity. While the Log-Likelihood indicates how well the model fits the data, AIC penalizes models with more parameters, helping to avoid overfitting.
#### 2. **Bayesian Information Criterion (BIC)**
**BIC** is similar to AIC but imposes a stronger penalty for models with a large number of parameters, especially in larger datasets. BIC tends to favor simpler models compared to AIC, making it a more conservative measure for model selection.
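
Both criteria are simple functions of the maximized Log-Likelihood. A small sketch, comparing two hypothetical fits on the same data (the Log-Likelihood values and parameter counts are made up for illustration):

```python
import math

def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2*ell."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k*ln(n) - 2*ell."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical fits on the same dataset of n = 500 observations
n = 500
aic_simple, bic_simple = aic(-1042.0, k=3), bic(-1042.0, k=3, n=n)
aic_complex, bic_complex = aic(-1039.5, k=8), bic(-1039.5, k=8, n=n)
```

Lower values are better; here both criteria prefer the simpler model, because five extra parameters buy only 2.5 units of Log-Likelihood.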
### How to Use Log-Likelihood and Related Measures Effectively
When fitting models, maximizing Log-Likelihood is key to finding the best-fitting parameters. However, to prevent overfitting, use AIC or BIC for model comparison and selection. These related measures ensure that the selected model balances fit and complexity, improving generalizability.