## 2.1.17 Parametric vs Non-Parametric Methods

When building statistical models, one of the fundamental decisions is whether to use a **parametric** or a **non-parametric** approach. Each method has its advantages and limitations, depending on the assumptions you are willing to make about your data.
### Parametric Methods

**Parametric methods** rely on assumptions about the underlying distribution of the data. These models are characterized by a fixed number of parameters, which define the model's structure. Common parametric methods include linear regression, logistic regression, and ANOVA.
#### Characteristics of Parametric Methods:

- **Fixed Parameters**: Parametric models assume a specific form for the relationship between variables, such as a linear or polynomial relationship. The number of parameters is fixed, regardless of the size of the dataset.
- **Distribution Assumptions**: Parametric methods often assume the data follows a known distribution (e.g., the normal distribution).
- **Efficient with Small Data**: With the right distributional assumptions, parametric methods can be very efficient even with small datasets, providing powerful inferences about the population.
#### Example: Linear Regression

In linear regression, the model assumes a linear relationship between the independent variable ($X$) and the dependent variable ($Y$):

$$
Y = \beta_0 + \beta_1 X + \epsilon
$$

Where:

- $\beta_0$ is the intercept,
- $\beta_1$ is the slope,
- $\epsilon$ is the error term, often assumed to be normally distributed.
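As a concrete illustration, here is a minimal fitting sketch using NumPy's least-squares solver. The data is synthetic, and the true coefficients (intercept 2.0, slope 1.5) are arbitrary choices for the example.

```python
import numpy as np

# Synthetic data: a linear trend with Gaussian noise (illustrative only).
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=50)

# Ordinary least squares: solve for beta in y ~ X beta, where X includes
# a column of ones so that beta[0] plays the role of the intercept.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"Estimated intercept (beta_0): {beta[0]:.3f}")
print(f"Estimated slope (beta_1):     {beta[1]:.3f}")
```

Note that the model has exactly two parameters no matter how many observations are collected; that fixed size is what makes the method parametric.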
#### Common Use Cases

- **Linear Models**: When you have a clear hypothesis about the form of the relationship between variables.
- **Normally Distributed Data**: Parametric methods work well when data follows known distributions such as the normal distribution.
### Non-Parametric Methods

**Non-parametric methods** do not assume any specific form for the distribution of the data. These models are more flexible, as they can adapt to the underlying structure of the data without relying on a predefined equation.
#### Characteristics of Non-Parametric Methods:

- **No Distribution Assumptions**: Non-parametric methods make fewer assumptions about the data distribution, making them more flexible and robust to deviations from normality.
- **Data-Driven**: The complexity of the model grows with the size of the data, so non-parametric methods can capture more structure as more data becomes available (see the sketch after this list).
- **Useful for Small Sample Sizes**: When the sample size is small and distribution assumptions cannot be verified, non-parametric methods offer a robust alternative, though often at some cost in statistical power.
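To make the data-driven point concrete, here is a minimal sketch using k-nearest-neighbors regression, one common non-parametric method (not discussed elsewhere in this section; scikit-learn's `KNeighborsRegressor` is assumed to be available). The data is synthetic and purely illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic nonlinear data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)

# k-NN regression keeps the entire training set as its "model":
# each prediction averages the y-values of the k closest training
# points, so the fitted function grows more detailed as data arrives.
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)

print(knn.predict([[2.5], [7.0]]))  # local averages near x = 2.5 and 7.0
```

Because the model stores the full training set, its effective complexity grows with every additional observation, in contrast to the two fixed coefficients of the linear model above.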
#### Example: The Mann-Whitney U Test

The **Mann-Whitney U test** is a non-parametric alternative to the independent-samples t-test when the normality assumption does not hold. It compares the ranks of two independent samples:

$$
U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1
$$

Where:

- $n_1$ and $n_2$ are the sample sizes,
- $R_1$ is the sum of ranks for the first sample.
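In practice the statistic rarely needs to be computed by hand. Here is a minimal sketch using `scipy.stats.mannwhitneyu`; the skewed exponential samples are hypothetical stand-ins for non-normal data.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Two small, skewed samples (synthetic, illustrative only).
rng = np.random.default_rng(1)
group_a = rng.exponential(scale=1.0, size=20)
group_b = rng.exponential(scale=1.5, size=20)

# Rank-based comparison of the two independent samples.
u_stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```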
#### Common Use Cases

- **Non-Normal Data**: When data deviates significantly from normality or other distribution assumptions.
- **Ordinal or Ranked Data**: Non-parametric methods are particularly useful for ordinal data or when the relationship between variables is not linear.
### Differences Between Parametric and Non-Parametric Methods

| Aspect | Parametric Methods | Non-Parametric Methods |
|---|---|---|
| Assumptions | Strong (e.g., normality) | Weak (no distribution assumptions) |
| Flexibility | Less flexible | More flexible |
| Efficiency with Small Data | High | May require more data |
| Complexity | Fixed complexity | Grows with data |
### Common Issues with Parametric Methods

- **Assumption Violations**: When the underlying assumptions (e.g., normality, homoscedasticity) are violated, parametric methods can produce biased results. Non-parametric alternatives should be considered in such cases.
- **Overfitting in Complex Models**: Overly complex parametric models (e.g., those with too many predictors) can overfit, especially with small datasets.
### Common Issues with Non-Parametric Methods

- **Inefficiency with Small Data**: Non-parametric methods typically require larger datasets to achieve the same level of efficiency as parametric models, due to their flexibility.
- **Loss of Power**: Non-parametric tests often have less statistical power than parametric tests, meaning they may need larger sample sizes to detect a true effect (the simulation sketch below illustrates this under normal data).
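Here is a minimal simulation sketch of that trade-off, assuming normally distributed data where the t-test's assumptions hold exactly; the sample size, mean shift, and trial count are arbitrary choices for the example.

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

# Monte Carlo sketch: under normal data with a true mean shift, count
# how often each test detects the effect at alpha = 0.05.
rng = np.random.default_rng(3)
alpha, n, shift, trials = 0.05, 15, 0.8, 2000

t_hits = u_hits = 0
for _ in range(trials):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(shift, 1.0, n)
    t_hits += ttest_ind(a, b).pvalue < alpha
    u_hits += mannwhitneyu(a, b, alternative="two-sided").pvalue < alpha

print(f"t-test power:       {t_hits / trials:.2f}")
print(f"Mann-Whitney power: {u_hits / trials:.2f}")
```

When the normality assumption is deliberately broken (e.g., heavy-tailed noise), rerunning the same simulation typically narrows or reverses the gap, which is exactly the case for non-parametric tests made above.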
### Best Practices

- **Choose Parametric Methods When Assumptions Hold**: If the data meets the assumptions of a parametric model (e.g., normality, linearity), parametric methods provide more efficient and interpretable results.
- **Use Non-Parametric Methods for Flexibility**: When the data does not meet parametric assumptions, or when working with small or unusual datasets, non-parametric methods are a safer choice.
- **Verify Assumptions**: Always check whether the assumptions of a parametric method hold by conducting diagnostic tests (e.g., normality tests, residual analysis); a short sketch follows this list. If assumptions are violated, consider switching to non-parametric methods.
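As one concrete diagnostic, here is a minimal sketch that fits a simple linear model and applies the Shapiro-Wilk normality test to its residuals (`scipy.stats.shapiro`; the data is synthetic, and in practice you would use your own fitted model's residuals).

```python
import numpy as np
from scipy.stats import shapiro

# Fit a simple linear model on synthetic data (illustrative only).
rng = np.random.default_rng(7)
x = rng.uniform(0, 10, size=60)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=60)

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# Shapiro-Wilk tests the normality of the residuals; a small p-value
# suggests the normality assumption is questionable.
stat, p_value = shapiro(residuals)
print(f"Shapiro-Wilk W = {stat:.3f}, p = {p_value:.3f}")
```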