## 2.1.17 Parametric vs Non-Parametric Methods

When building statistical models, one of the fundamental decisions is whether to use a **parametric** or a **non-parametric** approach. Each method has its advantages and limitations, depending on the assumptions you are willing to make about your data.
### Parametric Methods

**Parametric methods** rely on assumptions about the underlying distribution of the data. These models are characterized by a fixed number of parameters, which define the model's structure. Common parametric methods include linear regression, logistic regression, and ANOVA.
#### Characteristics of Parametric Methods:

- **Fixed Parameters**: Parametric models assume a specific form for the relationship between variables, such as a linear or polynomial relationship. The number of parameters is fixed, regardless of the size of the dataset.
- **Distribution Assumptions**: Parametric methods often assume the data follows a known distribution (e.g., the normal distribution).
- **Efficient with Small Data**: With the right distributional assumptions, parametric methods can be very efficient even with small datasets, providing powerful inferences about the population.
#### Example: Linear Regression

In linear regression, the model assumes a linear relationship between the independent variable ($X$) and the dependent variable ($Y$):

$$
Y = \beta_0 + \beta_1 X + \epsilon
$$

Where:

- $\beta_0$ is the intercept,
- $\beta_1$ is the slope,
- $\epsilon$ is the error term, often assumed to be normally distributed.
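As a concrete illustration, here is a minimal fitting sketch using NumPy's least-squares solver. The data is synthetic, and the true coefficients (intercept 2.0, slope 1.5) are arbitrary choices for the example.

```python
import numpy as np

# Synthetic data: a linear trend with Gaussian noise (illustrative only).
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=50)

# Ordinary least squares: solve for beta in y ~ X beta, where X includes
# a column of ones so that beta[0] plays the role of the intercept.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"Estimated intercept (beta_0): {beta[0]:.3f}")
print(f"Estimated slope (beta_1):     {beta[1]:.3f}")
```

Note that the model has exactly two parameters no matter how many observations are collected; that fixed size is what makes the method parametric.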
#### Common Use Cases

- **Linear Models**: When you have a clear hypothesis about the form of the relationship between variables.
- **Normally Distributed Data**: Parametric methods work well when data follows known distributions such as the normal distribution.
### Non-Parametric Methods

**Non-parametric methods** do not assume any specific form for the distribution of the data. These models are more flexible, as they can adapt to the underlying structure of the data without relying on a predefined equation.
#### Characteristics of Non-Parametric Methods:

- **No Distribution Assumptions**: Non-parametric methods make fewer assumptions about the data distribution, making them more flexible and robust to deviations from normality.
- **Data-Driven**: The complexity of the model grows with the size of the data, so non-parametric methods can capture more structure as more data becomes available (see the sketch after this list).
- **Useful for Small Sample Sizes**: When the sample size is small and distribution assumptions cannot be verified, non-parametric methods offer a robust alternative, though often at some cost in statistical power.
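To make the data-driven point concrete, here is a minimal sketch using k-nearest-neighbors regression, one common non-parametric method (not discussed elsewhere in this section; scikit-learn's `KNeighborsRegressor` is assumed to be available). The data is synthetic and purely illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic nonlinear data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)

# k-NN regression keeps the entire training set as its "model":
# each prediction averages the y-values of the k closest training
# points, so the fitted function grows more detailed as data arrives.
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)

print(knn.predict([[2.5], [7.0]]))  # local averages near x = 2.5 and 7.0
```

Because the model stores the full training set, its effective complexity grows with every additional observation, in contrast to the two fixed coefficients of the linear model above.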
#### Example: The Mann-Whitney U Test

The **Mann-Whitney U test** is a non-parametric alternative to the independent-samples t-test when the normality assumption does not hold. It compares the ranks of two independent samples:

$$
U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1
$$

Where:

- $n_1$ and $n_2$ are the sample sizes,
- $R_1$ is the sum of ranks for the first sample.
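In practice the statistic rarely needs to be computed by hand. Here is a minimal sketch using `scipy.stats.mannwhitneyu`; the skewed exponential samples are hypothetical stand-ins for non-normal data.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Two small, skewed samples (synthetic, illustrative only).
rng = np.random.default_rng(1)
group_a = rng.exponential(scale=1.0, size=20)
group_b = rng.exponential(scale=1.5, size=20)

# Rank-based comparison of the two independent samples.
u_stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```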
#### Common Use Cases

- **Non-Normal Data**: When data deviates significantly from normality or other distribution assumptions.
- **Ordinal or Ranked Data**: Non-parametric methods are particularly useful for ordinal data or when the relationship between variables is not linear.
### Differences Between Parametric and Non-Parametric Methods

| Aspect | Parametric Methods | Non-Parametric Methods |
|---|---|---|
| Assumptions | Strong (e.g., normality) | Weak (no distribution assumptions) |
| Flexibility | Less flexible | More flexible |
| Efficiency with Small Data | High | May require more data |
| Complexity | Fixed complexity | Grows with data |
### Common Issues with Parametric Methods

- **Assumption Violations**: When the underlying assumptions (e.g., normality, homoscedasticity) are violated, parametric methods can produce biased results. Non-parametric alternatives should be considered in such cases.
- **Overfitting in Complex Models**: Overly complex parametric models (e.g., those with too many predictors) can overfit, especially with small datasets.
### Common Issues with Non-Parametric Methods

- **Inefficiency with Small Data**: Non-parametric methods typically require larger datasets to achieve the same level of efficiency as parametric models, due to their flexibility.
- **Loss of Power**: Non-parametric tests often have less statistical power than parametric tests, meaning they may need larger sample sizes to detect a true effect (the simulation sketch below illustrates this under normal data).
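Here is a minimal simulation sketch of that trade-off, assuming normally distributed data where the t-test's assumptions hold exactly; the sample size, mean shift, and trial count are arbitrary choices for the example.

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

# Monte Carlo sketch: under normal data with a true mean shift, count
# how often each test detects the effect at alpha = 0.05.
rng = np.random.default_rng(3)
alpha, n, shift, trials = 0.05, 15, 0.8, 2000

t_hits = u_hits = 0
for _ in range(trials):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(shift, 1.0, n)
    t_hits += ttest_ind(a, b).pvalue < alpha
    u_hits += mannwhitneyu(a, b, alternative="two-sided").pvalue < alpha

print(f"t-test power:       {t_hits / trials:.2f}")
print(f"Mann-Whitney power: {u_hits / trials:.2f}")
```

When the normality assumption is deliberately broken (e.g., heavy-tailed noise), rerunning the same simulation typically narrows or reverses the gap, which is exactly the case for non-parametric tests made above.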
### Best Practices

- **Choose Parametric Methods When Assumptions Hold**: If the data meets the assumptions of a parametric model (e.g., normality, linearity), parametric methods provide more efficient and interpretable results.
- **Use Non-Parametric Methods for Flexibility**: When the data does not meet parametric assumptions, or when working with small or unusual datasets, non-parametric methods are a safer choice.
- **Verify Assumptions**: Always check whether the assumptions of a parametric method hold by conducting diagnostic tests (e.g., normality tests, residual analysis); a short sketch follows this list. If assumptions are violated, consider switching to non-parametric methods.
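As one concrete diagnostic, here is a minimal sketch that fits a simple linear model and applies the Shapiro-Wilk normality test to its residuals (`scipy.stats.shapiro`; the data is synthetic, and in practice you would use your own fitted model's residuals).

```python
import numpy as np
from scipy.stats import shapiro

# Fit a simple linear model on synthetic data (illustrative only).
rng = np.random.default_rng(7)
x = rng.uniform(0, 10, size=60)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=60)

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# Shapiro-Wilk tests the normality of the residuals; a small p-value
# suggests the normality assumption is questionable.
stat, p_value = shapiro(residuals)
print(f"Shapiro-Wilk W = {stat:.3f}, p = {p_value:.3f}")
```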