Residual Normality

What Normality Tests Check

Residual normality tests validate the assumption that model errors (residuals) are normally distributed. This assumption is fundamental to regression analysis and affects the validity of confidence intervals, p-values, and hypothesis tests.

Purpose: Validates the assumption that model errors are normally distributed, which underpins the reliability of statistical inference.

Why Normality Matters

When residuals are normally distributed:

Confidence Intervals are Accurate: The confidence intervals around coefficient estimates are correctly calibrated

P-values are Reliable: Hypothesis tests for variable significance have correct Type I error rates

Predictions are Valid: Prediction intervals accurately reflect uncertainty

Mild violations of normality are often acceptable in Marketing Mix Modeling, especially with larger sample sizes (n > 30), because the Central Limit Theorem makes coefficient estimates approximately normal even when the residuals themselves are not.

Statistical Tests Available

MixModeler provides three complementary normality tests:

  • Jarque-Bera Test: tests for normality using skewness and kurtosis; p < 0.05 indicates non-normality

  • Shapiro-Wilk Test: a more powerful normality test for smaller samples; p < 0.05 indicates non-normality

  • D'Agostino-Pearson Test: an omnibus test based on skewness and kurtosis; p < 0.05 indicates non-normality

Interpretation: If p-value ≥ 0.05, residuals appear normally distributed. If p < 0.05, there is evidence of non-normality.
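
MixModeler computes these tests for you. If you want to reproduce them outside the tool, a minimal sketch using SciPy might look like the following; the `residuals` array here is simulated purely to make the snippet self-contained and stands in for your model's residuals.

```python
# Minimal sketch: reproducing the three normality tests with SciPy.
# `residuals` is a placeholder for the residuals of your fitted model.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
residuals = rng.normal(loc=0.0, scale=1.0, size=104)  # e.g. two years of weekly data

jb_stat, jb_p = stats.jarque_bera(residuals)   # based on skewness and kurtosis
sw_stat, sw_p = stats.shapiro(residuals)       # more powerful for smaller samples
dp_stat, dp_p = stats.normaltest(residuals)    # D'Agostino-Pearson omnibus test

for name, p in [("Jarque-Bera", jb_p), ("Shapiro-Wilk", sw_p), ("D'Agostino-Pearson", dp_p)]:
    verdict = "looks normal (p >= 0.05)" if p >= 0.05 else "evidence of non-normality (p < 0.05)"
    print(f"{name}: p = {p:.3f} -> {verdict}")
```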

Visual Diagnostics

MixModeler provides multiple visualizations to assess normality:

Q-Q Plot (Quantile-Quantile Plot):

  • Plots residual quantiles against theoretical normal distribution quantiles

  • Good: Points closely follow the diagonal reference line

  • Problem: Systematic deviations from the diagonal line

Histogram of Residuals:

  • Shows the distribution shape of residuals

  • Good: Bell-shaped, symmetric distribution

  • Problem: Heavily skewed or multi-modal distribution

Distribution Metrics:

  • Skewness: Should be close to 0 (symmetric distribution)

    • Positive skewness: Long right tail

    • Negative skewness: Long left tail

  • Kurtosis: Should be close to 3 (normal distribution); tools that report excess kurtosis use 0 as the reference value

    • Higher than 3: Heavy tails (more outliers)

    • Lower than 3: Light tails (fewer outliers)
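
These plots and metrics are produced inside MixModeler. As a rough illustration of how the same diagnostics could be reproduced with SciPy and Matplotlib, a sketch follows; the `residuals` array is again a simulated placeholder.

```python
# Minimal sketch: Q-Q plot, histogram, and distribution metrics for residuals.
# `residuals` is a placeholder array of model residuals.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
residuals = rng.normal(size=104)

fig, (ax_qq, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Q-Q plot: points should hug the diagonal reference line
stats.probplot(residuals, dist="norm", plot=ax_qq)
ax_qq.set_title("Q-Q plot of residuals")

# Histogram: should look roughly bell-shaped and symmetric
ax_hist.hist(residuals, bins=20, edgecolor="black")
ax_hist.set_title("Histogram of residuals")

# Skewness should be near 0; Pearson kurtosis (fisher=False) should be near 3
print("Skewness:", stats.skew(residuals))
print("Kurtosis:", stats.kurtosis(residuals, fisher=False))

plt.tight_layout()
plt.show()
```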

Interpreting Test Results

Passed Tests (✓)

What it means: Residuals appear to be normally distributed (p ≥ 0.05)

Implications:

  • Hypothesis tests based on the assumption of normality are valid

  • Confidence intervals are appropriately calibrated

  • Model inference is statistically sound

Action: None needed; proceed with using the model

Failed Tests (⚠)

What it means: Residuals may not be normally distributed (p < 0.05)

Implications:

  • Hypothesis tests may have incorrect Type I error rates

  • Confidence intervals might be miscalibrated

  • P-values should be interpreted more cautiously

Common Causes:

  • Outliers or influential observations

  • Skewed dependent variable

  • Missing important predictors

  • Non-linear relationships not captured

  • Model misspecification

What to Do When Tests Fail

If normality tests fail, consider these remedies in order (a short code sketch illustrating several of them follows the list):

1. Check for Outliers

  • Review the Influential Points diagnostic

  • Investigate data quality issues

  • Consider removing or down-weighting outliers if justified

2. Transform the Dependent Variable

  • Log transformation for right-skewed data

  • Square root transformation for moderate skewness

  • Box-Cox transformation for optimal normalization

3. Add Missing Variables

  • Include important predictors that might be missing

  • Add interaction terms or non-linear terms

  • Consider time trends or seasonal effects

4. Use Robust Methods

  • Accept minor violations if sample size is large (n > 50)

  • Use robust standard errors in your analysis

  • Consider bootstrapping for inference

5. When Violations are Acceptable

  • Large sample sizes (Central Limit Theorem applies)

  • Business decisions not sensitive to exact p-values

  • Focus is on prediction rather than inference

  • Violations are mild (p-value between 0.01-0.05)
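
MixModeler surfaces the corresponding diagnostics in its own interface. Purely as an illustration of remedies 1, 2, and 4 outside the tool, a minimal sketch with NumPy, SciPy, and statsmodels might look like the following. The variables `y` (the KPI) and `X` (the predictors) are placeholders, and the simulated data is only there to make the snippet runnable.

```python
# Illustrative sketch of remedies 1, 2, and 4: outlier check, KPI transformation,
# and robust standard errors. `y` and `X` stand in for your own KPI and predictors.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(104, 3)))  # intercept + three predictors (weekly data)
y = np.exp(X @ np.array([1.0, 0.5, 0.3, 0.2]) + rng.normal(scale=0.3, size=104))  # right-skewed KPI

# Remedy 1: flag potentially influential observations with Cook's distance
ols_fit = sm.OLS(y, X).fit()
cooks_d = ols_fit.get_influence().cooks_distance[0]
print("Possibly influential rows:", np.where(cooks_d > 4 / len(y))[0])  # common rule of thumb

# Remedy 2: transform the dependent variable to reduce skewness
y_log = np.log(y)                # log transform for right-skewed data (requires y > 0)
y_sqrt = np.sqrt(y)              # square-root transform for moderate skewness
y_boxcox, lam = stats.boxcox(y)  # Box-Cox estimates the power transform that best normalizes y
print(f"Box-Cox lambda: {lam:.2f}")

# Remedy 4: keep the model but use heteroscedasticity-robust (HC3) standard errors
robust_fit = sm.OLS(y, X).fit(cov_type="HC3")
print(robust_fit.summary())
```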

Practical Guidelines

Acceptable Scenarios:

  • Slight non-normality with p-value between 0.01 and 0.05

  • Large datasets (n > 100) with mild skewness

  • Business focus on directional insights rather than precise intervals

Critical Issues:

  • Severe skewness or bimodal distributions

  • Heavy tails with many extreme outliers

  • P-values well below 0.01

  • Small sample sizes with clear non-normality

Example Interpretation

Scenario 1 - Passed:

  • Jarque-Bera p-value: 0.23

  • Shapiro-Wilk p-value: 0.18

  • Skewness: -0.12, Kurtosis: 3.05

Interpretation: Residuals appear normally distributed. All normality tests pass, and skewness and kurtosis are close to their reference values. The model satisfies the normality assumption, and hypothesis tests are valid.

Scenario 2 - Failed:

  • Jarque-Bera p-value: 0.003

  • Shapiro-Wilk p-value: 0.012

  • Skewness: 1.85, Kurtosis: 8.2

Interpretation: Residuals show significant positive skewness and heavy tails. This indicates potential outliers or the need for variable transformation. Review influential points and consider log-transforming the KPI.

After reviewing normality:

  • Check Influential Points to identify outliers causing non-normality

  • Review Actual vs Predicted plot for systematic patterns

  • Examine Heteroscedasticity tests for variance issues
