Heteroscedasticity

What Heteroscedasticity Tests Check

Heteroscedasticity tests examine whether the variance of model residuals is constant across all levels of the independent variables (homoscedasticity) or varies systematically (heteroscedasticity).

Purpose: Checks whether error variance is constant across fitted values, because standard errors, prediction intervals, and hypothesis tests are only accurate when it is.

Why Constant Variance Matters

When residuals have constant variance (homoscedasticity):

Standard Errors are Correct: Coefficient standard errors accurately reflect uncertainty

Confidence Intervals are Valid: Intervals have correct coverage probabilities

Hypothesis Tests are Reliable: P-values and significance tests are trustworthy

Estimates are Efficient: Ordinary Least Squares (OLS) is the best linear unbiased estimator (BLUE), provided the other Gauss-Markov conditions also hold

Heteroscedasticity doesn't bias coefficient estimates, but it makes the usual OLS standard errors incorrect, which invalidates confidence intervals and hypothesis tests, and OLS is no longer the most efficient estimator.

Statistical Tests Available

MixModeler provides three heteroscedasticity tests:

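| Test | Description | Interpretation |
|---|---|---|
| Breusch-Pagan Test | Regresses squared residuals on the predictors to test for a linear relationship between them | p < 0.05 indicates heteroscedasticity |
| White Test | Also includes squares and cross-products of the predictors, so it detects more general (including non-linear) forms of heteroscedasticity | p < 0.05 indicates heteroscedasticity |
| Goldfeld-Quandt Test | Orders the observations (for example by a predictor), splits them into two subsamples, and compares their residual variances with an F-test | p < 0.05 indicates heteroscedasticity |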

Interpretation: All three tests take constant variance (homoscedasticity) as the null hypothesis. If p ≥ 0.05, no significant heteroscedasticity is detected; if p < 0.05, the data provide evidence of heteroscedasticity.
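The sketch below shows what the three tests compute for a model with a single predictor, written in plain Python so every step is visible. It is an illustration under stated assumptions, not MixModeler's implementation: Breusch-Pagan uses the studentized (Koenker) form LM = n·R² from regressing squared residuals on the predictor, White adds the squared predictor (no cross-products exist with one predictor), and Goldfeld-Quandt sorts by the predictor and drops the middle 20% (a hypothetical default). All function names and the toy data are hypothetical. For real models, `statsmodels.stats.diagnostic` provides `het_breuschpagan`, `het_white`, and `het_goldfeldquandt`.

```python
import math


def solve(a, b):
    """Solve the small linear system a @ beta = b by Gaussian elimination."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))) / m[r][r]
    return beta


def ols(columns, y):
    """Regress y on an intercept plus `columns`; return (fitted values, R^2)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return fitted, 1.0 - ss_res / ss_tot


def squared_residuals(x, y):
    """Squared residuals of the main model y = a + b*x."""
    fitted, _ = ols([x], y)
    return [(yi - fi) ** 2 for yi, fi in zip(y, fitted)]


def breusch_pagan(x, y):
    """Studentized (Koenker) Breusch-Pagan: LM = n * R^2 of u^2 on x, df = 1."""
    _, r2 = ols([x], squared_residuals(x, y))
    lm = max(0.0, len(y) * r2)  # clamp tiny negative rounding error
    return lm, math.erfc(math.sqrt(lm / 2))  # upper tail of chi-square(1)


def white(x, y):
    """White test with one predictor: u^2 on x and x^2, df = 2."""
    _, r2 = ols([x, [xi * xi for xi in x]], squared_residuals(x, y))
    lm = max(0.0, len(y) * r2)
    return lm, math.exp(-lm / 2)  # upper tail of chi-square(2)


def goldfeld_quandt(x, y, drop_frac=0.2):
    """F = RSS(high-x subsample) / RSS(low-x subsample), middle drop_frac dropped."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    k = (n - int(n * drop_frac)) // 2  # observations per subsample

    def rss(part):
        return sum(squared_residuals([p[0] for p in part], [p[1] for p in part]))

    # Equal subsample sizes, so the degrees of freedom cancel; compare the
    # statistic with an F(k - 2, k - 2) distribution to get a p-value.
    return rss(pairs[n - k:]) / rss(pairs[:k])


# Hypothetical toy data: +/-20% noise around 2*x (fan) vs +/-0.5 noise (flat).
x = list(range(1, 101))
sign = [1 if xi % 2 == 0 else -1 for xi in x]
fan = [2 * xi * (1 + 0.2 * si) for xi, si in zip(x, sign)]
flat = [2 * xi + 3 + 0.5 * si for xi, si in zip(x, sign)]
print("BP fan:", breusch_pagan(x, fan), "BP flat:", breusch_pagan(x, flat))
print("White fan:", white(x, fan), "GQ fan F:", goldfeld_quandt(x, fan))
```

The chi-square p-values use closed forms that only hold for 1 and 2 degrees of freedom; with more predictors you would need a general chi-square survival function.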

Visual Diagnostics

MixModeler provides key visualizations for detecting heteroscedasticity:

Residuals vs Fitted Values Plot:

  • Plots residuals against predicted values

  • Good: Random scatter around zero with consistent spread (no pattern)

  • Problem: Funnel shape (variance increases/decreases with fitted values)

  • Problem: Curved pattern (indicates non-linear relationship)

Scale-Location Plot:

  • Plots square root of absolute standardized residuals against fitted values

  • Good: Horizontal line with random scatter

  • Problem: Upward or downward trend indicates changing variance
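A minimal sketch of how the Scale-Location coordinates can be computed for a one-predictor model, assuming "standardized residuals" means the common internally studentized form e_i / (s·√(1 − h_i)), where h_i is the observation's leverage. The `spread_trend` helper is a hypothetical numeric stand-in for eyeballing the plot, not a MixModeler output: it compares the average plotted height in the upper and lower halves of the fitted values, so a value near 1 corresponds to a flat trend.

```python
import math


def scale_location(x, y):
    """(fitted value, sqrt(|standardized residual|)) pairs for OLS y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))  # residual standard error
    points = []
    for xi, fi, e in zip(x, fitted, resid):
        h = 1 / n + (xi - mx) ** 2 / sxx  # leverage of observation i
        r = e / (s * math.sqrt(1 - h))  # internally studentized residual
        points.append((fi, math.sqrt(abs(r))))
    return points


def spread_trend(points):
    """Mean plotted height in the upper half of fitted values over the lower half."""
    pts = sorted(points)
    half = len(pts) // 2
    lower = sum(v for _, v in pts[:half]) / half
    upper = sum(v for _, v in pts[-half:]) / half
    return upper / lower


# Hypothetical toy data: +/-20% noise around 2*x (fan) vs +/-0.5 noise (flat).
x = list(range(1, 101))
sign = [1 if xi % 2 == 0 else -1 for xi in x]
fan = [2 * xi * (1 + 0.2 * si) for xi, si in zip(x, sign)]
flat = [2 * xi + 3 + 0.5 * si for xi, si in zip(x, sign)]
print(spread_trend(scale_location(x, fan)), spread_trend(scale_location(x, flat)))
```

The square root compresses large residuals, which is why this plot shows a changing spread as a sloped trend rather than a widening cloud.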

Common Patterns

Fan Shape (Most Common):

  • Variance increases as predicted values increase

  • Often seen when predicting sales (larger values have more variability)

Funnel Shape:

  • Variance decreases as predicted values increase

  • Less common but still problematic

Grouped Heteroscedasticity:

  • Different variance for different categories or time periods

  • Suggests need for categorical variables or interactions
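Grouped heteroscedasticity can be checked numerically by comparing residual variances across labels such as quarters or promotion flags. The helpers and data below are hypothetical; a formal test such as Levene's (available as `scipy.stats.levene`) would add a p-value to the comparison.

```python
import statistics
from collections import defaultdict


def residual_variance_by_group(residuals, groups):
    """Sample variance of the residuals within each group label."""
    buckets = defaultdict(list)
    for e, g in zip(residuals, groups):
        buckets[g].append(e)
    # statistics.variance needs at least two values per group.
    return {g: statistics.variance(v) for g, v in buckets.items()}


def variance_ratio(variances):
    """Largest group variance divided by the smallest."""
    return max(variances.values()) / min(variances.values())


# Hypothetical residuals tagged by quarter: Q4 swings far more than Q1.
resid = [3, -3, 4, -4, 1, -1, 1, -1]
quarter = ["Q4"] * 4 + ["Q1"] * 4
by_quarter = residual_variance_by_group(resid, quarter)
print(by_quarter, variance_ratio(by_quarter))
```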

Interpreting Test Results

Passed Tests (✓)

What it means: No significant heteroscedasticity detected (p ≥ 0.05)

Implications:

  • Variance of errors is reasonably constant

  • Standard errors are reliable

  • Confidence intervals and p-values are valid

Action: No action needed; the homoscedasticity assumption is reasonable for this model

Failed Tests (⚠)

What it means: Heteroscedasticity detected (p < 0.05)

Implications:

  • Standard errors may be incorrect (typically too small)

  • Confidence intervals may have wrong coverage

  • Hypothesis tests may be unreliable

  • Note: Coefficient estimates remain unbiased

Common Causes:

  • The dependent variable's variance grows or shrinks with its level

  • Important variables omitted

  • Wrong functional form (need transformations)

  • Outliers affecting variance

  • Natural heteroscedasticity in the data-generating process

What to Do When Tests Fail

If heteroscedasticity tests fail, try these solutions in order:

1. Transform the Dependent Variable (Most Effective)

  • Log transformation: Reduces right skewness and stabilizes variance when the spread grows in proportion to the level (requires strictly positive values)

    • Good for: Sales, revenue, spend data

    • Effect: Multiplicative relationships become additive

  • Square root transformation: Moderate variance stabilization

  • Inverse transformation: For highly skewed data
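The toy example below illustrates variance stabilization under an assumed multiplicative error: sales vary by ±20% around a level proportional to spend. Spread is summarized as mean |residual| in the upper half of fitted values divided by the lower half. Because the toy relationship is proportional, spend is logged too (a log-log fit); the variance stabilization itself comes from logging sales. The data and helper names are hypothetical.

```python
import math


def fit_residuals(x, y):
    """Fitted values and residuals from simple OLS y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    fitted = [a + b * xi for xi in x]
    return fitted, [yi - fi for yi, fi in zip(y, fitted)]


def spread_ratio(x, y):
    """Mean |residual| in the upper half of fitted values over the lower half."""
    fitted, resid = fit_residuals(x, y)
    pairs = sorted(zip(fitted, resid))
    half = len(pairs) // 2
    low = sum(abs(e) for _, e in pairs[:half]) / half
    high = sum(abs(e) for _, e in pairs[-half:]) / half
    return high / low


# Hypothetical toy data: +/-20% multiplicative noise around 2 * spend.
spend = list(range(1, 101))
sales = [2 * x * (1.2 if x % 2 == 0 else 0.8) for x in spend]

raw_ratio = spread_ratio(spend, sales)
log_ratio = spread_ratio([math.log(x) for x in spend], [math.log(v) for v in sales])
print(f"raw: {raw_ratio:.2f}, log-log: {log_ratio:.2f}")
```

On this data the raw fit shows residual spread about three times larger at high fitted values, while the log-log fit brings the ratio close to 1, because a ±20% swing becomes a constant-size shift on the log scale.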

2. Use Weighted Least Squares (WLS)

  • Give less weight to observations with higher variance

  • Requires estimating variance function

  • Available in most general-purpose statistical software
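A minimal WLS sketch for one predictor, assuming the variance function is already known: here the standard deviation is assumed proportional to spend, so each observation gets weight 1/spend². Estimating that variance function from the data (feasible WLS) is the harder step and is not shown. Names and data are hypothetical; statsmodels offers a full implementation as `statsmodels.api.WLS`.

```python
def wls(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted means
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical toy data: sales vary +/-20% around 2 * spend.
spend = list(range(1, 101))
sales = [2 * x * (1.2 if x % 2 == 0 else 0.8) for x in spend]

# Assumed variance function: SD proportional to spend, so weight = 1 / spend^2.
weights = [1 / x**2 for x in spend]
wls_a, wls_b = wls(spend, sales, weights)
print(wls_a, wls_b)
```

Down-weighting the high-spend observations gives the noisiest points less pull on the fitted line; with all weights equal to 1, the function reduces to ordinary OLS.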

3. Use Robust Standard Errors

  • Calculate heteroscedasticity-consistent (HC) standard errors

  • Corrects standard errors without changing coefficients

  • Common variants: HC0, HC1, HC2, HC3
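For a single predictor, the HC variants differ only in how each squared residual is scaled before entering the "sandwich" variance of the slope: HC0 uses e_i² as is, HC1 multiplies by n/(n − k), HC2 divides by (1 − h_i), and HC3 divides by (1 − h_i)², where h_i is the observation's leverage and k the number of coefficients. The sketch below (hypothetical names, toy fan-shaped data) computes all four alongside the classical standard error; in statsmodels, `sm.OLS(y, X).fit(cov_type="HC3")` applies the same kind of correction to general models.

```python
import math


def slope_standard_errors(x, y):
    """Classical and HC0-HC3 standard errors for the slope in OLS y = a + b*x."""
    n, k = len(x), 2  # k = estimated coefficients (intercept and slope)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    e2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]  # squared residuals
    lev = [1 / n + (xi - mx) ** 2 / sxx for xi in x]  # leverages h_i

    def sandwich(omega):
        # Var(b) = sum((x_i - mean_x)^2 * omega_i) / sxx^2
        return math.sqrt(sum((xi - mx) ** 2 * w for xi, w in zip(x, omega)) / sxx ** 2)

    return {
        "classical": math.sqrt(sum(e2) / (n - k) / sxx),
        "HC0": sandwich(e2),
        "HC1": sandwich([v * n / (n - k) for v in e2]),
        "HC2": sandwich([v / (1 - h) for v, h in zip(e2, lev)]),
        "HC3": sandwich([v / (1 - h) ** 2 for v, h in zip(e2, lev)]),
    }


# Hypothetical fan-shaped toy data: +/-20% noise around 2 * spend.
spend = list(range(1, 101))
sales = [2 * x * (1.2 if x % 2 == 0 else 0.8) for x in spend]
se = slope_standard_errors(spend, sales)
print(se)
```

On this toy data every robust variant comes out larger than the classical standard error, consistent with the note above that conventional standard errors tend to be too small under a fan-shaped pattern. The slope estimate itself is the same in every case; only its uncertainty changes.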

4. Add Omitted Variables

  • Include variables that explain variance patterns

  • Add interaction terms

  • Consider non-linear transformations

5. Check for Outliers

  • Review Influential Points diagnostic

  • Outliers can create false heteroscedasticity patterns

  • Consider robust regression methods

6. When Heteroscedasticity is Acceptable

  • Mild heteroscedasticity with large sample sizes

  • Focus on coefficient estimates rather than inference

  • Robust standard errors are already used for inference

  • Business decisions not sensitive to exact confidence intervals

Practical Guidelines

Acceptable Scenarios:

  • Mild heteroscedasticity (p-value between 0.01 and 0.05)

  • Large datasets where robust standard errors are used

  • Predictions are the primary goal (coefficients remain unbiased)

  • Natural variation in business data

Critical Issues:

  • Severe heteroscedasticity (p < 0.001)

  • Clear fan or funnel pattern in residual plots

  • Small sample sizes requiring precise inference

  • Using the model to construct confidence intervals

Example Interpretation

Scenario 1 - Passed:

  • Breusch-Pagan p-value: 0.28

  • White test p-value: 0.42

  • Residual plot shows random scatter with consistent spread

Interpretation: No significant heteroscedasticity detected. The constant variance assumption is satisfied, and standard errors are reliable.

Scenario 2 - Failed:

  • Breusch-Pagan p-value: 0.008

  • White test p-value: 0.003

  • Residual plot shows clear fan shape (increasing variance)

Interpretation: Heteroscedasticity detected. Variance increases with fitted values. Consider log-transforming the KPI or using robust standard errors. If focused on coefficient interpretation rather than p-values, this may be acceptable with appropriate caveats.

Scenario 3 - Severe:

  • Breusch-Pagan p-value: < 0.001

  • White test p-value: < 0.001

  • Strong funnel pattern with extreme variance differences

Interpretation: Severe heteroscedasticity. Model requires transformation. Try log-transforming the dependent variable before proceeding with inference or decision-making.

Marketing Mix Modeling Context

In MMM, heteroscedasticity often appears because:

Sales Variability: Larger sales periods naturally have higher variance

Promotional Effects: Promotions create volatility in certain periods

Seasonal Patterns: Different variance across seasons

Spend Ranges: Different variance at low vs. high spend levels

Log transformation of the KPI is particularly effective in MMM as it:

  • Stabilizes variance

  • Makes relationships multiplicative (natural for marketing effects)

  • Reduces influence of outliers

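  • Provides coefficients interpretable as elasticities when the marketing variables are also log-transformed (log-log model); with untransformed predictors, a coefficient approximates the proportional change in the KPI per unit change in the predictor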
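Converting log-model coefficients into percentage effects is a common source of mistakes, so the helpers below spell out the exact conversions. They assume the KPI is modeled either as log(KPI) = a + β·x (log-linear) or as log(KPI) = a + β·log(x) (log-log); the function names are hypothetical.

```python
import math


def log_linear_pct_change(beta, delta=1.0):
    """Exact % change in KPI when x rises by `delta`, for log(KPI) = a + beta*x."""
    return 100.0 * (math.exp(beta * delta) - 1.0)


def log_log_pct_change(beta, pct_increase):
    """Exact % change in KPI when x rises by `pct_increase` %, for log(KPI) = a + beta*log(x)."""
    return 100.0 * ((1.0 + pct_increase / 100.0) ** beta - 1.0)


# Hypothetical coefficients, for illustration only.
print(log_log_pct_change(0.1, 10))  # elasticity 0.1, spend +10%
print(log_linear_pct_change(0.05))  # semi-elasticity 0.05, one-unit increase
```

For example, a log-log coefficient of 0.1 with a 10% spend increase gives just under 1% more KPI, and a log-linear coefficient of 0.05 gives about 5.1% per unit increase; the simple "β × change" reading is only accurate for small changes.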

Relationship to Other Assumptions

Heteroscedasticity often co-occurs with:

Non-normality: Changing variance can cause residuals to deviate from normality

Outliers: Extreme values can create both heteroscedasticity and influential points

Non-linearity: Wrong functional form can manifest as heteroscedasticity

After reviewing heteroscedasticity:

  • Check Residual Normality as heteroscedasticity can affect normality

  • Review Influential Points to identify outliers causing variance issues

  • Examine Actual vs Predicted for systematic patterns
