Multicollinearity (VIF)
What Multicollinearity Tests Check
Multicollinearity tests detect when predictor variables in your model are highly correlated with each other. While multicollinearity doesn't bias coefficient estimates, it inflates standard errors and makes individual coefficients unstable and difficult to interpret.
Purpose: Checks if predictor variables are highly correlated, ensuring coefficient estimates are stable and interpretable.
Why Low Multicollinearity Matters
When predictor variables are not highly correlated:
Coefficients are Stable: Small changes in data don't cause large changes in coefficients
Interpretation is Clear: Each variable's effect can be isolated and understood
Standard Errors are Smaller: More precise estimates and narrower confidence intervals
Statistical Power is Higher: Easier to detect truly significant relationships
High multicollinearity makes it difficult to determine which variables are actually driving the outcome and can lead to counterintuitive or unstable coefficient estimates.
Variance Inflation Factor (VIF)
VIF is the primary metric for measuring multicollinearity:
Formula: VIFᵢ = 1 / (1 - R²ᵢ), where R²ᵢ is the R-squared from regressing predictor i on all the other predictors
Interpretation: VIF measures how much the variance of a coefficient is inflated due to correlation with other predictors. The standard error is inflated by the square root of VIF, so a VIF of 4 doubles the standard error.
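The formula can be applied directly by regressing each predictor on the others. The following is a minimal sketch using NumPy, not MixModeler's implementation; the function name `vif` and the simulated channels (`tv`, `display`, `radio`) are illustrative assumptions.

```python
import numpy as np

def vif(X):
    """Return the VIF of each column of X (rows are observations)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    result = []
    for i in range(k):
        y = X[:, i]
        others = np.delete(X, i, axis=1)
        # Regress predictor i on all other predictors plus an intercept.
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        result.append(1.0 / (1.0 - r2))
    return np.array(result)

# Simulated data (hypothetical): display is built to track tv closely.
rng = np.random.default_rng(0)
tv = rng.normal(size=200)
display = 0.9 * tv + 0.3 * rng.normal(size=200)
radio = rng.normal(size=200)

vifs = vif(np.column_stack([tv, display, radio]))
print(dict(zip(["tv", "display", "radio"], vifs.round(2))))
```

In this simulation the two linked channels should show clearly elevated VIFs while the independent one stays near 1. A perfectly collinear column would give R² = 1 and a division by zero, which this sketch does not guard against.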
VIF Interpretation Guidelines
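| VIF Range | Level | Interpretation | Action |
| --- | --- | --- | --- |
| 1 - 5 | None to Low | No multicollinearity issues | ✓ No action needed |
| 5 - 10 | Moderate | Some multicollinearity present | Monitor and investigate |
| > 10 | Severe | Serious multicollinearity problem | Take corrective action |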
Tolerance: The inverse of VIF (1/VIF), which equals 1 - R²ᵢ. Values < 0.1 (equivalent to VIF > 10) indicate severe multicollinearity.
Conservative threshold: Some analysts use VIF > 5 as the cutoff for concern
Liberal threshold: VIF > 10 is widely accepted as indicating severe problems
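To make the thresholds concrete, the sketch below converts a few R² values into VIF, tolerance, and standard-error inflation (the square root of VIF), then applies the guideline bands. The function names are hypothetical, and the treatment of the boundaries (5 and 10 both counted as moderate) is an assumption.

```python
import math

def vif_from_r2(r2):
    """VIF implied by the R-squared of one predictor regressed on the others."""
    return 1.0 / (1.0 - r2)

def vif_level(vif):
    """Map a VIF value to the guideline bands (boundary handling is assumed)."""
    if vif < 5:
        return "None to Low"
    if vif <= 10:
        return "Moderate"
    return "Severe"

for r2 in (0.50, 0.75, 0.85, 0.95):
    v = vif_from_r2(r2)
    print(
        f"R2={r2:.2f}  VIF={v:.2f}  tolerance={1 / v:.2f}  "
        f"SE inflation={math.sqrt(v):.2f}x  level={vif_level(v)}"
    )
```

Reading the rows side by side shows why the bands are rough guides: an R² of 0.75 already doubles a standard error even though the VIF of 4 falls in the lowest band.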
Common Sources in MMM
Marketing Mix Models are particularly prone to multicollinearity:
Correlated Media Channels:
TV and Display often run in parallel
Digital channels coordinated in campaigns
Seasonal promotional calendars align
Time Trends:
Multiple variables growing over time
Seasonal patterns across channels
Economic indicators correlated with marketing
Created Variables:
Adstocked variables correlated with raw spend
Lagged variables correlated with current values
Interaction terms correlated with main effects
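The created-variable case is easy to reproduce. The sketch below assumes a simple geometric adstock (each period carries over a fixed share of the previous period's adstock), which may differ from the transformation used in your model; the spend series is simulated.

```python
import numpy as np

def geometric_adstock(spend, decay):
    """Geometric adstock: a[t] = spend[t] + decay * a[t-1] (assumed form)."""
    out = np.zeros(len(spend))
    carry = 0.0
    for t, x in enumerate(spend):
        carry = x + decay * carry
        out[t] = carry
    return out

# Three years of hypothetical weekly spend.
rng = np.random.default_rng(1)
spend = rng.gamma(shape=2.0, scale=50.0, size=156)
adstocked = geometric_adstock(spend, decay=0.5)

r = np.corrcoef(spend, adstocked)[0, 1]
print(f"Correlation between raw and adstocked spend: {r:.2f}")
```

Because the adstocked series contains the raw series as its largest component, the two are strongly correlated by construction, so including both in one model tends to raise VIF for both.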
Interpreting Test Results
Passed Tests (✓)
What it means: All VIF < 5 (or < 10 depending on threshold)
Implications:
Variables are sufficiently independent
Each coefficient represents a unique contribution
Standard errors are not materially inflated
Model is stable and interpretable
Action: No action needed - multicollinearity is not a concern
Failed Tests (⚠)
What it means: One or more variables have high VIF (> 10)
Implications:
Affected coefficients have large standard errors
Small data changes cause large coefficient changes
Individual variable effects cannot be isolated
Confidence intervals are very wide
Symptoms:
Coefficients with unexpected signs (for example, negative when a positive effect is expected)
Large coefficient swings when adding/removing variables
High R² but few significant coefficients
Coefficients change dramatically with small data changes
What to Do When Tests Fail
If multicollinearity is detected, try these solutions:
1. Remove Highly Correlated Variables (Most Direct)
Identify variables with VIF > 10
Remove one from each pair of highly correlated variables
Keep the variable more theoretically important or easier to measure
Use business logic to decide which to retain
2. Combine Correlated Variables
Create composite indices or aggregate variables
Example: Combine all digital channels into "Digital_Total"
Example: Combine search and social into "Performance_Media"
Sum or average correlated channels
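The effect of combining channels can be checked by recomputing VIF before and after. This sketch uses simulated spend and computes VIF from the diagonal of the inverse correlation matrix, which is equivalent to the regression-based formula; the variable names are illustrative.

```python
import numpy as np

def vif(X):
    """VIF for each column, taken from the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Hypothetical weekly spend: social is built to follow search closely.
rng = np.random.default_rng(2)
n = 156
search = rng.normal(100, 20, n)
social = 0.8 * search + rng.normal(0, 5, n)
tv = rng.normal(200, 40, n)

before = vif(np.column_stack([tv, search, social]))

# Replace the two correlated channels with their sum.
performance_media = search + social
after = vif(np.column_stack([tv, performance_media]))

print("Before (TV, Search, Social):", before.round(2))
print("After (TV, Performance_Media):", after.round(2))
```

The combined variable gives one stable coefficient for the group, at the cost of no longer separating Search from Social.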
3. Use Principal Component Analysis (PCA)
Create uncorrelated components from correlated variables
First few components explain most variance
Lose interpretability but gain stability
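A minimal PCA sketch using NumPy's SVD on standardized, simulated predictors is shown below. It illustrates the two properties listed above: the component scores are uncorrelated, and the first component carries most of the variance. The data and variable names are assumptions for illustration.

```python
import numpy as np

# Three hypothetical digital channels that move together.
rng = np.random.default_rng(3)
n = 156
search = rng.normal(size=n)
social = 0.9 * search + 0.3 * rng.normal(size=n)
display = 0.8 * search + 0.4 * rng.normal(size=n)
X = np.column_stack([search, social, display])

# Standardize so each channel contributes on the same scale.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Rows of Vt are the principal directions; scores are the new variables.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt.T
explained = s**2 / np.sum(s**2)

print("Share of variance per component:", explained.round(3))
print("Correlation between component scores:")
print(np.corrcoef(scores, rowvar=False).round(6))
```

Each component is a weighted mix of all three channels, which is why a coefficient on a component cannot be read as the effect of any single channel.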
4. Collect More Data
Longer time series can help distinguish effects
More independent variation in each predictor reduces the correlation between predictors
Additional observations improve precision
5. Accept Multicollinearity
If focus is on overall model fit rather than individual coefficients
When prediction is the goal (coefficients remain unbiased and overall fit is unaffected)
For variables you must keep for business reasons
When VIF is moderate (5-10) rather than severe (>10)
6. Use Regularization Techniques
Ridge regression penalizes large coefficients
Lasso regression performs variable selection
Elastic net combines ridge and lasso
Bayesian methods with informative priors
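The ridge idea can be sketched with its closed-form solution on standardized predictors. This is an illustration on simulated data, not a tuned model: the penalty value `lam=10.0` is an arbitrary assumption and would normally be chosen by cross-validation.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Closed-form ridge on standardized predictors; lam=0 gives ordinary least squares."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    k = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ yc)

# Two hypothetical channels that are almost identical.
rng = np.random.default_rng(4)
n = 104
search = rng.normal(size=n)
social = search + 0.1 * rng.normal(size=n)
sales = 2.0 * search + 2.0 * social + rng.normal(size=n)
X = np.column_stack([search, social])

ols = ridge_coefficients(X, sales, lam=0.0)
ridge = ridge_coefficients(X, sales, lam=10.0)

print("OLS coefficients (standardized scale):  ", ols.round(2))
print("Ridge coefficients (standardized scale):", ridge.round(2))
```

The penalty shrinks the coefficient vector and pulls the two correlated coefficients toward each other. The trade-off is some bias in exchange for lower variance.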
Practical Guidelines
When to Act:
VIF > 10 (severe multicollinearity)
Coefficients have wrong signs
Model is unstable across specifications
Need to interpret individual coefficients
When Multicollinearity May Be Acceptable:
VIF between 5 and 10 (moderate multicollinearity)
Primary goal is prediction, not interpretation
All variables are theoretically necessary
Coefficients have expected signs and reasonable magnitudes
Using model for overall attribution, not optimization
Marketing Mix Modeling Considerations:
Total marketing contribution may still be accurate
Overall model fit (R²) not affected
Channel-level ROI calculations become unreliable
Budget optimization requires stable coefficients
Visual Diagnostics
MixModeler provides a correlation matrix showing pairwise correlations between predictors:
Color Coding:
Dark blue: Strong positive correlation (>0.8)
Light blue: Moderate positive correlation (0.5-0.8)
White: Low correlation (-0.5 to 0.5)
Red: Negative correlation
Warning Signs:
Absolute correlations above 0.8 (|r| > 0.8) between predictors
Clusters of highly correlated variables
Absolute correlation above 0.9 indicates very strong collinearity
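The same warning signs can be screened for outside the heatmap. The sketch below lists predictor pairs whose absolute correlation exceeds a threshold; the function name, the 0.8 default, and the simulated data are assumptions rather than MixModeler functionality.

```python
import numpy as np

def high_correlation_pairs(X, names, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    corr = np.corrcoef(X, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], round(float(corr[i, j]), 2)))
    return pairs

# Hypothetical predictors: Social follows Search, TV is independent.
rng = np.random.default_rng(5)
n = 156
search = rng.normal(size=n)
social = 0.9 * search + 0.3 * rng.normal(size=n)
tv = rng.normal(size=n)

names = ["TV_Spend", "Search_Spend", "Social_Spend"]
flagged = high_correlation_pairs(np.column_stack([tv, search, social]), names)
print(flagged)
```

Pairwise correlations can miss collinearity that involves three or more variables at once, so this check complements VIF rather than replacing it.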
Example Interpretation
Scenario 1 - Passed:
All VIF < 3
Highest correlation: 0.62 (TV and Radio)
All coefficients have expected signs
Interpretation: No multicollinearity issues. Each variable's effect can be reliably isolated and interpreted independently.
Scenario 2 - Moderate Multicollinearity:
TV VIF: 6.2
Display VIF: 6.8
Correlation between TV and Display: 0.85
Interpretation: Moderate multicollinearity between TV and Display. Coefficients may be less stable. Consider combining the two into "Brand_Media" or removing one channel. If both are needed for business reasons, acknowledge this limitation when interpreting individual effects.
Scenario 3 - Severe Multicollinearity:
Search VIF: 15.3
Social VIF: 14.7
Correlation: 0.92
Search coefficient is negative (unexpected)
Interpretation: Severe multicollinearity. Search and Social run in tandem, so the model cannot reliably separate their effects. The negative Search coefficient is likely an artifact of the collinearity rather than a real effect. Combine them into "Digital_Performance" or remove one variable before using the model.
Common Mistakes to Avoid
Removing all variables with high VIF:
Only remove one from each correlated pair
Recalculate VIF after each removal
VIF values change when variables are removed
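The remove-and-recalculate loop can be sketched as follows. The helper `prune_by_vif` and its `protected` argument (variables that business logic says must stay) are hypothetical, and the data is simulated.

```python
import numpy as np

def vif(X):
    """VIF for each column, from the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def prune_by_vif(X, names, threshold=10.0, protected=()):
    """Drop one variable at a time, recomputing VIF after every removal."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        vifs = vif(X)
        candidates = [
            i for i, name in enumerate(names)
            if name not in protected and vifs[i] > threshold
        ]
        if not candidates:
            break
        worst = max(candidates, key=lambda i: vifs[i])
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return names

# Hypothetical predictors: Search and Social are nearly identical.
rng = np.random.default_rng(6)
n = 156
search = rng.normal(size=n)
social = search + 0.1 * rng.normal(size=n)
tv = rng.normal(size=n)
X = np.column_stack([tv, search, social])
names = ["TV_Spend", "Search_Spend", "Social_Spend"]

kept = prune_by_vif(X, names)
kept_protected = prune_by_vif(X, names, protected=("Search_Spend",))
print(kept, kept_protected)
```

Both Search and Social start with a high VIF, but removing either one brings the other back to a low value, so only one is dropped. Dropping every variable above the threshold in a single pass would have removed both.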
Ignoring business context:
Don't blindly remove variables based on VIF alone
Consider theoretical importance
Maintain necessary control variables
Confusing correlation with causation:
High VIF doesn't mean variables cause each other
It just means they move together in your data
Marketing Mix Modeling Context
Multicollinearity is especially common in MMM because:
Campaign Coordination: Marketing channels often activate together during campaigns
Budget Constraints: Channel budgets are often set as shares of a total, so when one channel increases, the others tend to increase proportionally
Seasonality: Most channels peak during the same seasons (holidays, summer)
Media Planning: Strategic alignment of channels (TV drives digital search)
Strategies to Reduce Multicollinearity in MMM:
Use longer time series with more varied spend patterns
Include periods with different channel mixes
Test channels independently when possible
Use aggregated channel groups for strategic decisions
Accept moderate multicollinearity for tactical insights
VIF Table Format
MixModeler displays VIF results in an organized table:
| Variable | VIF | Tolerance | Status |
| --- | --- | --- | --- |
| TV_Spend | 2.3 | 0.43 | Low |
| Digital_Spend | 8.7 | 0.11 | Moderate |
| Search_Spend | 12.4 | 0.08 | Severe |
Summary Statistics:
Low VIF Variables: Count with VIF < 5
Moderate VIF Variables: Count with 5 ≤ VIF ≤ 10
High VIF Variables: Count with VIF > 10
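As a worked check on the example table, the sketch below recomputes the tolerance column and the three summary counts from the VIF values shown. The dictionary layout and function name are illustrative assumptions, not MixModeler's export format.

```python
# VIF values taken from the example table above.
vif_table = {"TV_Spend": 2.3, "Digital_Spend": 8.7, "Search_Spend": 12.4}

def summarize(vifs):
    """Count variables in each VIF band used by the summary statistics."""
    values = list(vifs.values())
    return {
        "low": sum(1 for v in values if v < 5),
        "moderate": sum(1 for v in values if 5 <= v <= 10),
        "high": sum(1 for v in values if v > 10),
    }

# Tolerance is 1 / VIF, rounded to two decimals as in the table.
tolerances = {name: round(1 / v, 2) for name, v in vif_table.items()}
summary = summarize(vif_table)

print(tolerances)
print(summary)
```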
Related Diagnostics
After reviewing multicollinearity:
Check coefficients in Model Builder to see if signs make sense
Review correlation matrix to identify specific correlated pairs
Examine Variable Testing to ensure variables are truly significant
Use Model Comparison to test alternative specifications