Multicollinearity (VIF)
What Multicollinearity Tests Check
Multicollinearity tests detect when predictor variables in your model are highly correlated with each other. While multicollinearity doesn't bias coefficient estimates, it inflates standard errors and makes individual coefficients unstable and difficult to interpret.
Purpose: Checks whether predictor variables are highly correlated with each other, so you can judge whether coefficient estimates are stable and interpretable.
Why Low Multicollinearity Matters
When predictor variables are not highly correlated:
Coefficients are Stable: Small changes in data don't cause large changes in coefficients
Interpretation is Clear: Each variable's effect can be isolated and understood
Standard Errors are Smaller: More precise estimates and narrower confidence intervals
Statistical Power is Higher: Easier to detect truly significant relationships
High multicollinearity makes it difficult to determine which variables are actually driving the outcome and can lead to counterintuitive or unstable coefficient estimates.
Variance Inflation Factor (VIF)
VIF is the primary metric for measuring multicollinearity:
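Formula: VIF = 1 / (1 - R²ᵢ), where R²ᵢ is the R-squared from regressing predictor i on all the other predictors
Interpretation: VIF is the factor by which the variance of predictor i's coefficient estimate is inflated, compared with what it would be if that predictor were uncorrelated with the others. For example, R²ᵢ = 0.9 gives VIF = 1 / 0.1 = 10, so the coefficient's standard error is about √10 ≈ 3.2 times larger.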
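As a sketch of how the formula above can be computed, the Python below (assuming NumPy and a predictor matrix with one column per variable) runs one auxiliary regression per predictor, with an intercept, and applies VIF = 1 / (1 - R²ᵢ). The data are synthetic and the channel names are hypothetical; this is not necessarily how MixModeler computes VIF internally. The last line cross-checks a standard identity: each VIF also equals the corresponding diagonal element of the inverse of the predictors' correlation matrix.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """VIF for each column of X (rows = observations, columns = predictors).

    Each predictor is regressed on all other predictors plus an intercept,
    and VIF_i = 1 / (1 - R^2_i).
    """
    n, k = X.shape
    out = np.empty(k)
    for i in range(k):
        target = X[:, i]
        others = np.delete(X, i, axis=1)
        design = np.column_stack([np.ones(n), others])  # intercept + other predictors
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[i] = 1.0 / (1.0 - r2)
    return out


# Synthetic data; channel names are hypothetical
rng = np.random.default_rng(0)
tv = rng.normal(size=200)
display = 0.9 * tv + 0.3 * rng.normal(size=200)  # Display built to track TV
radio = rng.normal(size=200)                      # independent of both
X = np.column_stack([tv, display, radio])

print("VIF (auxiliary regressions):", np.round(vif(X), 1))
# Cross-check: VIF_i is also the i-th diagonal element of the inverse correlation matrix
print("VIF (inverse correlation):  ", np.round(np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False))), 1))
```

Here TV and Display are constructed to correlate at roughly 0.95, so both should land near VIF 10, while the independent Radio series should stay close to 1.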
VIF Interpretation Guidelines
VIF Range | Multicollinearity | Interpretation | Action
1 - 5 | None to Low | No multicollinearity issues | ✓ No action needed
5 - 10 | Moderate | Some multicollinearity present | Monitor and investigate
> 10 | Severe | Serious multicollinearity problem | Take corrective action
Tolerance: The reciprocal of VIF (1/VIF, equivalently 1 - R²ᵢ). Values < 0.1 correspond to VIF > 10 and indicate severe multicollinearity.
Conservative threshold: Some analysts use VIF > 5 as the cutoff for concern
Liberal threshold: VIF > 10 is widely accepted as indicating severe problems
Common Sources in MMM
Marketing Mix Models are particularly prone to multicollinearity:
Correlated Media Channels:
- TV and Display often run in parallel 
- Digital channels coordinated in campaigns 
- Seasonal promotional calendars align 
Time Trends:
- Multiple variables growing over time 
- Seasonal patterns across channels 
- Economic indicators correlated with marketing 
Created Variables:
- Adstocked variables correlated with raw spend 
- Lagged variables correlated with current values 
- Interaction terms correlated with main effects 
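The created-variable effect is easy to reproduce. The sketch below (NumPy, synthetic data) applies a geometric adstock, a_t = spend_t + decay × a_(t-1), with an assumed decay of 0.5, to a hypothetical weekly spend series that is persistent from week to week, then compares the raw series with its adstocked version and with its one-week lag. The adstock form and every parameter here are illustrative assumptions, not MixModeler's transformation.

```python
import numpy as np


def geometric_adstock(spend: np.ndarray, decay: float) -> np.ndarray:
    """Illustrative geometric adstock: a_t = spend_t + decay * a_(t-1), starting from zero."""
    out = np.empty(len(spend))
    carry = 0.0
    for t, x in enumerate(spend):
        carry = x + decay * carry
        out[t] = carry
    return out


rng = np.random.default_rng(1)
n_weeks = 156
shocks = rng.normal(size=n_weeks)
level = np.zeros(n_weeks)
for t in range(1, n_weeks):
    level[t] = 0.8 * level[t - 1] + shocks[t]  # persistent week-to-week flighting
spend = 10.0 + level  # hypothetical weekly spend

adstocked = geometric_adstock(spend, decay=0.5)
warm_up = 8  # skip early weeks while the adstock builds up from zero
corr_adstock = np.corrcoef(spend[warm_up:], adstocked[warm_up:])[0, 1]
corr_lag = np.corrcoef(spend[warm_up:], spend[warm_up - 1:-1])[0, 1]
print(f"raw vs adstocked spend: r = {corr_adstock:.2f}")
print(f"raw vs 1-week lag:      r = {corr_lag:.2f}")
```

Both correlations come out high, which is why placing raw spend next to its adstocked or lagged version in the same model usually produces large VIFs.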
Interpreting Test Results
Passed Tests (✓)
What it means: All VIF values are below 5 (or below 10, depending on the threshold used)
Implications:
- Variables are sufficiently independent 
- Each coefficient represents a unique contribution 
- Standard errors are not inflated 
- Model is stable and interpretable 
Action: No action needed - multicollinearity is not a concern
Failed Tests (⚠)
What it means: One or more variables have high VIF (> 10)
Implications:
- Affected coefficients have large standard errors 
- Small data changes cause large coefficient changes 
- Individual variable effects cannot be isolated 
- Confidence intervals are very wide 
Symptoms:
- Coefficients with unexpected signs (e.g., negative when they should be positive)
- Large coefficient swings when adding/removing variables 
- High R² but few significant coefficients 
- Coefficients change dramatically with small data changes 
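The "small data changes cause large coefficient changes" symptom can be demonstrated with a bootstrap. In the sketch below (NumPy, synthetic data, hypothetical Search and Social series constructed to be nearly identical), the model is refit on 300 resamples of the rows. The individual Search coefficient swings widely and can even turn negative, while the sum of the two channel coefficients stays tightly determined.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
search = rng.normal(size=n)
social = search + 0.05 * rng.normal(size=n)  # hypothetical: nearly identical to search
sales = 1.0 * search + 1.0 * social + rng.normal(size=n)
X = np.column_stack([np.ones(n), search, social])  # intercept + two channels

coefs = []
for _ in range(300):
    idx = rng.integers(0, n, size=n)  # resample rows with replacement
    beta, *_ = np.linalg.lstsq(X[idx], sales[idx], rcond=None)
    coefs.append(beta[1:])            # keep the two channel coefficients
coefs = np.array(coefs)

sd_search = coefs[:, 0].std()
sd_total = coefs.sum(axis=1).std()
share_negative = (coefs[:, 0] < 0).mean()
print(f"sd of Search coefficient:       {sd_search:.2f}")
print(f"sd of Search + Social:          {sd_total:.2f}")
print(f"resamples with negative Search: {share_negative:.0%}")
```

The combined effect of a correlated group can remain well estimated even when the split between its channels is not.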
What to Do When Tests Fail
If multicollinearity is detected, try these solutions:
1. Remove Highly Correlated Variables (Most Direct)
- Identify variables with VIF > 10 
- Remove one from each pair of highly correlated variables 
- Keep the variable more theoretically important or easier to measure 
- Use business logic to decide which to retain 
2. Combine Correlated Variables
- Create composite indices or aggregate variables 
- Example: Combine all digital channels into "Digital_Total" 
- Example: Combine search and social into "Performance_Media" 
- Sum or average correlated channels 
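To see why combining helps, the sketch below (NumPy, synthetic data; the channel names and the "Performance_Media" total are hypothetical) computes VIF before and after summing two correlated channels. It uses the identity that each VIF equals a diagonal element of the inverse correlation matrix of the predictors.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    # VIF_i is the i-th diagonal element of the inverse correlation matrix of the predictors
    return np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))


rng = np.random.default_rng(3)
n = 156
search = rng.gamma(2.0, 50.0, size=n)
social = 0.8 * search + rng.gamma(2.0, 10.0, size=n)  # hypothetical: social tracks search
tv = rng.gamma(2.0, 200.0, size=n)

separate = np.column_stack([tv, search, social])
combined = np.column_stack([tv, search + social])  # hypothetical "Performance_Media" total

v_separate = vif(separate)
v_combined = vif(combined)
print("separate [TV, Search, Social]:", np.round(v_separate, 1))
print("combined [TV, Performance_Media]:", np.round(v_combined, 1))
```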
3. Use Principal Component Analysis (PCA)
- Create uncorrelated components from correlated variables 
- First few components explain most variance 
- Lose interpretability but gain stability 
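A minimal PCA sketch, assuming NumPy and three synthetic channels that share a common campaign calendar: standardize the channels, take the singular value decomposition, and project onto the right singular vectors. The component scores are uncorrelated by construction, and the first one carries most of the variance, so it could stand in for the three channels as a single regressor, at the cost of no longer having a per-channel coefficient.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 156
base = rng.normal(size=n)
# Hypothetical digital channels that all follow a shared campaign calendar
channels = np.column_stack([base + 0.3 * rng.normal(size=n) for _ in range(3)])

Z = (channels - channels.mean(axis=0)) / channels.std(axis=0)  # standardize each channel
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt.T                      # principal component scores, one column per component
explained = S**2 / np.sum(S**2)        # share of total variance per component
max_offdiag = np.abs(np.corrcoef(scores, rowvar=False) - np.eye(3)).max()

print("variance explained:", np.round(explained, 2))
print("largest |corr| between components:", f"{max_offdiag:.1e}")
```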
4. Collect More Data
- Longer time series can help distinguish effects 
- More variation in independent variables reduces correlation 
- Additional observations improve precision 
5. Accept Multicollinearity
- If the focus is on overall model fit rather than individual coefficients
- When prediction is the goal (coefficients remain unbiased, and predictions stay reliable as long as the predictors keep moving together)
- For variables you must keep for business reasons
- When VIF is moderate (5-10) rather than severe (>10)
6. Use Regularization Techniques
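- Ridge regression shrinks coefficients toward zero, trading a small amount of bias for lower variance
- Lasso regression can shrink some coefficients to exactly zero, performing variable selection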
- Elastic net combines ridge and lasso 
- Bayesian methods with informative priors 
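Ridge has a closed form, so its stabilizing effect is easy to show. The sketch below (NumPy, synthetic near-duplicate Search and Social series, arbitrary penalty values alpha) standardizes the predictors, centers the outcome to absorb the intercept, and solves (XᵀX + αI)β = Xᵀy. As α grows, the unstable difference between the two coefficients shrinks much faster than their sum. Lasso and elastic net have no closed form and are usually fit with an iterative solver, for example scikit-learn's Lasso and ElasticNet classes.

```python
import numpy as np


def ridge(X: np.ndarray, y: np.ndarray, alpha: float) -> np.ndarray:
    """Closed-form ridge on standardized predictors; centering y absorbs the intercept."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    k = X.shape[1]
    return np.linalg.solve(Xs.T @ Xs + alpha * np.eye(k), Xs.T @ yc)


rng = np.random.default_rng(5)
n = 200
search = rng.normal(size=n)
social = search + 0.05 * rng.normal(size=n)  # hypothetical near-duplicate channel
sales = search + social + rng.normal(size=n)
X = np.column_stack([search, social])

paths = {}
for alpha in (0.0, 10.0, 100.0):  # alpha = 0 is ordinary least squares
    beta = ridge(X, sales, alpha)
    paths[alpha] = beta
    print(f"alpha={alpha:>5}: Search={beta[0]:6.2f}  Social={beta[1]:6.2f}  sum={beta.sum():5.2f}")
```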
Practical Guidelines
When to Act:
- VIF > 10 (severe multicollinearity) 
- Coefficients have wrong signs 
- Model is unstable across specifications 
- Need to interpret individual coefficients 
When Multicollinearity May Be Acceptable:
- VIF between 5 and 10 (moderate multicollinearity)
- Primary goal is prediction, not interpretation
- All variables are theoretically necessary
- Coefficients have expected signs and reasonable magnitudes
- Using the model for overall attribution, not optimization
Marketing Mix Modeling Considerations:
- Total marketing contribution may still be accurate 
- Overall model fit (R²) is not affected
- Channel-level ROI calculations become unreliable 
- Budget optimization requires stable coefficients 
Visual Diagnostics
MixModeler provides a correlation matrix showing pairwise correlations between predictors:
Color Coding:
- Dark blue: Strong positive correlation (>0.8) 
- Light blue: Moderate positive correlation (0.5-0.8) 
- White: Low correlation (-0.5 to 0.5) 
- Red: Negative correlation 
Warning Signs:
- Absolute correlations above 0.8 (|r| > 0.8) between predictors
- Clusters of highly correlated variables
- |r| > 0.9 indicates near-perfect collinearity
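The pairwise screen above can be automated. The sketch below (NumPy, synthetic data, hypothetical channel names) lists predictor pairs with |r| > 0.8. Its second half shows a known limitation of pairwise screening: when one variable is close to the sum of two others, no single pair exceeds the threshold, yet the VIFs are very large. This is why the correlation matrix complements, rather than replaces, the VIF results.

```python
import numpy as np


def flag_correlated_pairs(X: np.ndarray, names: list, threshold: float = 0.8) -> list:
    """Return (name_a, name_b, r) for predictor pairs with |r| above the threshold."""
    corr = np.corrcoef(X, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], round(float(corr[i, j]), 2)))
    return sorted(pairs, key=lambda p: -abs(p[2]))


rng = np.random.default_rng(6)
n = 300
tv = rng.normal(size=n)
display = tv + 0.3 * rng.normal(size=n)  # hypothetical: Display runs alongside TV
brand_pairs = flag_correlated_pairs(
    np.column_stack([tv, display, rng.normal(size=n)]), ["TV", "Display", "Radio"])
print("flagged:", brand_pairs)

# Three-way collinearity that no single pair reveals
a, b = rng.normal(size=n), rng.normal(size=n)
c = a + b + 0.1 * rng.normal(size=n)
X3 = np.column_stack([a, b, c])
hidden_pairs = flag_correlated_pairs(X3, ["A", "B", "C"])
hidden_vif = np.diag(np.linalg.inv(np.corrcoef(X3, rowvar=False)))
print("flagged:", hidden_pairs, "VIF:", np.round(hidden_vif, 1))
```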
Example Interpretation
Scenario 1 - Passed:
- All VIF < 3 
- Highest correlation: 0.62 (TV and Radio) 
- All coefficients have expected signs 
Interpretation: No multicollinearity issues. Each variable's effect can be reliably isolated and interpreted independently.
Scenario 2 - Moderate Multicollinearity:
- TV VIF: 6.2 
- Display VIF: 6.8 
- Correlation between TV and Display: 0.85 
Interpretation: Moderate multicollinearity between TV and Display. Coefficients may be less stable. Consider combining into "Brand_Media" or removing one channel. If both are needed for business reasons, acknowledge this limitation when interpreting individual effects.
Scenario 3 - Severe Multicollinearity:
- Search VIF: 15.3 
- Social VIF: 14.7 
- Correlation: 0.92 
- Search coefficient is negative (unexpected) 
Interpretation: Severe multicollinearity. Search and Social run in tandem, making it impossible to separate their effects. The negative coefficient is likely spurious. Combine into "Digital_Performance" or remove one variable before using the model.
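A quick arithmetic check helps when reading these scenarios. With only two predictors, R²ᵢ is just the squared pairwise correlation, so VIF = 1 / (1 - r²). The sketch below evaluates that for the correlations quoted above: r = 0.85 gives about 3.6 and r = 0.92 about 6.5. Because the reported VIFs (6.2 to 6.8 and 14.7 to 15.3) are higher, other predictors in the model, such as seasonality or further channels, are presumably also explaining part of each channel's variation.

```python
def two_predictor_vif(r: float) -> float:
    """With exactly two predictors, R^2_i equals r^2, so VIF = 1 / (1 - r^2)."""
    return 1.0 / (1.0 - r ** 2)


# Correlations and reported VIFs taken from Scenarios 2 and 3 above
scenarios = [
    ("Scenario 2: TV vs Display", 0.85, (6.2, 6.8)),
    ("Scenario 3: Search vs Social", 0.92, (14.7, 15.3)),
]
for label, r, reported in scenarios:
    implied = two_predictor_vif(r)
    print(f"{label}: r = {r} implies VIF {implied:.1f} on its own; reported {reported}")
```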
Common Mistakes to Avoid
Removing all variables with high VIF:
- Only remove one from each correlated pair 
- Recalculate VIF after each removal 
- VIF values change when variables are removed 
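The one-at-a-time procedure can be sketched as a loop: compute VIFs, drop the single highest, recompute, and stop once every VIF is at or below the threshold. This is a minimal sketch assuming NumPy, synthetic data and hypothetical names; it always drops the highest VIF, so in practice you would override that choice with the business logic described in this section. The comparison line shows that dropping every variable with VIF > 10 in one pass would remove both members of the correlated pair.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    # VIF_i is the i-th diagonal element of the inverse correlation matrix of the predictors
    return np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))


def drop_high_vif(X: np.ndarray, names: list, threshold: float = 10.0):
    """Drop the highest-VIF variable, recompute, and repeat until all VIF <= threshold."""
    names = list(names)
    dropped = []
    while len(names) > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        dropped.append(names.pop(worst))
        X = np.delete(X, worst, axis=1)  # returns a new array; caller's X is untouched
    return names, dropped


rng = np.random.default_rng(7)
n = 200
search = rng.normal(size=n)
social = search + 0.05 * rng.normal(size=n)  # hypothetical: tracks search closely
tv = rng.normal(size=n)
X = np.column_stack([search, social, tv])
names = ["Search", "Social", "TV"]

naive = [nm for nm, v in zip(names, vif(X)) if v > 10]
kept, dropped = drop_high_vif(X, names)
print("drop all VIF > 10 at once:", naive)
print("iterative: kept", kept, "dropped", dropped)
```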
Ignoring business context:
- Don't blindly remove variables based on VIF alone 
- Consider theoretical importance 
- Maintain necessary control variables 
Confusing correlation with causation:
- High VIF doesn't mean variables cause each other 
- It just means they move together in your data 
Marketing Mix Modeling Context
Multicollinearity is especially common in MMM because:
Campaign Coordination: Marketing channels often activate together during campaigns
Budget Constraints: When one channel increases, others may increase proportionally
Seasonality: Most channels peak during the same seasons (holidays, summer)
Media Planning: Strategic alignment of channels (TV drives digital search)
Strategies to Reduce Multicollinearity in MMM:
- Use longer time series with more varied spend patterns 
- Include periods with different channel mixes 
- Test channels independently when possible 
- Use aggregated channel groups for strategic decisions 
- Accept moderate multicollinearity for tactical insights 
VIF Table Format
MixModeler displays VIF results in an organized table:
Variable | VIF | Tolerance | Multicollinearity
TV_Spend | 2.3 | 0.43 | Low
Digital_Spend | 8.7 | 0.11 | Moderate
Search_Spend | 12.4 | 0.08 | Severe
Summary Statistics:
- Low VIF Variables: Count with VIF < 5 
- Moderate VIF Variables: Count with 5 ≤ VIF ≤ 10 
- High VIF Variables: Count with VIF > 10 
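The table's derived columns follow directly from the VIF values. The sketch below (plain Python; the function names are hypothetical) computes tolerance as 1/VIF, assigns bands using the thresholds from the interpretation guidelines (Low below 5, Moderate from 5 to 10, Severe above 10), and tallies the summary counts for the three example rows shown above.

```python
def vif_level(vif: float) -> str:
    """Band a VIF value: Low (< 5), Moderate (5 to 10), Severe (> 10)."""
    if vif < 5:
        return "Low"
    if vif <= 10:
        return "Moderate"
    return "Severe"


def vif_rows(vifs: dict) -> list:
    """Return (variable, VIF, tolerance, level) rows; tolerance = 1 / VIF."""
    return [(name, v, round(1.0 / v, 2), vif_level(v)) for name, v in vifs.items()]


rows = vif_rows({"TV_Spend": 2.3, "Digital_Spend": 8.7, "Search_Spend": 12.4})
for name, v, tol, level in rows:
    print(f"{name:<14} {v:>5.1f} {tol:>5.2f}  {level}")

levels = [row[3] for row in rows]
summary = {
    "Low VIF Variables": levels.count("Low"),
    "Moderate VIF Variables": levels.count("Moderate"),
    "High VIF Variables": levels.count("Severe"),
}
print(summary)
```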
Related Diagnostics
After reviewing multicollinearity:
- Check coefficients in Model Builder to see if signs make sense 
- Review correlation matrix to identify specific correlated pairs 
- Examine Variable Testing to ensure variables are truly significant 
- Use Model Comparison to test alternative specifications 