VIF & Multicollinearity
Detect and Resolve Variable Redundancy
What is Multicollinearity?
Multicollinearity occurs when independent variables in your model are highly correlated with each other, making it difficult to isolate individual effects.
Problem: Can't tell which variable is truly driving results
Example: TV_Spend and TV_GRPs are nearly perfectly correlated, so including both in the same model introduces multicollinearity
Why Multicollinearity Matters
The Impact
Statistical issues:
- Inflated standard errors 
- Unstable coefficients 
- Variables that should be significant appear non-significant 
- Coefficients change dramatically when variables are added or removed 
Business issues:
- Can't attribute effects correctly 
- Optimization recommendations unreliable 
- ROI calculations questionable 
- Stakeholder confusion 
What Causes It
Common sources:
- Multiple measures of same thing (spend and impressions) 
- Transformed versions (raw and logged) 
- Highly correlated channels (Facebook and Instagram often move together) 
- Time trends (multiple variables growing over time) 
VIF (Variance Inflation Factor)
What is VIF?
VIF measures how much the variance of a coefficient estimate is inflated due to multicollinearity.
Formula concept: VIF = 1 / (1 - R²), where R² comes from regressing that variable on all the other independent variables in the model
Interpretation:
- VIF = 1: No multicollinearity 
- VIF = 5: Variance inflated 5x (standard error about 2.2x larger) 
- VIF = 10: Variance inflated 10x (standard error about 3.2x larger) 
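As a rough illustration of the formula, the sketch below computes VIF for every variable in a small dataset. It relies on a standard identity: the VIF of variable j equals the j-th diagonal entry of the inverse correlation matrix, which gives the same value as 1 / (1 - R²) from the regression described above. The data and the helper names (`pearson`, `invert`, `vif`) are hypothetical and are not taken from MixModeler.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfect multicollinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """Return {variable: VIF} using the diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}

# Hypothetical weekly data: TV_GRPs tracks TV_Spend almost exactly.
tv_spend = [10, 12, 9, 15, 14, 11, 13, 16]
tv_grps = [101, 119, 92, 151, 138, 112, 129, 161]
radio_spend = [5, 3, 6, 4, 7, 2, 5, 4]

for name, value in vif({"TV_Spend": tv_spend,
                        "TV_GRPs": tv_grps,
                        "Radio_Spend": radio_spend}).items():
    print(f"{name}: VIF = {value:.1f}")
```

In this made-up dataset the two TV variables come out far above 10 while Radio_Spend stays close to 1, which is the pattern the thresholds in this guide are meant to catch.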
VIF Thresholds
< 5: Low multicollinearity
- ✅ No action needed 
- ✅ Variables independent enough 
5-10: Moderate multicollinearity
- ⚠️ Monitor situation 
- ⚠️ Consider if both variables needed 
- ⚠️ Acceptable if theoretically important 
> 10: Severe multicollinearity
- 🚨 Action required 
- 🚨 Remove one of correlated variables 
- 🚨 Combine into weighted variable 
- 🚨 Model unstable without resolution 
Checking VIF in MixModeler
Location
- The Variable Testing page includes VIF in its results table 
- Model Diagnostics shows VIF for all variables in the model 
Interpretation
Results table shows:
- Variable name 
- VIF value 
- Interpretation (Low/Moderate/Severe) 
Example:
| Variable | VIF | Interpretation |
| --- | --- | --- |
| TV_Spend | 2.3 | Low |
| Digital_Spend | 3.1 | Low |
| Radio_Spend | 8.5 | Moderate |
| Search_Spend | 15.2 | Severe |
| Seasonality | 1.8 | Low |
Interpretation: Search_Spend has severe multicollinearity (VIF 15.2, above the threshold of 10). Investigate its correlation with the other variables
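The same reading of the results table can be sketched in a few lines of Python. The function below applies the thresholds from this guide (below 5, 5 to 10, above 10) to the example values and sorts them so the worst offender comes first. The function name `classify_vif` is illustrative and is not a MixModeler API.

```python
def classify_vif(value):
    """Map a VIF value to the Low / Moderate / Severe labels used in this guide."""
    if value < 5:
        return "Low"
    if value <= 10:
        return "Moderate"
    return "Severe"

# VIF values from the example results table.
results = {
    "TV_Spend": 2.3,
    "Digital_Spend": 3.1,
    "Radio_Spend": 8.5,
    "Search_Spend": 15.2,
    "Seasonality": 1.8,
}

# Highest VIF first, so the variable needing attention is at the top.
ranked = sorted(results.items(), key=lambda item: item[1], reverse=True)
for name, value in ranked:
    print(f"{name:<14} {value:>5.1f}  {classify_vif(value)}")
```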
Detecting Multicollinearity
Pre-Testing Detection
In Variable Testing:
- Select variables to test 
- Click "Test Variables" 
- Review VIF column in results 
- High VIF indicates redundancy with existing model variables 
Decision: Don't add variables with VIF > 10
Post-Addition Detection
After adding variable:
- Navigate to Model Diagnostics 
- Check VIF test results 
- Identify problematic variables 
Action: Remove or address high-VIF variables
Visual Detection
Correlation patterns:
- Two variables move in lockstep 
- Very similar time-series patterns 
- Scatter plot shows linear relationship 
Warning signs in model:
- Coefficients flip signs when variables added/removed 
- Large changes in coefficients 
- Previously significant variables become non-significant 
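The "lockstep" pattern can also be checked numerically before looking at charts. The sketch below computes pairwise Pearson correlations and flags pairs above a cutoff. The 0.8 cutoff and the data are assumptions chosen for illustration; a pairwise check only catches two-variable redundancy, so it complements VIF and does not replace it.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def flag_correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged

# Hypothetical weekly spend series.
data = {
    "Facebook_Spend": [20, 25, 22, 30, 28, 35, 33, 40],
    "Instagram_Spend": [18, 24, 20, 29, 25, 33, 32, 37],
    "Radio_Spend": [5, 3, 6, 4, 7, 2, 5, 4],
}

flagged = flag_correlated_pairs(data)
for a, b, r in flagged:
    print(f"{a} vs {b}: r = {r:.3f}")
```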
Resolving Multicollinearity
Solution 1: Remove One Variable
When to use: Two variables measure essentially the same thing
Process:
- Identify correlated pair (both have high VIF) 
- Compare T-statistics and R² contribution 
- Keep variable with higher T-stat 
- Remove the other 
Example:
- TV_Spend (VIF=12) and TV_GRPs (VIF=14) 
- TV_Spend has higher T-stat 
- Remove TV_GRPs 
- VIF drops to acceptable levels 
Solution 2: Create Weighted Variable
When to use: Both variables provide value but are correlated
Process:
- Test both variables separately 
- Note coefficients for each 
- Create weighted combination 
- Use combined variable in model 
Example:
- Facebook_Spend (coef=400, VIF=11) 
- Instagram_Spend (coef=350, VIF=12) 
- Create: Social_Spend_WGTD = 0.53×Facebook + 0.47×Instagram 
- Use combined variable (VIF=3.5) 
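One way to arrive at the 0.53 and 0.47 weights in this example is to divide each coefficient by the sum of the coefficients. That normalization is an assumption made here for illustration (the guide does not state how the weights are derived), and it only makes sense when the coefficients share a sign. The spend series and function names below are hypothetical.

```python
def coefficient_weights(coefs):
    """Normalize coefficients so the weights sum to 1."""
    total = sum(coefs.values())
    return {name: value / total for name, value in coefs.items()}

def weighted_variable(series, weights):
    """Combine several series into one using the given weights."""
    names = list(weights)
    length = len(series[names[0]])
    return [sum(weights[n] * series[n][t] for n in names) for t in range(length)]

# Coefficients from testing each variable separately (values from the example above).
coefs = {"Facebook_Spend": 400, "Instagram_Spend": 350}
weights = coefficient_weights(coefs)

# Hypothetical spend data for four periods.
spend = {
    "Facebook_Spend": [20, 25, 22, 30],
    "Instagram_Spend": [18, 24, 20, 29],
}
social_spend_wgtd = weighted_variable(spend, weights)

print({name: round(w, 2) for name, w in weights.items()})
print([round(v, 2) for v in social_spend_wgtd])
```

Because the combined series replaces both originals, the model estimates one coefficient for social spend instead of two competing ones.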
Solution 3: Use Transformations
When to use: Variables correlated due to trends
Options:
- Log transformation 
- Difference transformation (change from previous period) 
- Detrending 
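Caution: Transformations change how coefficients are interpreted. For example, the coefficient on a differenced variable describes the effect of a period-over-period change, not of the spend level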
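A small sketch of the difference transformation: two hypothetical spend series that both grow over time are strongly correlated in levels, but much less so once each is replaced by its period-over-period change. The data and variable names are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def first_difference(values):
    """Change from the previous period; the result is one element shorter."""
    return [curr - prev for prev, curr in zip(values, values[1:])]

# Hypothetical series that share an upward trend.
tv_spend = [102, 104, 113, 113, 121, 122, 132, 134, 140, 146]
search_spend = [49, 55, 57, 57, 61, 68, 66, 71, 76, 76]

r_levels = pearson(tv_spend, search_spend)
r_diff = pearson(first_difference(tv_spend), first_difference(search_spend))

# With only two predictors, VIF reduces to 1 / (1 - r^2).
print(f"levels:      r = {r_levels:.2f}, pairwise VIF = {1 / (1 - r_levels ** 2):.1f}")
print(f"differenced: r = {r_diff:.2f}, pairwise VIF = {1 / (1 - r_diff ** 2):.1f}")
```

Differencing drops the first observation, so the dependent variable needs the same treatment (or at least the same alignment) before refitting.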
Solution 4: Accept and Document
When to use:
- VIF between 5-10 
- Both variables theoretically important 
- Coefficients stable and significant 
Requirements:
- Document in model notes 
- Explain to stakeholders 
- Monitor in future iterations 
- Don't rely on the individual coefficients of these variables for optimization (their split is unreliable) 
Common Multicollinearity Scenarios
Scenario 1: Spend and Impressions
Problem: TV_Spend and TV_Impressions highly correlated
Why: More spend = more impressions
VIF: Both > 12
Solution:
- Choose spend (more actionable) 
- Or combine using weighted variable 
- Don't include both 
Scenario 2: Related Channels
Problem: Facebook_Spend and Instagram_Spend move together (managed by same team)
VIF: Both in the 9-11 range
Solutions:
- Combine into Social_Spend_WGTD 
- Or keep separate if VIF < 10 
- Test separately to isolate effects 
Scenario 3: Original and Transformed
Problem: TV_Spend and TV_Spend_adstock_70 both included
VIF: Typically high, because the adstocked series is derived directly from the raw one
Solution:
- NEVER include both 
- Choose one: raw OR adstocked 
- Typically use adstocked version 
Scenario 4: Seasonal Variables
Problem: 11 month dummies are mutually exclusive, so each one is (mildly) correlated with the others
VIF: Typically low to moderate for all
Solution:
- This is expected and acceptable 
- Month dummies MUST be used together 
- VIF for group < 10 is acceptable 
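To see why month dummies are not a cause for alarm, the sketch below builds 11 dummies for three years of hypothetical monthly data (one month dropped as the reference) and computes their VIF from the inverse correlation matrix. In this balanced setup, with no other variables in the model, every dummy has a VIF of 11/6, or about 1.83. Real models with media variables that follow seasonal patterns may show higher values. Helper names are illustrative.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfect multicollinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """Return {variable: VIF} using the diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}

# 36 monthly observations; month 12 is the omitted reference category.
periods = range(36)
month_dummies = {
    f"month_{k + 1}": [1.0 if t % 12 == k else 0.0 for t in periods]
    for k in range(11)
}

dummy_vifs = vif(month_dummies)
for name, value in dummy_vifs.items():
    print(f"{name}: VIF = {value:.2f}")
```

Adding the twelfth dummy alongside an intercept would make the set perfectly collinear, which is why one month is always left out.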
Scenario 5: Time Trends
Problem: Multiple variables growing over time
VIF: All high
Solutions:
- Add time trend variable 
- Detrend variables 
- Use differencing (period-over-period change) 
Testing for Multicollinearity
Pre-Addition Testing
Before adding variable:
- Pre-test in Variable Testing 
- Check VIF in results 
- If VIF > 10, investigate 
Prevents: Adding problematic variables
Post-Model Testing
After building model:
- Navigate to Model Diagnostics 
- Run multicollinearity test 
- Review VIF for all variables 
- Address any VIF > 10 
Ensures: Model stability
Iterative Testing
As model evolves:
- Check VIF after each variable addition 
- Monitor for VIF increases 
- Address immediately 
- Maintain clean model 
Interpreting VIF Results
Individual VIF
Focus on highest values first:
- Sort by VIF descending 
- Address VIF > 10 immediately 
- Monitor VIF 5-10 
- VIF < 5 is fine 
Pattern Recognition
All variables have high VIF:
- Suggests systemic issue 
- Likely time trends 
- Consider detrending approach 
Two variables have high VIF:
- Direct correlation between them 
- Remove one or combine 
- Most common scenario 
Increasing VIF over iterations:
- New variables correlated with existing 
- Need to be more selective 
- Consider variable combinations 
Best Practices
During Model Building
Check VIF regularly:
- After each variable addition 
- Before finalizing model 
- When coefficients seem unstable 
Be proactive:
- Pre-test variables for VIF 
- Don't add high-VIF variables 
- Address issues immediately 
Document decisions:
- Which variables removed 
- Why chosen over alternatives 
- VIF values before and after 
Variable Selection
Avoid including:
- Multiple measures of same thing 
- Both raw and transformed 
- Highly correlated channels without combining 
Prefer:
- Independent predictors 
- Orthogonal variables 
- Combined weighted variables when needed 
Stakeholder Communication
Explain multicollinearity:
- Use simple terms 
- Explain why both can't be included 
- Show instability without resolution 
Justify decisions:
- Why variable A kept over B 
- Statistical rationale (VIF, T-stat) 
- Business rationale (actionability) 
Advanced Topics
Tolerance
Tolerance = 1 / VIF = 1 - R²
Interpretation:
- Tolerance close to 1: Low multicollinearity 
- Tolerance close to 0: High multicollinearity 
- Same information as VIF, different scale 
Use: Some analysts and tools report tolerance instead of VIF; the diagnostic is the same
Condition Number
Alternative measure:
- Square root of the ratio of the largest to the smallest eigenvalue of X'X (computed on scaled variables) 
- Condition number > 30 indicates multicollinearity 
- More technical, less commonly used 
Ridge Regression
Advanced solution:
- Adds a penalty on coefficient size, which stabilizes estimates when variables are correlated 
- Can keep correlated variables 
- Requires specialized techniques 
- Not currently in MixModeler 
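For readers curious about the idea, here is a minimal two-predictor sketch of ridge regression outside MixModeler. It standardizes both predictors, centers the dependent variable, and solves (X'X + penalty × I) b = X'y directly. The data, the penalty value of 1.0, and the function names are all assumptions for illustration; setting the penalty to 0 gives ordinary least squares.

```python
from math import sqrt

def standardize(values):
    """Rescale a series to mean 0 and standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def ridge_two_predictors(x1, x2, y, penalty):
    """Solve the 2x2 ridge system for standardized predictors and centered y."""
    z1, z2 = standardize(x1), standardize(x2)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    a11 = sum(a * a for a in z1) + penalty
    a22 = sum(b * b for b in z2) + penalty
    a12 = sum(a * b for a, b in zip(z1, z2))
    g1 = sum(a * v for a, v in zip(z1, yc))
    g2 = sum(b * v for b, v in zip(z2, yc))
    det = a11 * a22 - a12 * a12
    # Cramer's rule for the two coefficients.
    return (g1 * a22 - g2 * a12) / det, (g2 * a11 - g1 * a12) / det

# Hypothetical data: two highly correlated channels and a sales series.
facebook = [20, 25, 22, 30, 28, 35, 33, 40]
instagram = [18, 24, 20, 29, 25, 33, 32, 37]
sales = [500, 560, 520, 640, 590, 700, 690, 760]

ols = ridge_two_predictors(facebook, instagram, sales, penalty=0.0)
ridge = ridge_two_predictors(facebook, instagram, sales, penalty=1.0)

print("OLS coefficients:  ", [round(b, 1) for b in ols])
print("Ridge coefficients:", [round(b, 1) for b in ridge])
```

With correlated predictors, the penalty pulls the two coefficients toward each other, trading a small amount of bias for much lower variance. The coefficients are on the standardized scale, so they would need to be converted back before being used for ROI calculations.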
Troubleshooting
VIF calculation fails
Cause: Not enough observations or perfect multicollinearity
Solution:
- Check for duplicate variables 
- Ensure sufficient data 
- Remove obviously redundant variables 
All VIF values are high
Cause: Time trends across all variables
Solution:
- Add time trend variable 
- Consider first-differencing 
- Detrend variables 
VIF acceptable but model unstable
Cause: VIF isn't the only diagnostic, so instability can come from other sources
Solution:
- Check other diagnostics 
- Review coefficient signs 
- Validate with business logic 
Key Takeaways
- VIF measures variance inflation due to multicollinearity 
- VIF > 10 requires action, < 5 is acceptable 
- Check VIF before adding variables (Variable Testing) 
- Check VIF after building model (Diagnostics) 
- Resolve by removing one variable, combining into a weighted variable, or accepting and documenting 
- Common issues: spend vs impressions, related channels, trends 
- Monitor VIF throughout model development 
- Address multicollinearity for stable, reliable models 