VIF & Multicollinearity
Detect and Resolve Variable Redundancy
What is Multicollinearity?
Multicollinearity occurs when independent variables in your model are highly correlated with each other, making it difficult to isolate individual effects.
Problem: Can't tell which variable is truly driving results
Example: TV_Spend and TV_GRPs are nearly perfectly correlated, so including both introduces multicollinearity
Why Multicollinearity Matters
The Impact
Statistical issues:
Inflated standard errors
Unstable coefficients
Variables with a real effect appear non-significant
Coefficients change dramatically when variables are added or removed
Business issues:
Can't attribute effects correctly
Optimization recommendations unreliable
ROI calculations questionable
Stakeholder confusion
What Causes It
Common sources:
Multiple measures of same thing (spend and impressions)
Transformed versions (raw and logged)
Highly correlated channels (Facebook and Instagram often move together)
Time trends (multiple variables growing over time)
VIF (Variance Inflation Factor)
What is VIF?
VIF measures how much the variance of a coefficient estimate is inflated due to multicollinearity.
Formula concept: VIF = 1 / (1 - R²)
Where R² is from regressing that variable on all other independent variables in the model
Interpretation:
VIF = 1: No multicollinearity
VIF = 5: Variance inflated 5x (standard error about 2.2x larger)
VIF = 10: Variance inflated 10x (standard error about 3.2x larger)
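As a rough illustration of the formula, the sketch below computes VIF by regressing each column on all the others with NumPy. The function name `vif` and the synthetic TV and radio series are assumptions made for this example. It is not MixModeler's implementation.

```python
import numpy as np

def vif(X: np.ndarray) -> np.ndarray:
    """Return the VIF of each column of X (rows are observations)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other variables
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

# Synthetic weekly data (hypothetical): TV_GRPs is almost a rescaled TV_Spend.
rng = np.random.default_rng(0)
tv_spend = rng.normal(100, 20, 104)
tv_grps = 0.5 * tv_spend + rng.normal(0, 1, 104)
radio_spend = rng.normal(50, 10, 104)

vifs = vif(np.column_stack([tv_spend, tv_grps, radio_spend]))
for name, value in zip(["TV_Spend", "TV_GRPs", "Radio_Spend"], vifs):
    print(f"{name}: VIF = {value:.1f}")
```

In this synthetic setup the two TV variables should show very high VIF values while Radio_Spend stays close to 1.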
VIF Thresholds
< 5: Low multicollinearity
✅ No action needed
✅ Variables independent enough
5-10: Moderate multicollinearity
⚠️ Monitor situation
⚠️ Consider whether both variables are needed
⚠️ Acceptable if theoretically important
> 10: Severe multicollinearity
🚨 Action required
🚨 Remove one of the correlated variables
🚨 Combine into a weighted variable
🚨 Model unstable without resolution
Checking VIF in MixModeler
Location
Variable Testing Page includes VIF in results table
Model Diagnostics shows VIF for all model variables
Interpretation
Results table shows:
Variable name
VIF value
Interpretation (Low/Moderate/Severe)
Example:
Variable | VIF | Interpretation
TV_Spend | 2.3 | Low
Digital_Spend | 3.1 | Low
Radio_Spend | 8.5 | Moderate
Search_Spend | 15.2 | Severe
Seasonality | 1.8 | Low
Interpretation: Search_Spend has severe multicollinearity (VIF > 10); investigate its correlation with the other variables
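The threshold bands can be applied to the example values with a few lines of Python. This is a minimal sketch: `classify_vif` is a hypothetical helper, and treating values of exactly 5 or 10 as Moderate is an assumption, since the page does not say how boundaries are handled.

```python
def classify_vif(vif: float) -> str:
    """Map a VIF value to the Low / Moderate / Severe bands used on this page."""
    if vif > 10:
        return "Severe"
    if vif >= 5:  # assumption: the boundaries 5 and 10 count as Moderate
        return "Moderate"
    return "Low"

# VIF values from the example results table.
example = {
    "TV_Spend": 2.3,
    "Digital_Spend": 3.1,
    "Radio_Spend": 8.5,
    "Search_Spend": 15.2,
    "Seasonality": 1.8,
}

labels = {name: classify_vif(value) for name, value in example.items()}
needs_action = [name for name, label in labels.items() if label == "Severe"]
print(labels)
print("Needs action:", needs_action)
```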
Detecting Multicollinearity
Pre-Testing Detection
In Variable Testing:
Select variables to test
Click "Test Variables"
Review VIF column in results
High VIF indicates redundancy with existing model variables
Decision: Don't add variables with VIF > 10
Post-Addition Detection
After adding variable:
Navigate to Model Diagnostics
Check VIF test results
Identify problematic variables
Action: Remove or address high-VIF variables
Visual Detection
Correlation patterns:
Two variables move in lockstep
Very similar time-series patterns
Scatter plot shows linear relationship
Warning signs in model:
Coefficients flip signs when variables are added or removed
Coefficients change by large amounts between model versions
Previously significant variables become non-significant
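Lockstep movement can also be checked numerically before looking at charts. The sketch below scans every pair of variables and reports those with a high absolute correlation. The 0.9 cutoff, the helper name `correlated_pairs`, and the synthetic series are assumptions for illustration only.

```python
import numpy as np

def correlated_pairs(data: dict[str, np.ndarray], threshold: float = 0.9):
    """Return (name_a, name_b, correlation) for pairs with |r| >= threshold."""
    names = list(data)
    corr = np.corrcoef(np.vstack([data[n] for n in names]))  # rows = variables
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Synthetic weekly series (hypothetical): Instagram tracks Facebook closely.
rng = np.random.default_rng(1)
facebook = rng.normal(200, 40, 104)
instagram = 0.8 * facebook + rng.normal(0, 5, 104)
search = rng.normal(150, 30, 104)

pairs = correlated_pairs({
    "Facebook_Spend": facebook,
    "Instagram_Spend": instagram,
    "Search_Spend": search,
})
for a, b, r in pairs:
    print(f"{a} and {b}: r = {r:.3f}")
```

Pairwise correlation only catches two-variable redundancy. A variable that is a combination of several others can still have a high VIF with modest pairwise correlations, so this check complements VIF rather than replacing it.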
Resolving Multicollinearity
Solution 1: Remove One Variable
When to use: Two variables measure essentially the same thing
Process:
Identify correlated pair (both have high VIF)
Compare T-statistics and R² contribution
Keep variable with higher T-stat
Remove the other
Example:
TV_Spend (VIF=12) and TV_GRPs (VIF=14)
TV_Spend has higher T-stat
Remove TV_GRPs
VIF drops to acceptable levels
Solution 2: Create Weighted Variable
When to use: Both variables provide value but are correlated
Process:
Test both variables separately
Note coefficients for each
Create weighted combination
Use combined variable in model
Example:
Facebook_Spend (coef=400, VIF=11)
Instagram_Spend (coef=350, VIF=12)
Create: Social_Spend_WGTD = 0.53×Facebook + 0.47×Instagram
Use combined variable (VIF=3.5)
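The weights in this example match each coefficient's share of the coefficient total (400 / 750 and 350 / 750). The sketch below assumes that is how the weights are derived, which the page does not state explicitly. The helper names and the two-period spend values are hypothetical.

```python
def coefficient_weights(coefs: dict[str, float]) -> dict[str, float]:
    """Weight each variable by its share of the summed coefficients.

    Assumes all coefficients are positive.
    """
    total = sum(coefs.values())
    return {name: c / total for name, c in coefs.items()}

def weighted_variable(series: dict[str, list[float]],
                      weights: dict[str, float]) -> list[float]:
    """Combine several series into one using the given weights."""
    length = len(next(iter(series.values())))
    return [sum(weights[name] * series[name][t] for name in series)
            for t in range(length)]

weights = coefficient_weights({"Facebook_Spend": 400.0, "Instagram_Spend": 350.0})

# Hypothetical spend values for two periods.
social_spend_wgtd = weighted_variable(
    {"Facebook_Spend": [100.0, 200.0], "Instagram_Spend": [50.0, 100.0]},
    weights,
)
print({name: round(w, 2) for name, w in weights.items()})
print(social_spend_wgtd)
```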
Solution 3: Use Transformations
When to use: Variables correlated due to trends
Options:
Log transformation
Difference transformation (change from previous period)
Detrending
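Caution: Transformations change how coefficients are interpreted (for example, a differenced variable measures the effect of a period-over-period change, not of the level)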
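To see why these options help with trend-driven correlation, the sketch below builds two synthetic series that share nothing except an upward trend, then compares their correlation before and after differencing and detrending. All names and numbers are assumptions for illustration, and the detrending here is a simple linear fit.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(104, dtype=float)

# Two hypothetical channels that both grow over time but are otherwise unrelated.
channel_a = 100 + 2.0 * t + rng.normal(0, 5, 104)
channel_b = 50 + 1.5 * t + rng.normal(0, 5, 104)

def detrend(x: np.ndarray) -> np.ndarray:
    """Remove a fitted straight-line trend from x."""
    steps = np.arange(len(x), dtype=float)
    slope, intercept = np.polyfit(steps, x, 1)
    return x - (slope * steps + intercept)

raw_corr = np.corrcoef(channel_a, channel_b)[0, 1]
diff_corr = np.corrcoef(np.diff(channel_a), np.diff(channel_b))[0, 1]
detrended_corr = np.corrcoef(detrend(channel_a), detrend(channel_b))[0, 1]

# Log transformation; log1p is used so that zero-spend periods stay defined.
log_a = np.log1p(channel_a)

print(f"raw: {raw_corr:.2f}, differenced: {diff_corr:.2f}, "
      f"detrended: {detrended_corr:.2f}")
```

Differencing and detrending remove the shared trend, so the remaining correlation reflects only how the series move around it. A log transformation alone compresses scale but does not remove a common trend.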
Solution 4: Accept and Document
When to use:
VIF between 5 and 10
Both variables theoretically important
Coefficients stable and significant
Requirements:
Document in model notes
Explain to stakeholders
Monitor in future iterations
Don't use these coefficients for optimization (the individual estimates are unreliable)
Common Multicollinearity Scenarios
Scenario 1: Spend and Impressions
Problem: TV_Spend and TV_Impressions highly correlated
Why: More spend = more impressions
VIF: Both > 12
Solution:
Choose spend (more actionable)
Or combine using weighted variable
Don't include both
Scenario 2: Related Channels
Problem: Facebook_Spend and Instagram_Spend move together (managed by same team)
VIF: Both between 9 and 11
Solutions:
Combine into Social_Spend_WGTD
Or keep separate if both VIFs are below 10
Test separately to isolate effects
Scenario 3: Original and Transformed
Problem: TV_Spend and TV_Spend_adstock_70 both included
VIF: Typically very high, because the adstocked series is derived directly from the raw series
Solution:
NEVER include both
Choose one: raw OR adstocked
Typically use adstocked version
Scenario 4: Seasonal Variables
Problem: 11 month dummies all correlated with each other
VIF: Moderate for all
Solution:
This is expected and acceptable
Month dummies MUST be used together
VIF below 10 for each dummy in the group is acceptable
Scenario 5: Time Trends
Problem: Multiple variables growing over time
VIF: All high
Solutions:
Add time trend variable
Detrend variables
Use differencing (period-over-period change)
Testing for Multicollinearity
Pre-Addition Testing
Before adding variable:
Pre-test in Variable Testing
Check VIF in results
If VIF > 10, investigate
Prevents: Adding problematic variables
Post-Model Testing
After building model:
Navigate to Model Diagnostics
Run multicollinearity test
Review VIF for all variables
Address any VIF > 10
Ensures: Model stability
Iterative Testing
As model evolves:
Check VIF after each variable addition
Monitor for VIF increases
Address immediately
Maintain clean model
Interpreting VIF Results
Individual VIF
Focus on highest values first:
Sort by VIF descending
Address VIF > 10 immediately
Monitor VIF 5-10
VIF < 5 is fine
Pattern Recognition
All variables have high VIF:
Suggests systemic issue
Likely time trends
Consider detrending approach
Two variables have high VIF:
Direct correlation between them
Remove one or combine
Most common scenario
Increasing VIF over iterations:
New variables correlated with existing
Need to be more selective
Consider variable combinations
Best Practices
During Model Building
Check VIF regularly:
After each variable addition
Before finalizing model
When coefficients seem unstable
Be proactive:
Pre-test variables for VIF
Don't add high-VIF variables
Address issues immediately
Document decisions:
Which variables removed
Why chosen over alternatives
VIF values before and after
Variable Selection
Avoid including:
Multiple measures of same thing
Both raw and transformed
Highly correlated channels without combining
Prefer:
Independent predictors
Orthogonal variables
Combined weighted variables when needed
Stakeholder Communication
Explain multicollinearity:
Use simple terms
Explain why both can't be included
Show instability without resolution
Justify decisions:
Why variable A kept over B
Statistical rationale (VIF, T-stat)
Business rationale (actionability)
Advanced Topics
Tolerance
Tolerance = 1 / VIF
Interpretation:
Tolerance close to 1: Low multicollinearity
Tolerance close to 0: High multicollinearity
Same information as VIF on a different scale
Use: Some analysts prefer tolerance; the diagnostic conclusion is the same
Condition Number
Alternative measure:
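Square root of the ratio of the largest to smallest eigenvalue of the scaled X'X matrix (equivalently, the ratio of the largest to smallest singular value of the scaled predictors)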
Condition number > 30 indicates multicollinearity
More technical, less commonly used
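A minimal sketch of one common convention follows: standardize the predictors, then take the ratio of the largest to smallest singular value, which equals the square root of the eigenvalue ratio of their cross-product matrix. The standardization choice, the function name, and the synthetic data are assumptions. Other scaling conventions exist and give different numbers.

```python
import numpy as np

def condition_number(X: np.ndarray) -> float:
    """Ratio of largest to smallest singular value of the standardized columns."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    s = np.linalg.svd(Z, compute_uv=False)  # sorted in descending order
    return float(s[0] / s[-1])

# Synthetic weekly data (hypothetical): TV_GRPs is almost a rescaled TV_Spend.
rng = np.random.default_rng(5)
tv_spend = rng.normal(100, 20, 104)
tv_grps = 0.5 * tv_spend + rng.normal(0, 0.3, 104)
radio_spend = rng.normal(50, 10, 104)

cond_redundant = condition_number(np.column_stack([tv_spend, tv_grps, radio_spend]))
cond_clean = condition_number(np.column_stack([tv_spend, radio_spend]))
print(f"with TV_GRPs: {cond_redundant:.1f}, without: {cond_clean:.1f}")
```

Unlike VIF, the condition number is a single value for the whole model, so it signals that a problem exists without saying which variable is responsible.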
Ridge Regression
Advanced solution:
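Adds a penalty on coefficient size, which stabilizes estimates when variables are correlated
Can keep correlated variables
Requires choosing a penalty strength (typically by cross-validation) and produces biased coefficients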
Not currently in MixModeler
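For readers who want to experiment outside MixModeler, the sketch below shows the closed-form ridge estimate on standardized predictors and compares it with ordinary least squares on two nearly duplicate variables. The function name `ridge_fit`, the penalty value of 10, and the synthetic data are assumptions for illustration.

```python
import numpy as np

def ridge_fit(X: np.ndarray, y: np.ndarray, alpha: float) -> np.ndarray:
    """Closed-form ridge coefficients on standardized predictors.

    The response is centered, so the intercept is not penalized.
    alpha = 0 reduces to ordinary least squares.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    k = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + alpha * np.eye(k), Z.T @ (y - y.mean()))

# Two nearly identical synthetic predictors with equal true effects.
rng = np.random.default_rng(4)
x1 = rng.normal(0, 1, 104)
x2 = x1 + rng.normal(0, 0.05, 104)
y = 3.0 * x1 + 3.0 * x2 + rng.normal(0, 1, 104)
X = np.column_stack([x1, x2])

ols_coefs = ridge_fit(X, y, alpha=0.0)
ridge_coefs = ridge_fit(X, y, alpha=10.0)
print("OLS:  ", np.round(ols_coefs, 2))
print("Ridge:", np.round(ridge_coefs, 2))
```

With correlated predictors the unpenalized coefficients can split the shared effect erratically, while the ridge coefficients are pulled toward similar values at the cost of some bias.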
Troubleshooting
VIF calculation fails
Cause: Not enough observations or perfect multicollinearity
Solution:
Check for duplicate variables
Ensure sufficient data
Remove obviously redundant variables
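These checks can be scripted on exported data. The sketch below is a hypothetical helper, not a MixModeler feature. It flags too few observations, a rank-deficient design matrix (perfect multicollinearity), and exact duplicate columns.

```python
import numpy as np

def diagnose_design(X: np.ndarray, names: list[str]) -> list[str]:
    """Return a list of reasons a VIF calculation might fail for X."""
    n, k = X.shape
    issues = []
    if n <= k + 1:
        issues.append(f"only {n} observations for {k} variables plus intercept")
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept
    if np.linalg.matrix_rank(A) < A.shape[1]:
        issues.append("design matrix is rank deficient (perfect multicollinearity)")
    for i in range(k):
        for j in range(i + 1, k):
            if np.allclose(X[:, i], X[:, j]):
                issues.append(f"{names[i]} duplicates {names[j]}")
    return issues

# Hypothetical data with TV_Spend accidentally included twice.
base = np.arange(10, dtype=float)
X = np.column_stack([base, base.copy(), base ** 2])
issues = diagnose_design(X, ["TV_Spend", "TV_Spend_copy", "Search_Spend"])
for issue in issues:
    print(issue)
```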
All VIF values are high
Cause: Time trends across all variables
Solution:
Add time trend variable
Consider first-differencing
Detrend variables
VIF acceptable but model unstable
Cause: VIF isn't the only diagnostic
Solution:
Check other diagnostics
Review coefficient signs
Validate with business logic
Key Takeaways
VIF measures variance inflation due to multicollinearity
VIF > 10 requires action, 5-10 should be monitored, < 5 is acceptable
Check VIF before adding variables (Variable Testing)
Check VIF after building model (Diagnostics)
Remove one variable, combine into weighted, or accept and document
Common issues: spend vs impressions, related channels, trends
Monitor VIF throughout model development
Address multicollinearity for stable, reliable models