VIF & Multicollinearity

Detect and Resolve Variable Redundancy

What is Multicollinearity?

Multicollinearity occurs when independent variables in your model are highly correlated with each other, making it difficult to isolate individual effects.

Problem: Can't tell which variable is truly driving results

Example: TV_Spend and TV_GRPs are nearly perfectly correlated, so including both in the same model causes multicollinearity

Why Multicollinearity Matters

The Impact

Statistical issues:

  • Inflated standard errors

  • Unstable coefficients

  • Non-significant variables that should be significant

  • Coefficients change dramatically when variables added/removed

Business issues:

  • Can't attribute effects correctly

  • Optimization recommendations unreliable

  • ROI calculations questionable

  • Stakeholder confusion

What Causes It

Common sources:

  • Multiple measures of same thing (spend and impressions)

  • Transformed versions (raw and logged)

  • Highly correlated channels (Facebook and Instagram often move together)

  • Time trends (multiple variables growing over time)

VIF (Variance Inflation Factor)

What is VIF?

VIF measures how much the variance of a coefficient estimate is inflated due to multicollinearity.

Formula concept: VIF = 1 / (1 - R²), where R² comes from regressing that variable on all the other independent variables in the model

Interpretation:

  • VIF = 1: No multicollinearity

  • VIF = 5: Variance inflated 5x (standard error about 2.2x larger)

  • VIF = 10: Variance inflated 10x (standard error about 3.2x larger)
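The formula can be sketched directly: regress each predictor on all the others, take the R², and apply 1 / (1 - R²). What follows is a minimal NumPy sketch, not MixModeler's own implementation. The `vif` function and the synthetic `tv_spend`, `tv_grps`, and `radio` series are illustrative assumptions.

```python
import numpy as np

def vif(X):
    """Return the VIF of each column of X (rows = observations)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]                                   # predictor being checked
        others = np.delete(X, j, axis=1)              # all remaining predictors
        A = np.column_stack([np.ones(n), others])     # add an intercept column
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # auxiliary regression
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))                  # VIF = 1 / (1 - R^2)
    return np.array(out)

# Synthetic data (assumption): GRPs are almost a rescaled copy of spend.
rng = np.random.default_rng(0)
tv_spend = rng.normal(100, 20, 200)
tv_grps = 0.5 * tv_spend + rng.normal(0, 1, 200)
radio = rng.normal(50, 10, 200)

values = vif(np.column_stack([tv_spend, tv_grps, radio]))
print(dict(zip(["TV_Spend", "TV_GRPs", "Radio_Spend"], values.round(1))))
```

With this synthetic data the two TV series come out far above 10, while the independent radio series stays close to 1. If one column is an exact linear combination of the others, R² reaches 1 and the division fails, which is the perfect-multicollinearity case.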

VIF Thresholds

< 5: Low multicollinearity ✅

  • No action needed

  • Variables independent enough

5-10: Moderate multicollinearity ⚠️

  • Monitor situation

  • Consider if both variables needed

  • Acceptable if theoretically important

> 10: Severe multicollinearity 🚨

  • Action required

  • Remove one of correlated variables

  • Combine into weighted variable

  • Model unstable without resolution

Checking VIF in MixModeler

Location

Variable Testing Page includes VIF in results table

Model Diagnostics shows VIF for all model variables

Interpretation

Results table shows:

  • Variable name

  • VIF value

  • Interpretation (Low/Moderate/Severe)

Example:

Variable         VIF     Status
TV_Spend         2.3     Low
Digital_Spend    3.1     Low
Radio_Spend      8.5     Moderate
Search_Spend     15.2    Severe
Seasonality      1.8     Low

Interpretation: Search_Spend has severe multicollinearity. Investigate its correlation with the other variables.
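The Low/Moderate/Severe labels follow mechanically from the thresholds given earlier. A small sketch that applies them to the example values above is shown here. The function name `vif_status` is hypothetical, and a VIF of exactly 5 or 10 is treated as Moderate, matching the "5-10" band.

```python
def vif_status(vif):
    """Map a VIF value to the Low / Moderate / Severe bands."""
    if vif > 10:
        return "Severe"
    if vif >= 5:
        return "Moderate"
    return "Low"

# VIF values copied from the example results table.
example = {
    "TV_Spend": 2.3,
    "Digital_Spend": 3.1,
    "Radio_Spend": 8.5,
    "Search_Spend": 15.2,
    "Seasonality": 1.8,
}

# Sort by VIF descending so the most problematic variable comes first.
report = sorted(
    ((name, v, vif_status(v)) for name, v in example.items()),
    key=lambda row: row[1],
    reverse=True,
)
for name, v, status in report:
    print(f"{name:<15}{v:>6.1f}  {status}")
```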

Detecting Multicollinearity

Pre-Testing Detection

In Variable Testing:

  1. Select variables to test

  2. Click "Test Variables"

  3. Review VIF column in results

  4. High VIF indicates redundancy with existing model variables

Decision: Don't add variables with VIF > 10

Post-Addition Detection

After adding variable:

  1. Navigate to Model Diagnostics

  2. Check VIF test results

  3. Identify problematic variables

Action: Remove or address high-VIF variables

Visual Detection

Correlation patterns:

  • Two variables move in lockstep

  • Very similar time-series patterns

  • Scatter plot shows linear relationship

Warning signs in model:

  • Coefficients flip signs when variables added/removed

  • Large changes in coefficients

  • Previously significant variables become non-significant
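The "move in lockstep" check can also be done numerically with pairwise correlations before any model is fitted. This is a plain-Python sketch that flags variable pairs whose correlation exceeds a cutoff. The 0.9 cutoff, the function names, and the spend figures are illustrative assumptions, not MixModeler defaults.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlated_pairs(series, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(series.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Made-up weekly figures: Facebook and Instagram rise and fall together.
series = {
    "Facebook_Spend": [10, 12, 15, 11, 18, 20, 17, 22],
    "Instagram_Spend": [5, 6.5, 7, 5.5, 9.5, 10, 8, 11.5],
    "Radio_Spend": [7, 3, 9, 4, 6, 2, 8, 5],
}
flagged = correlated_pairs(series)
print(flagged)
```

Pairwise correlation only catches two-variable redundancy. A variable can still have a high VIF because it is explained by a combination of several others, so this complements the VIF check and does not replace it.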

Resolving Multicollinearity

Solution 1: Remove One Variable

When to use: Two variables measure essentially the same thing

Process:

  1. Identify correlated pair (both have high VIF)

  2. Compare T-statistics and R² contribution

  3. Keep the variable with the higher absolute T-stat

  4. Remove the other

Example:

  • TV_Spend (VIF=12) and TV_GRPs (VIF=14)

  • TV_Spend has higher T-stat

  • Remove TV_GRPs

  • VIF drops to acceptable levels

Solution 2: Create Weighted Variable

When to use: Both variables provide value but are correlated

Process:

  1. Test both variables separately

  2. Note coefficients for each

  3. Create weighted combination

  4. Use combined variable in model

Example:

  • Facebook_Spend (coef=400, VIF=11)

  • Instagram_Spend (coef=350, VIF=12)

  • Create: Social_Spend_WGTD = 0.53×Facebook + 0.47×Instagram

  • Use combined variable (VIF=3.5)
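The source does not spell out how the weights are derived. One reading that reproduces the 0.53 / 0.47 split in the example is to normalize the two standalone coefficients so they sum to 1, and the sketch below assumes exactly that. The helper names and the spend figures are hypothetical, and the approach presumes both coefficients are positive.

```python
def coefficient_weights(coefs):
    """Normalize coefficients so the weights sum to 1 (assumed method)."""
    total = sum(coefs.values())
    return {name: c / total for name, c in coefs.items()}

def weighted_variable(series, weights):
    """Combine several series into one, period by period."""
    names = list(weights)
    n_periods = len(series[names[0]])
    return [
        sum(weights[name] * series[name][t] for name in names)
        for t in range(n_periods)
    ]

# Coefficients from testing each variable separately (values from the example).
weights = coefficient_weights({"Facebook_Spend": 400, "Instagram_Spend": 350})

# Illustrative spend figures, one value per period.
spend = {
    "Facebook_Spend": [1000, 1200, 900],
    "Instagram_Spend": [800, 950, 700],
}
social_spend_wgtd = weighted_variable(spend, weights)
print({k: round(v, 2) for k, v in weights.items()})
```

The combined series replaces both originals in the model, so only one coefficient has to be estimated for the pair.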

Solution 3: Use Transformations

When to use: Variables correlated due to trends

Options:

  • Log transformation

  • Difference transformation (change from previous period)

  • Detrending

Caution: Changes interpretation
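For reference, the three options can be written out in a few lines of plain Python. This is a sketch with hypothetical function names, where "detrending" is taken to mean removing a fitted straight line. Other detrending methods exist.

```python
import math

def log_transform(xs):
    """Natural log of each value; only valid for strictly positive data."""
    return [math.log(x) for x in xs]

def difference(xs):
    """Change from the previous period; the result is one element shorter."""
    return [curr - prev for prev, curr in zip(xs, xs[1:])]

def detrend(xs):
    """Subtract a least-squares straight line fitted against the time index."""
    n = len(xs)
    t_mean = (n - 1) / 2
    x_mean = sum(xs) / n
    slope = (
        sum((t - t_mean) * (x - x_mean) for t, x in enumerate(xs))
        / sum((t - t_mean) ** 2 for t in range(n))
    )
    intercept = x_mean - slope * t_mean
    return [x - (intercept + slope * t) for t, x in enumerate(xs)]

tv_spend = [100, 110, 125, 145]      # illustrative, steadily growing series
changes = difference(tv_spend)       # period-over-period changes
residuals = detrend([2, 4, 6, 8])    # a pure trend leaves (near) zero residuals
print(changes, residuals)
```

The caution about interpretation is concrete here: after differencing, a coefficient describes the effect of a change in spend, not of its level, and the series loses its first observation.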

Solution 4: Accept and Document

When to use:

  • VIF between 5-10

  • Both variables theoretically important

  • Coefficients stable and significant

Requirements:

  • Document in model notes

  • Explain to stakeholders

  • Monitor in future iterations

  • Don't rely on these coefficients for optimization (the split of effect between the correlated variables is unreliable)

Common Multicollinearity Scenarios

Scenario 1: Spend and Impressions

Problem: TV_Spend and TV_Impressions highly correlated

Why: More spend = more impressions

VIF: Both > 12

Solution:

  • Choose spend (more actionable)

  • Or combine using weighted variable

  • Don't include both

Scenario 2: Correlated Channels

Problem: Facebook_Spend and Instagram_Spend move together (managed by same team)

VIF: Both in the 9-11 range

Solutions:

  • Combine into Social_Spend_WGTD

  • Or keep separate if VIF < 10

  • Test separately to isolate effects

Scenario 3: Original and Transformed

Problem: TV_Spend and TV_Spend_adstock_70 both included

VIF: Very high (the adstocked series is built from the raw series, so the two are near-collinear)

Solution:

  • NEVER include both

  • Choose one: raw OR adstocked

  • Typically use adstocked version
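To see why the raw and adstocked versions overlap, it helps to look at how an adstocked series is built. The sketch below assumes that `_adstock_70` means a geometric adstock with a 70% carry-over rate. The source does not define the transformation, so treat both the formula and the function name `geometric_adstock` as assumptions.

```python
def geometric_adstock(spend, decay=0.7):
    """Each period keeps a share `decay` of the previous adstocked value."""
    carried = 0.0
    out = []
    for x in spend:
        carried = x + decay * carried   # current spend plus decayed carry-over
        out.append(carried)
    return out

tv_spend = [100, 0, 0, 50, 0]           # illustrative raw spend
tv_adstock = geometric_adstock(tv_spend, decay=0.7)
print([round(v, 2) for v in tv_adstock])
```

Every adstocked value is a weighted sum of current and past raw values, so the two columns carry largely the same information and compete for the same effect when both are in the model.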

Scenario 4: Seasonal Variables

Problem: 11 month dummies all correlated with each other

VIF: Moderate for all

Solution:

  • This is expected and acceptable

  • Month dummies MUST be used together

  • VIF for group < 10 is acceptable

Scenario 5: Time Trends

Problem: Multiple variables growing over time

VIF: All high

Solutions:

  • Add time trend variable

  • Detrend variables

  • Use differencing (period-over-period change)

Testing for Multicollinearity

Pre-Addition Testing

Before adding variable:

  1. Pre-test in Variable Testing

  2. Check VIF in results

  3. If VIF > 10, investigate

Prevents: Adding problematic variables

Post-Model Testing

After building model:

  1. Navigate to Model Diagnostics

  2. Run multicollinearity test

  3. Review VIF for all variables

  4. Address any VIF > 10

Ensures: Model stability

Iterative Testing

As model evolves:

  1. Check VIF after each variable addition

  2. Monitor for VIF increases

  3. Address immediately

  4. Maintain clean model

Interpreting VIF Results

Individual VIF

Focus on highest values first:

  • Sort by VIF descending

  • Address VIF > 10 immediately

  • Monitor VIF 5-10

  • VIF < 5 is fine

Pattern Recognition

All variables high VIF:

  • Suggests systemic issue

  • Likely time trends

  • Consider detrending approach

Two variables high VIF:

  • Direct correlation between them

  • Remove one or combine

  • Most common scenario

Increasing VIF over iterations:

  • New variables correlated with existing

  • Need to be more selective

  • Consider variable combinations

Best Practices

During Model Building

Check VIF regularly:

  • After each variable addition

  • Before finalizing model

  • When coefficients seem unstable

Be proactive:

  • Pre-test variables for VIF

  • Don't add high-VIF variables

  • Address issues immediately

Document decisions:

  • Which variables removed

  • Why chosen over alternatives

  • VIF values before and after

Variable Selection

Avoid including:

  • Multiple measures of same thing

  • Both raw and transformed

  • Highly correlated channels without combining

Prefer:

  • Independent predictors

  • Orthogonal variables

  • Combined weighted variables when needed

Stakeholder Communication

Explain multicollinearity:

  • Use simple terms

  • Explain why both can't be included

  • Show instability without resolution

Justify decisions:

  • Why variable A kept over B

  • Statistical rationale (VIF, T-stat)

  • Business rationale (actionability)

Advanced Topics

Tolerance

Tolerance = 1 / VIF

Interpretation:

  • Tolerance close to 1: Low multicollinearity

  • Tolerance close to 0: High multicollinearity

  • Same information as VIF, different scale

Use: Some analysts and tools report tolerance instead of VIF; it is the same diagnostic on an inverted scale
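Because VIF = 1 / (1 - R²), tolerance is simply 1 - R² from the same auxiliary regression. A short sketch of the conversions, with hypothetical function names:

```python
def tolerance(vif):
    """Tolerance is the reciprocal of VIF."""
    return 1.0 / vif

def implied_r_squared(vif):
    """R^2 of the auxiliary regression implied by a given VIF."""
    return 1.0 - 1.0 / vif

for v in (1, 5, 10):
    print(v, round(tolerance(v), 3), round(implied_r_squared(v), 3))
```

On this scale the VIF thresholds of 5 and 10 correspond to tolerances of 0.2 and 0.1, meaning 80% and 90% of the variable's variance is explained by the other predictors.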

Condition Number

Alternative measure:

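  • Square root of the ratio of the largest to smallest eigenvalue of X'X (equivalently, the ratio of the largest to smallest singular value of the scaled predictor matrix)

  • Condition number > 30 indicates strong multicollinearity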

  • More technical, less commonly used
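A condition number can be computed from the singular values of the predictor matrix. The sketch below centers and scales each column first, which is one common convention (others keep the intercept and scale columns to unit length without centering). It then takes the ratio of the largest to the smallest singular value, which equals the square root of the eigenvalue ratio of X'X. That is the scale on which the > 30 rule of thumb is usually quoted. The function name and the synthetic data are assumptions.

```python
import numpy as np

def condition_number(X):
    """Largest / smallest singular value of the standardized predictors."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # center and scale each column
    s = np.linalg.svd(Z, compute_uv=False)     # singular values, largest first
    return s[0] / s[-1]

# Synthetic data (assumption): GRPs track spend almost exactly.
rng = np.random.default_rng(1)
tv_spend = rng.normal(100, 20, 200)
tv_grps = 0.5 * tv_spend + rng.normal(0, 0.2, 200)
radio = rng.normal(50, 10, 200)

cn_collinear = condition_number(np.column_stack([tv_spend, tv_grps, radio]))
cn_clean = condition_number(np.column_stack([tv_spend, radio]))
print(round(cn_collinear, 1), round(cn_clean, 1))
```

Unlike VIF, this gives one number for the whole predictor set, so it signals that a problem exists without saying which variable is responsible.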

Ridge Regression

Advanced solution:

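  • Adds a penalty on coefficient size, which stabilizes estimates when variables are correlated

  • Can keep correlated variables

  • Requires choosing a penalty strength (typically by cross-validation), and shrinks coefficients toward zero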

  • Not currently in MixModeler
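For readers who want to experiment outside the tool, ridge regression has a simple closed form on standardized predictors: solve (Z'Z + λI)β = Z'y. The NumPy sketch below is illustrative only. The function name, the penalty value of 10, and the synthetic channel data are assumptions, and the coefficients are on the standardized scale.

```python
import numpy as np

def ridge_coefficients(X, y, penalty):
    """Closed-form ridge on standardized X and centered y (penalty=0 is OLS)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    k = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + penalty * np.eye(k), Z.T @ yc)

# Synthetic data (assumption): two channels that move together.
rng = np.random.default_rng(2)
facebook = rng.normal(100, 20, 150)
instagram = 0.8 * facebook + rng.normal(0, 2, 150)
sales = 3.0 * facebook + 2.0 * instagram + rng.normal(0, 50, 150)

X = np.column_stack([facebook, instagram])
ols = ridge_coefficients(X, sales, penalty=0.0)
ridge = ridge_coefficients(X, sales, penalty=10.0)
print(ols.round(1), ridge.round(1))
```

The penalty always shrinks the overall size of the coefficient vector relative to OLS. The price is some bias, which is why the penalty strength is normally tuned and not fixed in advance.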

Troubleshooting

VIF calculation fails

Cause: Not enough observations or perfect multicollinearity

Solution:

  • Check for duplicate variables

  • Ensure sufficient data

  • Remove obviously redundant variables

All VIF values are high

Cause: Time trends across all variables

Solution:

  • Add time trend variable

  • Consider first-differencing

  • Detrend variables

VIF acceptable but model unstable

Cause: VIF isn't the only diagnostic; a model can be unstable for reasons other than multicollinearity

Solution:

  • Check other diagnostics

  • Review coefficient signs

  • Validate with business logic

Key Takeaways

  • VIF measures variance inflation due to multicollinearity

  • VIF > 10 requires action, < 5 is acceptable

  • Check VIF before adding variables (Variable Testing)

  • Check VIF after building model (Diagnostics)

  • Remove one variable, combine into weighted, or accept and document

  • Common issues: spend vs impressions, related channels, trends

  • Monitor VIF throughout model development

  • Address multicollinearity for stable, reliable models
