
# VIF & Multicollinearity

### What is Multicollinearity?

Multicollinearity occurs when independent variables in your model are highly correlated with each other, making it difficult to isolate individual effects.

**Problem:** Can't tell which variable is truly driving results

**Example:** TV\_Spend and TV\_GRPs are nearly perfectly correlated, so including both causes multicollinearity

### Why Multicollinearity Matters

#### The Impact

**Statistical issues:**

* Inflated standard errors
* Unstable coefficients
* Non-significant variables that should be significant
* Coefficients change dramatically when variables are added or removed

**Business issues:**

* Can't attribute effects correctly
* Optimization recommendations unreliable
* ROI calculations questionable
* Stakeholder confusion

#### What Causes It

**Common sources:**

* Multiple measures of same thing (spend and impressions)
* Transformed versions (raw and logged)
* Highly correlated channels (Facebook and Instagram often move together)
* Time trends (multiple variables growing over time)

### VIF (Variance Inflation Factor)

#### What is VIF?

VIF measures how much the variance of a coefficient estimate is inflated due to multicollinearity.

**Formula concept:** VIF = 1 / (1 - R²)\
where R² comes from regressing that variable on all the other independent variables in the model

**Interpretation:**

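* VIF = 1: No multicollinearity (the variable is uncorrelated with the others)
* VIF = 5: Coefficient variance inflated 5x (standard error inflated about 2.2x)
* VIF = 10: Coefficient variance inflated 10x (standard error inflated about 3.2x)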
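To make the formula concrete, here is a minimal numpy sketch that computes VIF for every column of a data matrix by running the auxiliary regressions directly. It assumes rows are time periods, columns are model variables, and each auxiliary regression includes an intercept. It illustrates the textbook calculation and is not MixModeler's internal code; the series in the usage lines are simulated and hypothetical.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """VIF for each column of X (rows = periods, columns = variables).

    For column j: regress it on an intercept plus every other column,
    take that regression's R^2, and return 1 / (1 - R^2).
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Auxiliary design matrix: intercept + all other variables
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        # Near-perfect collinearity pushes R^2 toward 1 and VIF toward infinity
        out[j] = 1.0 / (1.0 - r2)
    return out


# Usage with simulated, hypothetical weekly data (104 weeks)
rng = np.random.default_rng(0)
tv_spend = rng.normal(100, 20, size=104)
tv_grps = 0.5 * tv_spend + rng.normal(0, 1, size=104)  # near-duplicate of TV spend
seasonality = rng.normal(0, 1, size=104)
print(vif(np.column_stack([tv_spend, tv_grps, seasonality])))
```

With only two variables, each VIF reduces to 1 / (1 - r²), where r is their simple correlation.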

#### VIF Thresholds

**< 5: Low multicollinearity**\
✅ No action needed\
✅ Variables independent enough

**5-10: Moderate multicollinearity**\
⚠️ Monitor situation\
⚠️ Consider if both variables needed\
⚠️ Acceptable if theoretically important

**> 10: Severe multicollinearity**\
🚨 Action required\
🚨 Remove one of the correlated variables\
🚨 Combine them into a weighted variable\
🚨 Model unstable without resolution

### Checking VIF in MixModeler

#### Location

**Variable Testing Page** includes VIF in results table

**Model Diagnostics** shows VIF for all model variables

#### Interpretation

**Results table shows:**

* Variable name
* VIF value
* Interpretation (Low/Moderate/Severe)

**Example:**

| Variable       | VIF  | Status     |
| -------------- | ---- | ---------- |
| TV\_Spend      | 2.3  | Low        |
| Digital\_Spend | 3.1  | Low        |
| Radio\_Spend   | 8.5  | Moderate   |
| Search\_Spend  | 15.2 | **Severe** |
| Seasonality    | 1.8  | Low        |

**Interpretation:** Search\_Spend has severe multicollinearity; investigate its correlation with the other variables
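The Low/Moderate/Severe banding can be expressed as a small function. The sketch below applies it to the example table's values and sorts them so the most severe variable comes first. Treating exactly 5 and exactly 10 as Moderate is an assumption, since the thresholds don't say which band a boundary value falls into.

```python
def vif_status(value: float) -> str:
    """Map a VIF value to the bands used on this page."""
    if value < 5:
        return "Low"        # no action needed
    if value <= 10:
        return "Moderate"   # monitor; keep only if theoretically important
    return "Severe"         # action required


# Values from the example table above
example = {
    "TV_Spend": 2.3,
    "Digital_Spend": 3.1,
    "Radio_Spend": 8.5,
    "Search_Spend": 15.2,
    "Seasonality": 1.8,
}

# Highest VIF first, so the worst offenders are addressed first
for name, value in sorted(example.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<14} {value:>5.1f}  {vif_status(value)}")
```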

### Detecting Multicollinearity

#### Pre-Testing Detection

**In Variable Testing:**

1. Select variables to test
2. Click "Test Variables"
3. Review VIF column in results
4. High VIF indicates redundancy with existing model variables

**Decision:** Don't add variables with VIF > 10
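Pre-testing amounts to asking what VIF a candidate would have if it joined the model: regress the candidate on the current model variables and apply 1 / (1 - R²). The numpy sketch below does that and applies the VIF > 10 rule. The function names and simulated data are hypothetical, and MixModeler's Variable Testing page may compute this differently.

```python
import numpy as np


def candidate_vif(existing: np.ndarray, candidate: np.ndarray) -> float:
    """VIF the candidate would have if added to a model containing `existing`."""
    candidate = np.asarray(candidate, dtype=float)
    existing = np.asarray(existing, dtype=float).reshape(len(candidate), -1)
    # Regress the candidate on an intercept plus the current model variables
    A = np.column_stack([np.ones(len(candidate)), existing])
    beta, *_ = np.linalg.lstsq(A, candidate, rcond=None)
    resid = candidate - A @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((candidate - candidate.mean()) ** 2)
    return 1.0 / (1.0 - r2)


def passes_pretest(existing, candidate, threshold: float = 10.0) -> bool:
    """Apply the 'don't add variables with VIF > 10' decision rule."""
    return candidate_vif(existing, candidate) <= threshold


# Usage with simulated, hypothetical data
rng = np.random.default_rng(2)
tv_spend = rng.normal(size=100)
model_vars = np.column_stack([tv_spend, rng.normal(size=100)])  # current model
tv_grps = tv_spend + 0.05 * rng.normal(size=100)                # near-copy of TV spend
radio = rng.normal(size=100)                                     # unrelated candidate
print(passes_pretest(model_vars, tv_grps), passes_pretest(model_vars, radio))
```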

#### Post-Addition Detection

**After adding variable:**

1. Navigate to Model Diagnostics
2. Check VIF test results
3. Identify problematic variables

**Action:** Remove or address high-VIF variables

#### Visual Detection

**Correlation patterns:**

* Two variables move in lockstep
* Very similar time-series patterns
* Scatter plot shows linear relationship
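Scanning pairwise correlations is a quick way to spot variables that move in lockstep. The sketch below flags every pair whose absolute Pearson correlation meets a cutoff; the 0.8 default is an assumed rule of thumb, not a MixModeler setting. Pairwise correlation can miss multicollinearity that involves three or more variables at once, which VIF does catch.

```python
from itertools import combinations

import numpy as np


def correlated_pairs(data: dict, threshold: float = 0.8) -> list:
    """Return (name_a, name_b, r) for pairs with |r| >= threshold, strongest first."""
    names = list(data)
    X = np.column_stack([np.asarray(data[name], dtype=float) for name in names])
    corr = np.corrcoef(X, rowvar=False)  # variables are columns
    pairs = [
        (names[i], names[j], float(corr[i, j]))
        for i, j in combinations(range(len(names)), 2)
        if abs(corr[i, j]) >= threshold
    ]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Usage with hypothetical series
weeks = np.arange(52, dtype=float)
data = {
    "Facebook_Spend": 100 + 2 * weeks,
    "Instagram_Spend": 80 + 1.8 * weeks + np.sin(weeks),
    "Radio_Spend": 50 + 10 * np.cos(weeks),
}
for a, b, r in correlated_pairs(data):
    print(f"{a} vs {b}: r = {r:.2f}")
```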

**Warning signs in model:**

* Coefficients flip signs when variables are added or removed
* Large changes in coefficients
* Previously significant variables become non-significant

### Resolving Multicollinearity

#### Solution 1: Remove One Variable

**When to use:** Two variables measure essentially the same thing

**Process:**

1. Identify correlated pair (both have high VIF)
2. Compare T-statistics and R² contribution
3. Keep variable with higher T-stat
4. Remove the other

**Example:**

* TV\_Spend (VIF=12) and TV\_GRPs (VIF=14)
* TV\_Spend has higher T-stat
* Remove TV\_GRPs
* VIF drops to acceptable levels
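The "keep the higher T-stat" step can be sketched as fitting the model twice, once with each member of the correlated pair, and comparing that variable's t-statistic. The numpy code below computes ordinary least squares t-statistics by hand. The helper names and simulated data are hypothetical, and the sketch leaves out the R² contribution comparison, which you would weigh alongside it.

```python
import numpy as np


def t_stat_of_last(X_base, x_new, y) -> float:
    """t-statistic of x_new in an OLS fit of y on intercept + X_base + x_new."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    cols = [np.ones(n)]
    if X_base is not None:
        cols.append(np.asarray(X_base, dtype=float).reshape(n, -1))
    cols.append(np.asarray(x_new, dtype=float))
    A = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = (resid @ resid) / (n - A.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)         # coefficient covariance
    return float(beta[-1] / np.sqrt(cov[-1, -1]))


def keep_stronger(name_a, x_a, name_b, x_b, y, X_base=None) -> str:
    """Fit once with each variable of the pair; keep the one with larger |t|."""
    t_a = t_stat_of_last(X_base, x_a, y)
    t_b = t_stat_of_last(X_base, x_b, y)
    return name_a if abs(t_a) >= abs(t_b) else name_b


# Usage with simulated, hypothetical data
rng = np.random.default_rng(3)
tv_spend = rng.normal(size=200)
tv_grps = tv_spend + rng.normal(size=200)   # correlated with spend, but noisier
sales = 3 * tv_spend + rng.normal(size=200)
print(keep_stronger("TV_Spend", tv_spend, "TV_GRPs", tv_grps, sales))
```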

#### Solution 2: Create Weighted Variable

**When to use:** Both variables provide value but are correlated

**Process:**

1. Test both variables separately
2. Note coefficients for each
3. Create weighted combination
4. Use combined variable in model

**Example:**

* Facebook\_Spend (coef=400, VIF=11)
* Instagram\_Spend (coef=350, VIF=12)
* Create: Social\_Spend\_WGTD = 0.53×Facebook + 0.47×Instagram
* Use combined variable (VIF=3.5)
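The weights in this example are each coefficient's share of the coefficient total: 400 / (400 + 350) ≈ 0.53 and 350 / 750 ≈ 0.47. A minimal sketch of that arithmetic follows. It assumes both coefficients are positive and on comparable scales, and the weekly spend figures in the usage lines are made up.

```python
import numpy as np


def weighted_combination(coefs: dict, series: dict):
    """Weight each series by its coefficient's share of the total and sum them."""
    total = sum(coefs.values())
    weights = {name: coef / total for name, coef in coefs.items()}
    combined = sum(weights[name] * np.asarray(series[name], dtype=float) for name in coefs)
    return weights, combined


# Coefficients from the example; spend series are hypothetical
weights, social_spend_wgtd = weighted_combination(
    {"Facebook_Spend": 400, "Instagram_Spend": 350},
    {"Facebook_Spend": [1000, 1200, 900], "Instagram_Spend": [800, 950, 700]},
)
print({k: round(v, 2) for k, v in weights.items()})
print(social_spend_wgtd)
```

After building the combined series, re-check its VIF to confirm the collinearity has actually been resolved.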

#### Solution 3: Use Transformations

**When to use:** Variables correlated due to trends

**Options:**

* Log transformation
* Difference transformation (change from previous period)
* Detrending

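**Caution:** Changes interpretation; coefficients then describe the transformed series (e.g., period-over-period change) rather than the original level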
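Differencing and linear detrending are each a line or two of numpy. The sketch below builds two hypothetical series that share an upward trend, then compares their correlation in levels, after first-differencing, and after removing a fitted linear trend. It illustrates the general technique; MixModeler's own transformation options may be implemented differently.

```python
import numpy as np


def first_difference(x) -> np.ndarray:
    """Period-over-period change x[t] - x[t-1]; one observation shorter than x."""
    return np.diff(np.asarray(x, dtype=float))


def detrend_linear(x) -> np.ndarray:
    """Residuals after removing a least-squares straight-line trend."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)   # coefficients, highest power first
    return x - (intercept + slope * t)


# Usage with simulated, hypothetical series that share an upward trend
rng = np.random.default_rng(0)
t = np.arange(200)
search = 1.0 * t + rng.normal(0, 5, size=200)
social = 0.5 * t + rng.normal(0, 5, size=200)

print("levels:     ", np.corrcoef(search, social)[0, 1])
print("differenced:", np.corrcoef(first_difference(search), first_difference(social))[0, 1])
print("detrended:  ", np.corrcoef(detrend_linear(search), detrend_linear(social))[0, 1])
```

Differencing drops the first period, so the other model variables need to be aligned to the shorter series.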

#### Solution 4: Accept and Document

**When to use:**

* VIF between 5-10
* Both variables theoretically important
* Coefficients stable and significant

**Requirements:**

* Document in model notes
* Explain to stakeholders
* Monitor in future iterations
* Don't use for optimization (unreliable)

### Common Multicollinearity Scenarios

#### Scenario 1: Spend and Impressions

**Problem:** TV\_Spend and TV\_Impressions highly correlated

**Why:** More spend = more impressions

**VIF:** Both > 12

**Solution:**

* Choose spend (more actionable)
* Or combine using weighted variable
* Don't include both

#### Scenario 2: Related Channels

**Problem:** Facebook\_Spend and Instagram\_Spend move together (managed by same team)

**VIF:** Both between 9 and 11

**Solutions:**

* Combine into Social\_Spend\_WGTD
* Or keep separate if VIF < 10
* Test separately to isolate effects

#### Scenario 3: Original and Transformed

**Problem:** TV\_Spend and TV\_Spend\_adstock\_70 both included

**VIF:** Often high; the adstocked series is built directly from the raw series, so the two carry heavily overlapping information

**Solution:**

* NEVER include both
* Choose one: raw OR adstocked
* Typically use adstocked version

#### Scenario 4: Seasonal Variables

**Problem:** 11 month dummies are mechanically related to each other (each period falls in exactly one month)

**VIF:** Elevated for all, though usually modest

**Solution:**

* This is expected and acceptable
* Month dummies MUST be used together
* VIF for group < 10 is acceptable

#### Scenario 5: Time Trends

**Problem:** Multiple variables growing over time

**VIF:** All high

**Solutions:**

* Add time trend variable
* Detrend variables
* Use differencing (period-over-period change)

### Testing for Multicollinearity

#### Pre-Addition Testing

**Before adding variable:**

1. Pre-test in Variable Testing
2. Check VIF in results
3. If VIF > 10, investigate

**Prevents:** Adding problematic variables

#### Post-Model Testing

**After building model:**

1. Navigate to Model Diagnostics
2. Run multicollinearity test
3. Review VIF for all variables
4. Address any VIF > 10

**Ensures:** Model stability

#### Iterative Testing

**As model evolves:**

1. Check VIF after each variable addition
2. Monitor for VIF increases
3. Address immediately
4. Maintain clean model

### Interpreting VIF Results

#### Individual VIF

**Focus on highest values first:**

* Sort by VIF descending
* Address VIF > 10 immediately
* Monitor VIF 5-10
* VIF < 5 is fine

#### Pattern Recognition

**All variables high VIF:**

* Suggests systemic issue
* Likely time trends
* Consider detrending approach

**Two variables high VIF:**

* Direct correlation between them
* Remove one or combine
* Most common scenario

**Increasing VIF over iterations:**

* New variables correlated with existing
* Need to be more selective
* Consider variable combinations

### Best Practices

#### During Model Building

**Check VIF regularly:**

* After each variable addition
* Before finalizing model
* When coefficients seem unstable

**Be proactive:**

* Pre-test variables for VIF
* Don't add high-VIF variables
* Address issues immediately

**Document decisions:**

* Which variables removed
* Why chosen over alternatives
* VIF values before and after

#### Variable Selection

**Avoid including:**

* Multiple measures of same thing
* Both raw and transformed
* Highly correlated channels without combining

**Prefer:**

* Independent predictors
* Orthogonal variables
* Combined weighted variables when needed

#### Stakeholder Communication

**Explain multicollinearity:**

* Use simple terms
* Explain why both can't be included
* Show instability without resolution

**Justify decisions:**

* Why variable A kept over B
* Statistical rationale (VIF, T-stat)
* Business rationale (actionability)

### Advanced Topics

#### Tolerance

**Tolerance = 1 / VIF = 1 - R²**

**Interpretation:**

* Tolerance close to 1: Low multicollinearity
* Tolerance close to 0: High multicollinearity
* Same information as VIF, different scale

**Use:** Some prefer tolerance, same diagnostic

#### Condition Number

**Alternative measure:**

* Square root of the ratio of the largest to smallest eigenvalue of X'X (with columns scaled to unit length)
* Condition number > 30 indicates strong multicollinearity
* More technical, less commonly used
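Conventions for the condition number vary. A common one in the regression-diagnostics literature (Belsley-style) scales every column of the design matrix, including the intercept, to unit length without centering, then takes the ratio of the largest to smallest singular value. That equals the square root of the eigenvalue ratio of the scaled X'X. The sketch below follows that convention as an assumption; tools that center the data or skip scaling will report different numbers.

```python
import numpy as np


def condition_number(X, add_intercept: bool = True) -> float:
    """Condition number of the unit-length-scaled design matrix."""
    X = np.asarray(X, dtype=float)
    if add_intercept:
        X = np.column_stack([np.ones(len(X)), X])
    Z = X / np.linalg.norm(X, axis=0)          # scale each column to unit length
    s = np.linalg.svd(Z, compute_uv=False)     # singular values, largest first
    return float(s[0] / s[-1])                 # = sqrt(max eig / min eig) of Z'Z


# Usage with simulated, hypothetical data
rng = np.random.default_rng(4)
independent = rng.normal(size=(100, 3))
near_duplicate = np.column_stack([independent, independent[:, 0] + 1e-3 * rng.normal(size=100)])
print(condition_number(independent), condition_number(near_duplicate))
```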

#### Ridge Regression

**Advanced solution:**

* Adds a penalty on coefficient size, which shrinks and stabilizes estimates for correlated variables
* Can keep correlated variables
* Requires specialized techniques
* Not currently in MixModeler
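Ridge regression is not available in MixModeler. For readers curious how it treats correlated inputs, here is a minimal closed-form numpy sketch: standardize the columns, center the target, and solve (Z'Z + λI)β = Z'y. With λ = 0 it reduces to ordinary least squares; larger λ shrinks the coefficients and makes them less sensitive to the correlation. The data are simulated and the penalty values are arbitrary.

```python
import numpy as np


def ridge(X, y, lam: float) -> np.ndarray:
    """Closed-form ridge coefficients on standardized X and centered y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize so one penalty fits all columns
    yc = y - y.mean()                          # centering removes the need for an intercept
    k = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ yc)


# Usage with simulated, hypothetical data: two nearly identical social channels
rng = np.random.default_rng(5)
facebook = rng.normal(size=150)
instagram = facebook + 0.05 * rng.normal(size=150)
sales = 2 * facebook + 1.5 * instagram + rng.normal(size=150)
X = np.column_stack([facebook, instagram])
for lam in (0.0, 1.0, 10.0):
    print(lam, ridge(X, sales, lam))
```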

### Troubleshooting

#### VIF calculation fails

**Cause:** Not enough observations or perfect multicollinearity

**Solution:**

* Check for duplicate variables
* Ensure sufficient data
* Remove obviously redundant variables
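Two quick checks cover the usual causes of a failed VIF calculation: whether each auxiliary regression has more observations than parameters, and whether any column is an exact linear combination of earlier columns. The sketch below relies on numpy's default `matrix_rank` tolerance, which is assumed to suit your data's scale. It does not add an intercept column; prepend a column of ones if you also want to catch variables that are constant. Names and numbers are hypothetical.

```python
import numpy as np


def enough_observations(X) -> bool:
    """Each auxiliary VIF regression fits an intercept plus k - 1 other variables
    (k parameters), so it needs more than k rows or R^2 is trivially 1."""
    n, k = np.asarray(X).shape
    return n > k


def redundant_columns(X, names) -> list:
    """Names of columns that are exact linear combinations of earlier columns."""
    X = np.asarray(X, dtype=float)
    kept, redundant = [], []
    for j, name in enumerate(names):
        # A column adds information only if it raises the rank of the kept set
        if np.linalg.matrix_rank(X[:, kept + [j]]) > len(kept):
            kept.append(j)
        else:
            redundant.append(name)
    return redundant


# Usage with hypothetical data: the same TV spend stored in two units
tv_spend = np.array([10.0, 12.0, 9.0, 15.0, 11.0, 13.0])
tv_spend_thousands = tv_spend / 1000
radio = np.array([3.0, 2.0, 4.0, 3.0, 5.0, 2.0])
X = np.column_stack([tv_spend, tv_spend_thousands, radio])
print(enough_observations(X), redundant_columns(X, ["TV_Spend", "TV_Spend_k", "Radio"]))
```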

#### All VIF values are high

**Cause:** Time trends across all variables

**Solution:**

* Add time trend variable
* Consider first-differencing
* Detrend variables

#### VIF acceptable but model unstable

**Cause:** VIF isn't the only diagnostic

**Solution:**

* Check other diagnostics
* Review coefficient signs
* Validate with business logic

### Key Takeaways

* VIF measures variance inflation due to multicollinearity
* VIF > 10 requires action, < 5 is acceptable
* Check VIF before adding variables (Variable Testing)
* Check VIF after building model (Diagnostics)
* Remove one variable, combine into weighted, or accept and document
* Common issues: spend vs impressions, related channels, trends
* Monitor VIF throughout model development
* Address multicollinearity for stable, reliable models


