Correlation Heatmaps
What Are Correlation Heatmaps?
Correlation heatmaps display pairwise correlations between all selected variables in a color-coded table format, making it easy to spot multicollinearity and variable relationships at a glance.
Purpose: Identify multicollinearity before modeling, find redundant variables, and understand all variable relationships simultaneously.
When to Use
Best For:
- Pre-model multicollinearity check 
- Variable selection decisions 
- Understanding variable relationships 
- Identifying redundant variables 
- Detecting unexpected correlations 
Critical Use: Always run before finalizing model variables to avoid high variance inflation factors (VIF).
How to Create
Steps:
- Select 2 or more variables (typically all candidate model variables) 
- Click "Correlation Matrix" button 
- Click "Calculate Correlation" 
- Wait for calculation (may use WebGPU/WASM acceleration if enabled) 
Calculation:
- Computes pairwise Pearson correlations 
- All variables vs. all variables 
- Results in symmetric matrix 
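The calculation described above can be sketched in plain Python. This is a minimal illustration, not the tool's actual implementation (which may run on WebGPU/WASM); the function names `pearson` and `correlation_matrix` and the sample values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

def correlation_matrix(data):
    """Return {row: {col: r}} for every pair of columns in `data`."""
    names = list(data)
    matrix = {name: {} for name in names}
    for i, a in enumerate(names):
        matrix[a][a] = 1.0  # a variable is perfectly correlated with itself
        for b in names[i + 1:]:
            r = pearson(data[a], data[b])
            matrix[a][b] = r
            matrix[b][a] = r  # symmetric: corr(A, B) == corr(B, A)
    return matrix

# Hypothetical weekly values, for illustration only.
data = {
    "TV_Spend": [100, 120, 90, 150, 130],
    "Radio_Spend": [40, 50, 35, 60, 55],
    "Price_Index": [1.00, 1.02, 0.99, 0.97, 1.01],
}
matrix = correlation_matrix(data)
```

Each pair is computed once and mirrored into both cells, which is why the resulting matrix is symmetric with 1.0 on the diagonal.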
Reading the Matrix
Table Structure:
- Rows: Variable names 
- Columns: Same variable names 
- Cells: Correlation values (-1 to +1) 
- Diagonal: Always 1.0 (variable with itself) 
- Symmetric: Correlation A→B = B→A 
Color Coding:
- Dark Red: > 0.8 (strong positive correlation)
- Light Red: 0.5 to 0.8 (moderate positive)
- White/Light: -0.5 to 0.5 (low/no correlation)
- Blue: < -0.5 (negative correlation)
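As a rough sketch, the legend above can be expressed as a lookup function. The name `color_for` is hypothetical, and the handling of the exact boundary values (0.8 and ±0.5) is an assumption, since the legend does not say which bucket they fall into.

```python
def color_for(r):
    """Map a correlation value to the color bucket named in the legend."""
    if r > 0.8:
        return "Dark Red"
    if r >= 0.5:       # assumption: +0.5 itself counts as moderate
        return "Light Red"
    if r >= -0.5:      # assumption: -0.5 itself counts as low/no correlation
        return "White/Light"
    return "Blue"

print(color_for(0.92), color_for(0.23), color_for(-0.65))
```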
Correlation Values
+1.0: Perfect positive correlation
- Variables move together perfectly 
- Redundant variables 
+0.8 to +1.0: Very strong positive
- Warning: Serious multicollinearity 
- Consider removing one variable 
+0.5 to +0.8: Moderate positive
- Some relationship 
- Monitor in model 
- May be acceptable 
-0.5 to +0.5: Low correlation
- Variables mostly independent 
- Good: No multicollinearity concern 
-0.5 to -1.0: Negative correlation
- Variables move in opposite directions 
- May be valid (Price vs. Volume) 
-1.0: Perfect negative correlation
- Exact opposite movement 
What Variables to Visualize
All Model Candidates:
- Include every variable you're considering for the model 
- Marketing spend variables 
- Control variables (price, promo, seasonality) 
- External factors 
Before Model Building:
Variables: 
- TV_Spend
- Radio_Spend  
- Digital_Spend
- Print_Spend
- Price_Index
- Holiday_Flag
- KPI (optional)
Purpose: Check for multicollinearity before fitting
Interpreting Results
High Correlation (>0.8) Found:
Example:
TV_Spend and Radio_Spend: 0.92
Action:
- These run together (coordinated campaigns) 
- Will cause multicollinearity in model 
- Options:
  - Remove one (keep more important) 
  - Combine into "Traditional_Media" 
  - Use regularization 
Moderate Correlation (0.5-0.8):
Example:
Digital_Spend and Social_Spend: 0.72
Action:
- Some relationship but not severe 
- Monitor VIF in model 
- May be acceptable 
- Consider if both needed 
Low Correlation (<0.5):
Example:
TV_Spend and Price_Index: 0.23
Action:
- ✓ Good - mostly independent 
- Both can be in model 
- No multicollinearity concern 
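The interpretation rules above can be applied to a whole matrix at once. The sketch below walks the upper triangle, so each pair appears once, and attaches a suggested action. The function name `review_pairs` is hypothetical, the cutoffs are parameters whose defaults follow the bands in the Correlation Values section, and the Radio_Spend/Price_Index value is made up for illustration. It follows this page in looking only at positive values; strongly negative pairs may deserve the same scrutiny.

```python
def review_pairs(matrix, high=0.8, moderate=0.5):
    """List each variable pair once, strongest first, with a suggested action."""
    names = list(matrix)
    rows = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:  # upper triangle: skip diagonal and mirrored cells
            r = matrix[a][b]
            if r > high:
                action = "remove one, combine, or regularize"
            elif r >= moderate:
                action = "monitor VIF"
            else:
                action = "no concern"
            rows.append((a, b, r, action))
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Values for TV/Radio and TV/Price come from the examples above;
# the Radio/Price value is hypothetical.
matrix = {
    "TV_Spend":    {"TV_Spend": 1.0,  "Radio_Spend": 0.92, "Price_Index": 0.23},
    "Radio_Spend": {"TV_Spend": 0.92, "Radio_Spend": 1.0,  "Price_Index": 0.18},
    "Price_Index": {"TV_Spend": 0.23, "Radio_Spend": 0.18, "Price_Index": 1.0},
}
for a, b, r, action in review_pairs(matrix):
    print(f"{a} vs {b}: {r:+.2f} -> {action}")
```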
Common Patterns
Media Channel Clustering:
TV, Display, Video all correlate 0.75-0.85
Why: Coordinated brand campaigns
Solution: Combine into "Brand_Media" group
Digital Channels Together:
Search, Social, Display correlate 0.65-0.80
Why: Digital budget managed together
Solution: May keep separate or combine
Seasonal Variables:
All holiday dummies correlate with each other
Why: Holidays often close together
Solution: Use composite seasonal index
Use Cases
Pre-Model Check:
Goal: Clean variable set before modeling
Process:
1. Include all 15 candidate variables
2. Run correlation matrix
3. Find 3 pairs with correlation >0.8
4. Remove one from each pair
5. Final model has 12 variables, all VIF <5
Variable Selection:
Question: TV or Radio - which to keep?
Matrix shows: TV and Radio correlation = 0.89
Decision factors:
- TV has more spend (larger impact)
- TV data quality better
- Keep TV, remove Radio
Unexpected Relationships:
Discovery: Price and Digital_Spend correlate -0.65
Insight: Price cuts coincide with digital campaigns
Decision: Both important, negative correlation is meaningful
Multicollinearity Thresholds
Conservative Approach:
- Remove if correlation > 0.7 
- Strict VIF control 
- Safest for stable models 
Standard Approach (Recommended):
- Remove if correlation > 0.8 
- Balance between multicollinearity and completeness 
- Widely used rule of thumb 
Liberal Approach:
- Only remove if correlation > 0.9 
- Accept some multicollinearity 
- Risk higher VIF 
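To relate these correlation cutoffs to VIF, the sketch below uses the standard identity for a model with exactly two predictors, VIF = 1 / (1 - r²). That two-predictor setting is an assumption made for illustration: with more predictors, VIF depends on how well each variable is explained by all the others together and can be higher than any single pairwise correlation suggests. The function name `vif_two_predictors` is hypothetical.

```python
def vif_two_predictors(r):
    """VIF of either predictor when a model has exactly two predictors correlated at r."""
    if abs(r) >= 1:
        raise ValueError("VIF is unbounded when |r| is 1")
    return 1.0 / (1.0 - r * r)

for label, cutoff in [("Conservative", 0.7), ("Standard", 0.8), ("Liberal", 0.9)]:
    print(f"{label}: r = {cutoff} -> VIF = {vif_two_predictors(cutoff):.2f}")
```

Under this assumption the three cutoffs correspond to VIFs of roughly 1.96, 2.78, and 5.26, so a pair sitting at the liberal cutoff already exceeds a VIF target of 5.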
After Correlation Matrix
Decision Process:
- Identify all pairs with correlation >0.8 
- For each pair, decide which to keep:
  - More theoretically important 
  - Better data quality 
  - Easier to measure 
- Remove redundant variables 
- Re-run matrix to verify improvement 
- Proceed to model building 
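One way to sketch the decision process above is a greedy pass over a priority list: variables are visited from most to least important, and a variable is dropped if it correlates above the threshold with one already kept. The function name `drop_redundant`, the `priority` argument, and the Radio_Spend/Price_Index value are hypothetical, and the priority order itself still has to come from the judgment criteria listed above (theory, data quality, ease of measurement).

```python
def drop_redundant(matrix, priority, threshold=0.8):
    """Keep higher-priority variables; drop any that correlate above
    `threshold` with a variable already kept.

    `priority` lists variable names from most to least important to keep.
    """
    kept, removed = [], []
    for name in priority:
        if any(matrix[name][other] > threshold for other in kept):
            removed.append(name)
        else:
            kept.append(name)
    return kept, removed

# TV/Radio (0.89) and TV/Price (0.23) come from examples on this page;
# the Radio/Price value is hypothetical.
matrix = {
    "TV_Spend":    {"TV_Spend": 1.0,  "Radio_Spend": 0.89, "Price_Index": 0.23},
    "Radio_Spend": {"TV_Spend": 0.89, "Radio_Spend": 1.0,  "Price_Index": 0.18},
    "Price_Index": {"TV_Spend": 0.23, "Radio_Spend": 0.18, "Price_Index": 1.0},
}
kept, removed = drop_redundant(matrix, ["TV_Spend", "Radio_Spend", "Price_Index"])
print("keep:", kept)
print("removed:", removed)
```

Visiting variables in priority order means a variable is only ever dropped because of something that actually stays in the model.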
Alternative to Removal:
- Combine correlated variables 
- Example: TV + Radio → "Traditional_Media" 
- Preserves information 
- Reduces multicollinearity 
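Combining can be as simple as a row-wise sum. The sketch below assumes the source columns share the same unit (for example, spend in one currency), since a plain sum is not meaningful otherwise. The function name `combine_columns` and the sample figures are hypothetical.

```python
def combine_columns(data, sources, new_name):
    """Return a copy of `data` with `sources` replaced by their row-wise sum."""
    lengths = {len(data[s]) for s in sources}
    if len(lengths) != 1:
        raise ValueError("source columns must have the same length")
    combined = [sum(values) for values in zip(*(data[s] for s in sources))]
    # Copy every column that is not being merged, then add the new one.
    result = {name: list(values) for name, values in data.items()
              if name not in sources}
    result[new_name] = combined
    return result

# Hypothetical weekly spend, for illustration only.
data = {
    "TV_Spend": [100, 120, 90],
    "Radio_Spend": [40, 50, 35],
    "Price_Index": [1.00, 1.02, 0.99],
}
merged = combine_columns(data, ["TV_Spend", "Radio_Spend"], "Traditional_Media")
print(merged)
```

After combining, the new variable should go through the correlation matrix again, because it can correlate with other candidates in ways the originals did not.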
Interactive Features
Tooltip:
- Hover over cell 
- Shows exact correlation value 
- Variable names (row and column) 
Color Gradient:
- Visual quick scan 
- Red cells = investigate 
- White cells = good 
Tips
Include All Candidates:
- Don't pre-filter variables 
- Comprehensive check upfront 
- Saves time later 
Check Both Directions:
- Matrix is symmetric 
- Can read row-wise or column-wise 
- Same information 
Document Decisions:
- Note which variables removed 
- Rationale for choices 
- Keep for model documentation 
Re-Check After Changes:
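- Removing a variable leaves the remaining pairwise correlations unchanged, but VIFs shift and any combined variable adds new pairs 
- Verify that no remaining pair exceeds your threshold 
- May need iteration 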
Acceleration Note
WebGPU/WASM:
- If enabled, correlation calculation is accelerated 
- Much faster for large variable sets 
- See acceleration badge for status 
Fallback:
- Uses standard client-side calculation if acceleration unavailable 
- Still works, just slower 
Summary
Correlation Heatmaps Show:
- All pairwise variable correlations 
- Multicollinearity issues visually 
- Variable relationships at a glance 
- Which variables are redundant 
This is a critical step before model building: it helps ensure a clean, largely independent variable set and avoid VIF problems.