Correlation Heatmaps
What Are Correlation Heatmaps?
Correlation heatmaps display pairwise correlations between all selected variables in a color-coded table format, making it easy to spot multicollinearity and variable relationships at a glance.
Purpose: Identify multicollinearity before modeling, find redundant variables, understand all variable relationships simultaneously.
When to Use
Best For:
Pre-model multicollinearity check
Variable selection decisions
Understanding variable relationships
Identifying redundant variables
Detecting unexpected correlations
Critical Use: Always run before finalizing model variables to avoid high variance inflation factor (VIF) values in the model.
How to Create
Steps:
Select 2 or more variables (typically all candidate model variables)
Click "Correlation Matrix" button
Click "Calculate Correlation"
Wait for calculation (may use WebGPU/WASM acceleration if enabled)
Calculation:
Computes pairwise Pearson correlations
All variables vs. all variables
Results in symmetric matrix
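The calculation described above can be sketched in a few lines of plain Python. This is only an illustration of pairwise Pearson correlation, not the tool's actual implementation; the function names `pearson` and `correlation_matrix` and the sample data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / denom

def correlation_matrix(data):
    """Build a symmetric matrix (dict of dicts) from {name: values}."""
    names = list(data)
    matrix = {name: {} for name in names}
    for i, a in enumerate(names):
        matrix[a][a] = 1.0  # a variable correlates perfectly with itself
        for b in names[i + 1:]:
            r = pearson(data[a], data[b])
            matrix[a][b] = r  # fill both halves so the matrix is symmetric
            matrix[b][a] = r
    return matrix

# Hypothetical weekly values, for illustration only
data = {
    "TV": [10, 20, 30, 40, 50],
    "Radio": [12, 18, 33, 41, 48],
    "Price": [5, 4, 4, 3, 1],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Each pair is computed once and written to both halves of the matrix, which is why the result is symmetric with 1.0 on the diagonal.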
Reading the Matrix
Table Structure:
Rows: Variable names
Columns: Same variable names
Cells: Correlation values (-1 to +1)
Diagonal: Always 1.0 (variable with itself)
Symmetric: corr(A, B) = corr(B, A)
Color Coding:
Dark Red: > 0.8 (strong positive correlation)
Light Red: 0.5 to 0.8 (moderate positive)
White/Light: -0.5 to 0.5 (low/no correlation)
Blue: < -0.5 (negative correlation)
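As a rough sketch, the color coding above amounts to a simple threshold lookup. The function name `color_bucket` is hypothetical, and the handling of exact boundary values (0.5 treated as Light Red, -0.5 as White/Light) is an assumption, since the ranges above overlap at their endpoints.

```python
def color_bucket(r):
    """Map a correlation value to the color band listed in the legend."""
    if r > 0.8:
        return "Dark Red"      # strong positive correlation
    if r >= 0.5:
        return "Light Red"     # moderate positive (boundary handling assumed)
    if r >= -0.5:
        return "White/Light"   # low or no correlation
    return "Blue"              # negative correlation below -0.5

for value in (0.92, 0.65, 0.10, -0.70):
    print(value, "->", color_bucket(value))
```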
Correlation Values
+1.0: Perfect positive correlation
Variables move together perfectly
Redundant variables
+0.8 to +1.0: Very strong positive
Warning: Serious multicollinearity
Consider removing one variable
+0.5 to +0.8: Moderate positive
Some relationship
Monitor in model
May be acceptable
-0.5 to +0.5: Low correlation
Variables mostly independent
Good: No multicollinearity concern
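-1.0 to -0.5: Negative correlation
Variables move in opposite directions
May be valid (Price vs. Volume)
Strong negative values (below -0.8) raise the same multicollinearity concern as strong positive ones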
-1.0: Perfect negative correlation
Exact opposite movement
What Variables to Visualize
All Model Candidates:
Include every variable you're considering for the model
Marketing spend variables
Control variables (price, promo, seasonality)
External factors
Before Model Building:
Interpreting Results
High Correlation (>0.8) Found:
Example:
Action:
These run together (coordinated campaigns)
Will cause multicollinearity in model
Options:
Remove one (keep more important)
Combine into "Traditional_Media"
Use regularization
Moderate Correlation (0.5-0.8):
Example:
Action:
Some relationship but not severe
Monitor VIF in model
May be acceptable
Consider if both needed
Low Correlation (<0.5):
Example:
Action:
✓ Good - mostly independent
Both can be in model
No multicollinearity concern
Common Patterns
Media Channel Clustering:
Why: Coordinated brand campaigns
Solution: Combine into "Brand_Media" group
Digital Channels Together:
Why: Digital budget managed together
Solution: May keep separate or combine
Seasonal Variables:
Why: Holidays often close together
Solution: Use composite seasonal index
Use Cases
Pre-Model Check:
Variable Selection:
Unexpected Relationships:
Multicollinearity Thresholds
Conservative Approach:
Remove if correlation > 0.7
Strict VIF control
Safest for stable models
Standard Approach (Recommended):
Remove if correlation > 0.8
Balance between multicollinearity and completeness
Widely used rule of thumb
Liberal Approach:
Only remove if correlation > 0.9
Accept some multicollinearity
Risk higher VIF
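To connect these thresholds to VIF: in the special case of a model with exactly two predictors, the VIF of each predictor is 1 / (1 - r²), where r is their pairwise correlation. The sketch below applies that formula to the three thresholds. The function name is hypothetical, and with more than two predictors the actual VIF can be higher than this pairwise figure suggests.

```python
def vif_two_predictors(r):
    """VIF implied by correlation r when the model has exactly two predictors."""
    if abs(r) >= 1.0:
        raise ValueError("VIF is unbounded when |r| = 1")
    return 1.0 / (1.0 - r * r)

for label, r in [("Conservative", 0.7), ("Standard", 0.8), ("Liberal", 0.9)]:
    print(f"{label}: r = {r} -> VIF = {vif_two_predictors(r):.2f}")
# Conservative: r = 0.7 -> VIF = 1.96
# Standard: r = 0.8 -> VIF = 2.78
# Liberal: r = 0.9 -> VIF = 5.26
```

VIF rises steeply as r approaches 1, which is why moving the cutoff from 0.8 to 0.9 roughly doubles the implied VIF.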
After Correlation Matrix
Decision Process:
Identify all pairs with correlation >0.8
For each pair, decide which to keep:
More theoretically important
Better data quality
Easier to measure
Remove redundant variables
Re-run matrix to verify improvement
Proceed to model building
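The first step of the decision process can be sketched as a scan over the upper half of the matrix. The function name `flag_pairs` and the matrix values are hypothetical. Comparing the absolute value against the threshold is an assumption added here so that strong negative pairs are flagged too.

```python
def flag_pairs(matrix, threshold=0.8):
    """Return (var_a, var_b, r) for each pair with |r| above the threshold."""
    names = list(matrix)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:  # upper triangle only; the matrix is symmetric
            r = matrix[a][b]
            if abs(r) > threshold:
                flagged.append((a, b, r))
    # Strongest relationships first
    return sorted(flagged, key=lambda pair: abs(pair[2]), reverse=True)

# Hypothetical correlation matrix, for illustration only
matrix = {
    "TV":    {"TV": 1.0,   "Radio": 0.91,  "Price": -0.10, "Promo": 0.22},
    "Radio": {"TV": 0.91,  "Radio": 1.0,   "Price": -0.05, "Promo": 0.18},
    "Price": {"TV": -0.10, "Radio": -0.05, "Price": 1.0,   "Promo": -0.84},
    "Promo": {"TV": 0.22,  "Radio": 0.18,  "Price": -0.84, "Promo": 1.0},
}
for a, b, r in flag_pairs(matrix):
    print(f"{a} vs {b}: {r:+.2f}")
```

Each flagged pair then needs the keep-or-remove decision described in the steps above.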
Alternative to Removal:
Combine correlated variables
Example: TV + Radio → "Traditional_Media"
Preserves information
Reduces multicollinearity
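A minimal sketch of the combining approach follows, assuming both series are measured in the same unit (for example, weekly spend) so that a simple sum is meaningful. The data values and the `pearson` helper are hypothetical; other ways of combining, such as weighted sums, may suit some data better.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Hypothetical weekly spend, for illustration only
tv = [100, 120, 90, 150, 130, 110]
radio = [40, 50, 35, 62, 55, 44]
search = [30, 28, 35, 31, 27, 33]

print(f"TV vs Radio: {pearson(tv, radio):.2f}")  # highly correlated pair

# Replace the correlated pair with a single combined variable
traditional_media = [t + r for t, r in zip(tv, radio)]
candidates = {"Traditional_Media": traditional_media, "Search": search}

# Re-check the remaining candidates against each other
r = pearson(candidates["Traditional_Media"], candidates["Search"])
print(f"Traditional_Media vs Search: {r:.2f}")
```

The model then sees one traditional media variable instead of two near-duplicates, at the cost of no longer estimating TV and Radio effects separately.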
Interactive Features
Tooltip:
Hover over cell
Shows exact correlation value
Variable names (row and column)
Color Gradient:
Visual quick scan
Red cells = investigate
White cells = good
Tips
Include All Candidates:
Don't pre-filter variables
Comprehensive check upfront
Saves time later
Check Both Directions:
Matrix is symmetric
Can read row-wise or column-wise
Same information
Document Decisions:
Note which variables were removed
Record the rationale for each choice
Keep for model documentation
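Re-Check After Changes:
Removing a variable does not change the pairwise correlations among the remaining variables, but combined variables introduce new correlations
Verify that no flagged pairs remain and that VIF improves
May need iteration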
Acceleration Note
WebGPU/WASM:
If enabled, correlation calculation is accelerated
Much faster for large variable sets
See acceleration badge for status
Fallback:
Uses standard client-side calculation if acceleration unavailable
Still works, just slower
Summary
Correlation Heatmaps Show:
All pairwise variable correlations
Multicollinearity issues visually
Variable relationships at a glance
Which variables are redundant
This is a critical step before model building: it helps ensure a clean, largely independent variable set and avoids VIF problems.