Correlation Heatmaps

What Are Correlation Heatmaps?

Correlation heatmaps display pairwise correlations between all selected variables in a color-coded table format, making it easy to spot multicollinearity and variable relationships at a glance.

Purpose: Identify multicollinearity before modeling, find redundant variables, understand all variable relationships simultaneously.

When to Use

Best For:

Pre-model multicollinearity check
Variable selection decisions
Understanding variable relationships
Identifying redundant variables
Detecting unexpected correlations

Critical Use: Always run before finalizing model variables, so that highly correlated pairs are caught before they inflate VIF (variance inflation factor) values in the model.

How to Create

Steps:

Select 2 or more variables (typically all candidate model variables)
Click "Correlation Matrix" button
Click "Calculate Correlation"
Wait for calculation (may use WebGPU/WASM acceleration if enabled)

Calculation:

Computes pairwise Pearson correlations
All variables vs. all variables
Results in symmetric matrix
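
The calculation can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the tool's actual implementation; the helper names `pearson` and `correlation_matrix` and the sample values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences.

    Assumes neither sequence is constant (a zero variance would divide by zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(data):
    """Return {row_name: {col_name: r}} for all variables vs. all variables."""
    names = list(data)
    return {a: {b: pearson(data[a], data[b]) for b in names} for a in names}

# Hypothetical weekly values, for illustration only.
data = {
    "TV_Spend": [100, 120, 90, 150, 130],
    "Radio_Spend": [40, 50, 35, 60, 55],
    "Price_Index": [1.00, 1.02, 0.98, 1.01, 0.99],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(r, 2) for r in row.values()])
```

With pandas installed, `DataFrame.corr()` produces the same kind of matrix, since its default method is Pearson.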

Reading the Matrix

Table Structure:

Rows: Variable names
Columns: Same variable names
Cells: Correlation values (-1 to +1)
Diagonal: Always 1.0 (variable with itself)
Symmetric: Correlation A→B = B→A

Color Coding:

Color | Correlation Range | Meaning
Dark Red | > 0.8 | Strong positive correlation
Light Red | 0.5 to 0.8 | Moderate positive
White/Light | -0.5 to 0.5 | Low/no correlation
Blue | < -0.5 | Negative correlation
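
The color bands can be expressed as a small lookup. This is a sketch only: the function name `color_for` is hypothetical, and because the table does not say which band a value of exactly 0.5 belongs to, the code assumes it falls in the White/Light band.

```python
def color_for(r):
    """Map a correlation value to the color band described in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must be between -1 and +1")
    if r > 0.8:
        return "Dark Red"      # strong positive correlation
    if r > 0.5:                # assumption: exactly 0.5 counts as White/Light
        return "Light Red"     # moderate positive
    if r >= -0.5:
        return "White/Light"   # low or no correlation
    return "Blue"              # negative correlation

for value in (0.92, 0.72, 0.23, -0.65):
    print(value, color_for(value))
```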

Correlation Values

+1.0: Perfect positive correlation

Variables move together perfectly
Redundant variables

+0.8 to +1.0: Very strong positive

Warning: Serious multicollinearity
Consider removing one variable

+0.5 to +0.8: Moderate positive

Some relationship
Monitor in model
May be acceptable

-0.5 to +0.5: Low correlation

Little linear relationship between the variables
Good: No pairwise multicollinearity concern

-1.0 to -0.5: Negative correlation

Variables move in opposite directions
May be valid (Price vs. Volume)
Values below -0.8 raise the same multicollinearity concern as values above +0.8

-1.0: Perfect negative correlation

Exact opposite movement

What Variables to Visualize

All Model Candidates:

Include every variable you're considering for the model
Marketing spend variables
Control variables (price, promo, seasonality)
External factors

Before Model Building:

Variables: 
- TV_Spend
- Radio_Spend  
- Digital_Spend
- Print_Spend
- Price_Index
- Holiday_Flag
- KPI (optional)

Purpose: Check for multicollinearity before fitting

Interpreting Results

High Correlation (>0.8) Found:

Example:

TV_Spend and Radio_Spend: 0.92

Action:

These channels run together (coordinated campaigns)
Likely to cause multicollinearity in the model
Options:
- Remove one (keep more important)
- Combine into "Traditional_Media"
- Use regularization

Moderate Correlation (0.5-0.8):

Example:

Digital_Spend and Social_Spend: 0.72

Action:

Some relationship but not severe
Monitor VIF in model
May be acceptable
Consider if both needed
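
To relate a pairwise correlation to the VIF you would monitor, it helps to know that in a model with exactly two predictors the VIF of each is 1 / (1 - r²). With more predictors the VIF can only be equal or higher, because it reflects how well all the other predictors together explain the variable. The sketch below shows that lower bound; the function name `two_variable_vif` is hypothetical.

```python
def two_variable_vif(r):
    """VIF of either predictor in a model with exactly two predictors.

    For larger models this is only a lower bound on the true VIF.
    """
    if abs(r) >= 1.0:
        raise ValueError("VIF is undefined for perfectly correlated predictors")
    return 1.0 / (1.0 - r * r)

for r in (0.23, 0.72, 0.80, 0.92):
    print(f"r = {r:.2f} -> VIF of at least {two_variable_vif(r):.2f}")
```

A correlation of 0.80 implies a VIF of at least about 2.78, while 0.92 implies at least about 6.51, which is already above the VIF < 5 target used in the Pre-Model Check example.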

Low Correlation (<0.5):

Example:

TV_Spend and Price_Index: 0.23

Action:

✓ Good - mostly independent
Both can be in model
No multicollinearity concern

Common Patterns

Media Channel Clustering:

TV, Display, Video all correlate 0.75-0.85

Why: Coordinated brand campaigns
Solution: Combine into "Brand_Media" group

Digital Channels Together:

Search, Social, Display correlate 0.65-0.80

Why: Digital budget managed together
Solution: May keep separate or combine

Seasonal Variables:

All holiday dummies correlate with each other

Why: Holidays often close together
Solution: Use composite seasonal index

Use Cases

Pre-Model Check:

Goal: Clean variable set before modeling
Process:
1. Include all 15 candidate variables
2. Run correlation matrix
3. Find 3 pairs with correlation >0.8
4. Remove one from each pair
5. Final model has 12 variables, all VIF <5

Variable Selection:

Question: TV or Radio - which to keep?
Matrix shows: TV and Radio correlation = 0.89
Decision factors:
- TV has more spend (larger impact)
- TV data quality better
- Keep TV, remove Radio

Unexpected Relationships:

Discovery: Price and Digital_Spend correlate -0.65
Insight: Price cuts coincide with digital campaigns
Decision: Both important, negative correlation is meaningful

Multicollinearity Thresholds

Conservative Approach:

Remove if correlation > 0.7
Strict VIF control
Safest for stable models

Standard Approach (Recommended):

Remove if correlation > 0.8
Balance between multicollinearity and completeness
Widely used rule of thumb

Liberal Approach:

Only remove if correlation > 0.9
Accept some multicollinearity
Risk higher VIF
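
The effect of the three thresholds can be compared on the same set of correlations. This is a sketch with hypothetical names (`THRESHOLDS`, `flagged_pairs`) and illustrative values. It also assumes the comparison is made on the absolute value, so that strong negative correlations are flagged too.

```python
# Illustrative pairwise correlations (one entry per variable pair).
correlations = {
    ("TV_Spend", "Radio_Spend"): 0.92,
    ("TV_Spend", "Display_Spend"): 0.83,
    ("Digital_Spend", "Social_Spend"): 0.72,
    ("Price_Index", "Digital_Spend"): -0.65,
    ("TV_Spend", "Price_Index"): 0.23,
}

THRESHOLDS = {"conservative": 0.7, "standard": 0.8, "liberal": 0.9}

def flagged_pairs(correlations, threshold):
    """Return the pairs whose absolute correlation exceeds the threshold."""
    return [pair for pair, r in correlations.items() if abs(r) > threshold]

for name, threshold in THRESHOLDS.items():
    pairs = flagged_pairs(correlations, threshold)
    print(f"{name} (> {threshold}): {len(pairs)} pair(s) flagged")
```

On this sample the conservative threshold flags three pairs, the standard threshold two, and the liberal threshold one.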

After Correlation Matrix

Decision Process:

Identify all pairs with correlation >0.8
For each pair, decide which to keep:
- More theoretically important
- Better data quality
- Easier to measure
Remove redundant variables
Re-run matrix to verify improvement
Proceed to model building
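
The decision process above can be sketched as follows. The helper names (`pairs_above`, `select_variables`) and the matrix values are hypothetical. The `priority` list stands in for the analyst's judgment about theoretical importance, data quality, and ease of measurement, none of which can be derived from the matrix itself.

```python
def pairs_above(matrix, threshold=0.8):
    """List (var_a, var_b, r) for each pair with |r| above threshold, strongest first."""
    names = list(matrix)
    pairs = [
        (a, b, matrix[a][b])
        for i, a in enumerate(names)
        for b in names[i + 1:]          # upper triangle only; the matrix is symmetric
        if abs(matrix[a][b]) > threshold
    ]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)

def select_variables(matrix, priority, threshold=0.8):
    """Drop the lower-priority variable from each flagged pair.

    priority lists every variable name from most to least important.
    """
    rank = {name: i for i, name in enumerate(priority)}
    removed = set()
    for a, b, _ in pairs_above(matrix, threshold):
        if a in removed or b in removed:
            continue  # pair already resolved by an earlier removal
        removed.add(a if rank[a] > rank[b] else b)
    kept = [name for name in matrix if name not in removed]
    return kept, sorted(removed)

# Illustrative matrix; TV vs. Radio = 0.89 as in the Variable Selection example.
matrix = {
    "TV_Spend":    {"TV_Spend": 1.0,  "Radio_Spend": 0.89, "Price_Index": 0.23},
    "Radio_Spend": {"TV_Spend": 0.89, "Radio_Spend": 1.0,  "Price_Index": 0.18},
    "Price_Index": {"TV_Spend": 0.23, "Radio_Spend": 0.18, "Price_Index": 1.0},
}

kept, removed = select_variables(
    matrix, priority=["TV_Spend", "Price_Index", "Radio_Spend"]
)
print("keep:", kept)
print("remove:", removed)
```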

Alternative to Removal:

Combine correlated variables
Example: TV + Radio → "Traditional_Media"
Preserves information
Reduces multicollinearity
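
Combining can be as simple as a row-wise sum, assuming the source variables are in the same unit (for example, spend in one currency). The sketch below uses a hypothetical helper `combine` and illustrative values.

```python
def combine(data, sources, new_name):
    """Return a copy of data with the source columns replaced by their row-wise sum."""
    combined = [sum(values) for values in zip(*(data[s] for s in sources))]
    result = {name: col for name, col in data.items() if name not in sources}
    result[new_name] = combined
    return result

# Illustrative weekly values only.
data = {
    "TV_Spend": [100, 120, 90, 150, 130],
    "Radio_Spend": [40, 50, 35, 60, 55],
    "Price_Index": [1.00, 1.02, 0.98, 1.01, 0.99],
}

model_data = combine(data, ["TV_Spend", "Radio_Spend"], "Traditional_Media")
print(list(model_data))
print(model_data["Traditional_Media"])
```

The trade-off is that the model can then estimate only one effect for the combined group, not a separate effect for each channel.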

Interactive Features

Tooltip:

Hover over cell
Shows exact correlation value
Variable names (row and column)

Color Gradient:

Visual quick scan
Red cells = investigate
White cells = good

Tips

Include All Candidates:

Don't pre-filter variables
Comprehensive check upfront
Saves time later

Check Both Directions:

Matrix is symmetric
Can read row-wise or column-wise
Same information

Document Decisions:

Note which variables removed
Rationale for choices
Keep for model documentation

Re-Check After Changes:

Combined or newly added variables introduce new pairs to check, and VIF values shift when variables are removed
Verify improvements
May need iteration

Acceleration Note

WebGPU/WASM:

If enabled, correlation calculation is accelerated
Much faster for large variable sets
See acceleration badge for status

Fallback:

Uses standard client-side calculation if acceleration unavailable
Still works, just slower

Summary

Correlation Heatmaps Show:

All pairwise variable correlations
Multicollinearity issues visually
Variable relationships at a glance
Which variables are redundant

This is a critical step before model building: it helps ensure a clean, largely independent variable set and avoids VIF problems.

