Correlation Heatmaps

What Are Correlation Heatmaps?

Correlation heatmaps display pairwise correlations between all selected variables in a color-coded table format, making it easy to spot multicollinearity and variable relationships at a glance.

Purpose: Identify multicollinearity before modeling, find redundant variables, and see all pairwise variable relationships in one view.

When to Use

Best For:

  • Pre-model multicollinearity check

  • Variable selection decisions

  • Understanding variable relationships

  • Identifying redundant variables

  • Detecting unexpected correlations

Critical Use: Always run before finalizing model variables to catch highly correlated pairs that would inflate the variance inflation factor (VIF).

How to Create

Steps:

  1. Select 2 or more variables (typically all candidate model variables)

  2. Click "Correlation Matrix" button

  3. Click "Calculate Correlation"

  4. Wait for calculation (may use WebGPU/WASM acceleration if enabled)

Calculation:

  • Computes pairwise Pearson correlations

  • All variables vs. all variables

  • Results in symmetric matrix
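
The calculation described above can be sketched in a few lines of plain Python. This is only an illustration of pairwise Pearson correlation, not the tool's actual implementation; the function names `pearson` and `correlation_matrix` and the sample numbers are made up for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """All variables vs. all variables, returned as a nested dict."""
    names = list(columns)
    matrix = {name: {} for name in names}
    for i, a in enumerate(names):
        matrix[a][a] = 1.0  # diagonal: a variable with itself, by definition
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            matrix[a][b] = r  # compute each pair once ...
            matrix[b][a] = r  # ... and mirror it, so the matrix is symmetric
    return matrix

# Hypothetical sample data, for illustration only.
data = {
    "TV_Spend": [10, 20, 30, 40, 50],
    "Radio_Spend": [12, 18, 33, 41, 48],
    "Price_Index": [5, 4, 3, 2, 1],
}
corr = correlation_matrix(data)
for name, row in corr.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Each pair is computed once and mirrored, which is why the result reads the same by row or by column.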

Reading the Matrix

Table Structure:

  • Rows: Variable names

  • Columns: Same variable names

  • Cells: Correlation values (-1 to +1)

  • Diagonal: Always 1.0 (variable with itself)

  • Symmetric: Correlation A→B = B→A

Color Coding:

| Color | Correlation Range | Meaning |
| --- | --- | --- |
| Dark Red | > 0.8 | Strong positive correlation |
| Light Red | 0.5 to 0.8 | Moderate positive |
| White/Light | -0.5 to 0.5 | Low/no correlation |
| Blue | < -0.5 | Negative correlation |
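
As a rough sketch, the color bands above amount to a simple lookup. The function name `color_for` is hypothetical, and the treatment of the exact boundary values (0.5 goes to Light Red, -0.5 to White/Light) is an assumption, since the ranges as listed overlap at those points.

```python
def color_for(r):
    """Map a correlation value to the color band listed in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must be between -1 and +1")
    if r > 0.8:
        return "Dark Red"      # strong positive
    if r >= 0.5:               # assumption: 0.5 itself counts as moderate
        return "Light Red"     # moderate positive
    if r >= -0.5:              # assumption: -0.5 itself counts as low
        return "White/Light"   # low or no correlation
    return "Blue"              # negative correlation

for value in (0.92, 0.72, 0.23, -0.65):
    print(value, "->", color_for(value))
```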

Correlation Values

+1.0: Perfect positive correlation

  • Variables move together perfectly

  • Redundant variables

+0.8 to +1.0: Very strong positive

  • Warning: Serious multicollinearity

  • Consider removing one variable

+0.5 to +0.8: Moderate positive

  • Some relationship

  • Monitor in model

  • May be acceptable

-0.5 to +0.5: Low correlation

  • Variables mostly independent

  • Good: No pairwise multicollinearity concern (VIF can still flag combinations of three or more variables)

-0.5 to -1.0: Negative correlation

  • Variables move in opposite directions

  • May be valid (Price vs. Volume)

  • Below -0.8, multicollinearity is as serious as above +0.8

-1.0: Perfect negative correlation

  • Exact opposite movement

What Variables to Visualize

All Model Candidates:

  • Include every variable you're considering for the model

  • Marketing spend variables

  • Control variables (price, promo, seasonality)

  • External factors

Before Model Building:

Variables: 
- TV_Spend
- Radio_Spend  
- Digital_Spend
- Print_Spend
- Price_Index
- Holiday_Flag
- KPI (optional)

Purpose: Check for multicollinearity before fitting

Interpreting Results

High Correlation (>0.8) Found:

Example:

TV_Spend and Radio_Spend: 0.92

Action:

  1. These run together (coordinated campaigns)

  2. Will cause multicollinearity in model

  3. Options:

    • Remove one (keep more important)

    • Combine into "Traditional_Media"

    • Use regularization

Moderate Correlation (0.5-0.8):

Example:

Digital_Spend and Social_Spend: 0.72

Action:

  1. Some relationship but not severe

  2. Monitor VIF in model

  3. May be acceptable

  4. Consider if both needed
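
To give a feel for what "monitor VIF" means for a pair like this, the sketch below uses the standard relationship VIF = 1 / (1 - R²). It assumes a model with only two predictors, where R² equals the squared pairwise correlation; with more predictors the real VIF can be higher than this estimate. The function name `vif_two_predictors` is made up for the example.

```python
def vif_two_predictors(r):
    """VIF implied by a pairwise correlation r, assuming only two predictors."""
    if not -1.0 < r < 1.0:
        raise ValueError("VIF is undefined (infinite) when |r| = 1")
    return 1.0 / (1.0 - r * r)

# Correlation values taken from the examples in this section.
for r in (0.23, 0.72, 0.92):
    print(f"r = {r:+.2f} -> VIF = {vif_two_predictors(r):.2f}")
```

Under that two-predictor assumption, a correlation of 0.72 implies a VIF of about 2.1, while 0.92 implies about 6.5, which is above the VIF < 5 target used elsewhere on this page.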

Low Correlation (<0.5):

Example:

TV_Spend and Price_Index: 0.23

Action:

  1. ✓ Good - mostly independent

  2. Both can be in model

  3. No multicollinearity concern

Common Patterns

Media Channel Clustering:

TV, Display, Video all correlate 0.75-0.85

Why: Coordinated brand campaigns

Solution: Combine into "Brand_Media" group

Digital Channels Together:

Search, Social, Display correlate 0.65-0.80

Why: Digital budget managed together

Solution: May keep separate or combine

Seasonal Variables:

All holiday dummies correlate with each other

Why: Holidays often close together

Solution: Use composite seasonal index

Use Cases

Pre-Model Check:

Goal: Clean variable set before modeling
Process:
1. Include all 15 candidate variables
2. Run correlation matrix
3. Find 3 pairs with correlation >0.8
4. Remove one from each pair
5. Final model has 12 variables, all VIF <5

Variable Selection:

Question: TV or Radio - which to keep?
Matrix shows: TV and Radio correlation = 0.89
Decision factors:
- TV has more spend (larger impact)
- TV data quality better
- Keep TV, remove Radio

Unexpected Relationships:

Discovery: Price and Digital_Spend correlate -0.65
Insight: Price cuts coincide with digital campaigns
Decision: Both important, negative correlation is meaningful

Multicollinearity Thresholds

Conservative Approach:

  • Remove if correlation > 0.7

  • Strict VIF control

  • Safest for stable models

Standard Approach (Recommended):

  • Remove if correlation > 0.8

  • Balance between multicollinearity and completeness

  • Widely used rule of thumb

Liberal Approach:

  • Only remove if correlation > 0.9

  • Accept some multicollinearity

  • Risk higher VIF
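
The three approaches differ only in the cutoff, so they can be sketched as one filter with a selectable threshold. The names `THRESHOLDS` and `flag_pairs` are hypothetical, and comparing the absolute value (so strong negative pairs are flagged too) is an assumption rather than something the tool is known to do.

```python
# Cutoffs for the three approaches described above.
THRESHOLDS = {"conservative": 0.7, "standard": 0.8, "liberal": 0.9}

def flag_pairs(pair_corr, approach="standard"):
    """Return (var_a, var_b, r) for pairs above the cutoff, strongest first."""
    limit = THRESHOLDS[approach]
    flagged = [
        (a, b, r)
        for (a, b), r in pair_corr.items()
        if abs(r) > limit  # assumption: sign is ignored when flagging
    ]
    return sorted(flagged, key=lambda item: abs(item[2]), reverse=True)

# Pair values taken from the examples in this section.
pairs = {
    ("TV_Spend", "Radio_Spend"): 0.92,
    ("Digital_Spend", "Social_Spend"): 0.72,
    ("TV_Spend", "Price_Index"): 0.23,
}
for approach in THRESHOLDS:
    print(approach, flag_pairs(pairs, approach))
```

With these values, the conservative cutoff flags two pairs, while the standard and liberal cutoffs flag only TV_Spend and Radio_Spend.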

After Correlation Matrix

Decision Process:

  1. Identify all pairs with correlation >0.8

  2. For each pair, decide which to keep:

    • More theoretically important

    • Better data quality

    • Easier to measure

  3. Remove redundant variables

  4. Re-run matrix to verify improvement

  5. Proceed to model building
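
Steps 2 and 3 are judgment calls, but once the analyst has ranked the variables the bookkeeping can be sketched as below. The function name `choose_removals`, the priority ranking, and the flagged pairs are all hypothetical inputs for illustration; the ranking stands in for the "more important, better data, easier to measure" decision.

```python
def choose_removals(flagged_pairs, priority):
    """Pick one variable to drop from each flagged pair.

    priority maps variable name -> rank, where a lower rank means
    "keep this one" (rank 1 is the most important variable).
    """
    removed = set()
    for a, b in flagged_pairs:
        if a in removed or b in removed:
            continue  # an earlier removal already resolved this pair
        drop = a if priority[a] > priority[b] else b
        removed.add(drop)
    return removed

# Hypothetical analyst ranking and hypothetical flagged pairs.
priority = {"TV_Spend": 1, "Print_Spend": 2, "Radio_Spend": 3}
flagged = [("TV_Spend", "Radio_Spend"), ("Radio_Spend", "Print_Spend")]
print(choose_removals(flagged, priority))
```

Skipping pairs that are already resolved avoids dropping more variables than necessary when one variable appears in several flagged pairs.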

Alternative to Removal:

  • Combine correlated variables

  • Example: TV + Radio → "Traditional_Media"

  • Preserves information

  • Reduces multicollinearity
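
Combining can be as simple as adding the columns row by row, as in this sketch. It assumes both variables are spend in the same currency unit, so a plain sum is meaningful; the helper name `combine` and the sample numbers are made up for the example.

```python
def combine(columns, parts, new_name):
    """Replace the columns listed in parts with their row-wise sum."""
    length = len(columns[parts[0]])
    if any(len(columns[p]) != length for p in parts):
        raise ValueError("all combined columns must have the same length")
    # Keep every column that is not being merged.
    result = {name: vals for name, vals in columns.items() if name not in parts}
    # Add the merged column (assumption: same units, so summing is valid).
    result[new_name] = [sum(row) for row in zip(*(columns[p] for p in parts))]
    return result

# Hypothetical sample data, for illustration only.
data = {
    "TV_Spend": [10, 20, 30],
    "Radio_Spend": [12, 18, 33],
    "Price_Index": [5, 4, 3],
}
merged = combine(data, ["TV_Spend", "Radio_Spend"], "Traditional_Media")
print(merged)
```

The combined column then goes into the correlation matrix in place of the two originals.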

Interactive Features

Tooltip:

  • Hover over cell

  • Shows exact correlation value

  • Variable names (row and column)

Color Gradient:

  • Visual quick scan

  • Red cells = investigate

  • White cells = good

Tips

Include All Candidates:

  • Don't pre-filter variables

  • Comprehensive check upfront

  • Saves time later

Check Both Directions:

  • Matrix is symmetric

  • Can read row-wise or column-wise

  • Same information

Document Decisions:

  • Note which variables removed

  • Rationale for choices

  • Keep for model documentation

Re-Check After Changes:

  • VIFs change when variables are removed, and combined variables bring new correlations

  • Verify improvements

  • May need iteration

Acceleration Note

WebGPU/WASM:

  • If enabled, correlation calculation is accelerated

  • Much faster for large variable sets

  • See acceleration badge for status

Fallback:

  • Uses standard client-side calculation if acceleration unavailable

  • Still works, just slower

Summary

Correlation Heatmaps Show:

  • All pairwise variable correlations

  • Multicollinearity issues visually

  • Variable relationships at a glance

  • Which variables are redundant

This is a critical step before model building: it helps ensure a clean, largely independent variable set and avoids VIF problems.
