Convergence Diagnostics
Overview
Convergence diagnostics assess whether MCMC sampling has successfully explored the posterior distribution. Before trusting Bayesian results, you must verify that chains have converged to a stable distribution. Poor convergence means unreliable estimates and invalid credible intervals.
MixModeler automatically calculates comprehensive diagnostics and provides clear guidance on whether results are trustworthy.
Why Convergence Matters
The Problem
MCMC chains start at random locations and gradually explore the parameter space. Early samples may not represent the true posterior distribution. Convergence occurs when chains:
Stabilize at the correct distribution
Forget their starting positions
Produce consistent estimates regardless of initialization
The Risk
Using non-converged results leads to:
Incorrect coefficient estimates
Invalid credible intervals
Wrong business decisions
Misleading uncertainty quantification
The Solution
Convergence diagnostics detect these issues before you interpret results, ensuring your Bayesian inference is reliable.
Key Diagnostic Metrics
R-hat (Gelman-Rubin Statistic)
The primary convergence diagnostic comparing between-chain and within-chain variance.
What It Measures: Whether multiple chains have converged to the same distribution
Interpretation:
R-hat ≈ 1.00: Perfect convergence, chains agree
R-hat < 1.01: Excellent convergence, safe to proceed
R-hat 1.01-1.05: Acceptable convergence, proceed with caution
R-hat > 1.05: Poor convergence, do not trust results
How It Works: Compares variance between chains to variance within chains. If chains haven't converged, between-chain variance will be larger than within-chain variance.
Rule of Thumb: R-hat must be < 1.01 for all parameters before interpreting results.
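The between/within comparison can be sketched in a few lines of NumPy. This is the basic (non-split) Gelman-Rubin formula, shown for illustration only; production samplers typically report the more robust rank-normalized split-R-hat.

```python
import numpy as np

def rhat(chains):
    """Basic Gelman-Rubin R-hat for an (n_chains, n_draws) array.

    Simplified sketch: tools like ArviZ use the rank-normalized
    split-R-hat variant, which is more robust in practice.
    """
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    # W: average within-chain variance
    W = chains.var(axis=1, ddof=1).mean()
    # B: between-chain variance, scaled by draws per chain
    B = n * chain_means.var(ddof=1)
    # Pooled estimate of the posterior variance
    var_hat = (n - 1) / n * W + B / n
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 2000))                   # 4 well-mixed chains
stuck = mixed + np.array([[0.], [0.], [0.], [3.]])   # one chain offset by 3

print(round(rhat(mixed), 3))  # very close to 1.00
print(round(rhat(stuck), 3))  # well above 1.05
```

When one chain sits in a different region, the between-chain variance B dwarfs the within-chain variance W and R-hat rises well above 1.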
Effective Sample Size (ESS)
Number of independent samples providing equivalent information to your correlated MCMC samples.
What It Measures: How much independent information your samples contain after accounting for autocorrelation
Types:
ESS Bulk: Effective sample size for the central distribution (mean, median)
ESS Tail: Effective sample size for the distribution tails (quantiles, extremes)
Interpretation:
ESS > 400: Sufficient for reliable estimates
ESS 200-400: Adequate but consider more samples
ESS 100-200: Minimal, increase draws
ESS < 100: Insufficient, results unreliable
Total vs Effective Samples: You might have 8,000 total samples (4 chains × 2,000 draws) but ESS of 800 due to autocorrelation. This means your samples contain equivalent information to 800 independent draws.
Rule of Thumb: ESS > 400 for both bulk and tail ensures reliable credible intervals and probability estimates.
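A crude ESS estimate shows where the gap between total and effective samples comes from: the sample count is divided by an integrated autocorrelation factor. This sketch sums empirical autocorrelations until they first turn negative; real implementations (e.g. ArviZ's bulk/tail ESS) use more careful truncation and rank normalization.

```python
import numpy as np

def ess(samples, max_lag=200):
    """Crude effective sample size from the autocorrelation function."""
    x = samples - samples.mean()
    n = len(x)
    # Empirical autocovariance at lags 0..n-1
    acov = np.correlate(x, x, mode="full")[n - 1:] / n
    rho = acov / acov[0]
    tau = 1.0
    for lag in range(1, min(max_lag, n)):
        if rho[lag] < 0:           # stop at the first negative lag
            break
        tau += 2 * rho[lag]        # integrated autocorrelation time
    return n / tau

rng = np.random.default_rng(1)
iid = rng.normal(size=4000)        # independent draws
ar = np.zeros(4000)                # strongly autocorrelated AR(1) draws
for t in range(1, 4000):
    ar[t] = 0.9 * ar[t - 1] + rng.normal()

print(int(ess(iid)))  # recovers most of the 4,000 draws
print(int(ess(ar)))   # a small fraction of 4,000
```

The AR(1) series illustrates the "8,000 total but 800 effective" situation above: highly correlated draws carry far less independent information than their raw count suggests.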
ESS Percentage
ESS as a percentage of total samples.
Calculation: (ESS / Total Samples) × 100
Interpretation:
>50%: Excellent mixing, very efficient sampling
25-50%: Good mixing, efficient sampling
10-25%: Moderate mixing, acceptable
<10%: Poor mixing, samples highly autocorrelated
Ideal: ESS percentage above 25% indicates efficient exploration without excessive autocorrelation.
Divergent Transitions
Failed sampling steps where the algorithm couldn't accurately explore the posterior.
What It Measures: Regions of the posterior that are difficult to explore, often high-curvature areas
Interpretation:
0 divergences: Perfect, no exploration failures
1-10 divergences (<1% of samples): Minimal, usually acceptable
10-100 divergences (1-10%): Concerning, investigate
>100 divergences (>10%): Severe problem, results unreliable
Causes:
Complex posterior geometries
Highly correlated parameters
Poor model specification
Conflicting priors and data
Rule of Thumb: Aim for zero divergences. Any divergences warrant investigation, though <1% is often acceptable.
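Translated into code, the divergence bands above look like this (the rate cutoffs follow the percentage bands in the list; the labels are shortened from it):

```python
def divergence_severity(n_divergent, n_samples):
    """Classify a divergence count using the percentage bands above."""
    rate = n_divergent / n_samples
    if n_divergent == 0:
        return "perfect"       # no exploration failures
    if rate < 0.01:
        return "minimal"       # usually acceptable, still worth a look
    if rate <= 0.10:
        return "concerning"    # investigate before interpreting
    return "severe"            # results unreliable

print(divergence_severity(0, 8000))     # perfect
print(divergence_severity(40, 8000))    # 0.5% -> minimal
print(divergence_severity(400, 8000))   # 5%   -> concerning
print(divergence_severity(1200, 8000))  # 15%  -> severe
```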
Accessing Diagnostics in MixModeler
Automatic Calculation
After Bayesian model runs, MixModeler automatically:
Calculates all convergence diagnostics
Flags any parameters with issues
Provides an overall convergence assessment
Offers specific recommendations
Diagnostics Panel
To view detailed diagnostics:
Open your Bayesian model results
Click Convergence Diagnostics tab
Review overall assessment banner
Examine parameter-by-parameter details
Overall Assessment
MixModeler provides a summary judgment:
Excellent: All diagnostics pass strict thresholds
R-hat < 1.01 for all parameters
ESS > 400 for all parameters
Zero or negligible divergences
Action: Safe to interpret and use results
Good: Diagnostics meet acceptable thresholds
R-hat < 1.05 for all parameters
ESS > 200 for most parameters
Few divergences (<1%)
Action: Proceed but note any flagged parameters
Acceptable: Some diagnostic concerns
R-hat < 1.1 for most parameters
ESS > 100 for most parameters
Moderate divergences (1-5%)
Action: Interpret cautiously, consider rerunning with adjusted settings
Poor: Serious convergence issues
R-hat > 1.1 for some parameters
ESS < 100 for some parameters
Many divergences (>5%)
Action: Do not trust results, must rerun with different settings
Parameter-Level Diagnostics
Diagnostic Table
For each model parameter, view:
Parameter          | R-hat | ESS Bulk | ESS Tail | ESS %  | Status
TV_Advertising     | 1.00  | 1,250    | 1,180    | 15.6%  | ✓ Pass
Digital_Marketing  | 1.01  | 980      | 920      | 12.3%  | ✓ Pass
Print_Media        | 1.08  | 320      | 280      | 4.0%   | ⚠ Warning
Radio              | 1.00  | 1,450    | 1,380    | 18.1%  | ✓ Pass
Interpretation: Print_Media shows poor convergence (high R-hat, low ESS), while other parameters converged well.
Flagged Parameters
Parameters failing diagnostic thresholds are automatically highlighted:
Red Flag (Critical): R-hat > 1.05 or ESS < 200
Do not interpret this parameter
Rerun model with adjusted settings
Yellow Flag (Warning): R-hat 1.01-1.05 or ESS 200-400
Interpret cautiously
Consider increasing draws
Check for model specification issues
Visual Diagnostics
Trace Plots: Show parameter values over MCMC iterations for each chain
Good Trace:
Chains overlap completely
Random scatter around mean (white noise)
No trends or patterns
Chains indistinguishable from each other
Bad Trace:
Chains separated or drifting
Trends up or down
Chains don't overlap
One chain exploring different region than others
Troubleshooting Non-Convergence
Issue 1: High R-hat Values
Symptom: R-hat > 1.05 for some parameters
Diagnosis: Chains haven't converged to same distribution
Solutions (in order of preference):
Increase draws: Double from 2,000 to 4,000
Allows more time for convergence
Often solves mild convergence issues
Increase tuning: Increase from 1,000 to 2,000
Better step size adaptation
Improves exploration efficiency
Add more chains: Increase from 4 to 6 or 8
More chains make R-hat more sensitive
Better diagnosis of convergence
Check model specification:
Remove highly correlated variables
Simplify adstock transformations
Review prior specifications
Example Fix:
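A sketch of the adjustment as sampler settings. The option names (chains, tune, draws) mirror the settings discussed above; treat the exact keys as illustrative for your own configuration.

```python
# Before: mild non-convergence (some R-hat values in the 1.05-1.10 range)
settings = {"chains": 4, "tune": 1000, "draws": 2000}

# After: double the draws first (cheapest fix), then tuning if R-hat persists
settings = {"chains": 4, "tune": 2000, "draws": 4000}

print(settings["chains"] * settings["draws"])  # 16000 post-warmup samples
```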
Issue 2: Low Effective Sample Size
Symptom: ESS < 400 (especially ESS < 200)
Diagnosis: High autocorrelation in samples, inefficient exploration
Solutions:
Increase draws: More samples compensate for correlation
Double draws from 2,000 to 4,000
ESS typically increases proportionally
Increase tuning: Better adaptation reduces autocorrelation
Increase from 1,000 to 1,500+
Allows algorithm to learn better step sizes
Simplify model: Remove redundant variables
Highly correlated predictors increase autocorrelation
Simpler models mix better
Check priors: Very strong priors can restrict movement
Loosen prior standard deviations
Ensure priors don't conflict with data
Example Fix:
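A sketch of the low-ESS fix, with the back-of-envelope check that more draws raise ESS roughly in proportion (illustrative numbers):

```python
# Before: 4 chains x 2,000 draws = 8,000 samples, but ESS only 250
total_before, ess_before = 8000, 250
efficiency = ess_before / total_before   # ~3.1% -- poor mixing

# After: double the draws; sampling efficiency stays roughly constant,
# so ESS should roughly double as well
total_after = 16000
projected_ess = efficiency * total_after
print(projected_ess)  # 500.0 -- now above the 400 threshold
```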
Issue 3: Divergent Transitions
Symptom: Warning about divergences, diagnostic shows >10 divergences
Diagnosis: Sampler struggling with posterior geometry
Solutions:
Increase target_accept: From 0.95 to 0.98 or 0.99
Takes smaller, more careful steps
Better handles complex geometries
Most effective solution for divergences
Increase tuning: From 1,000 to 2,000
More time to adapt to posterior shape
Often reduces divergences significantly
Check prior-data conflicts:
Review if strong priors conflict with data
Loosen priors if appropriate
Ensure priors are on correct scale
Reparameterize model:
Remove highly correlated variables
Use standardized variables
Simplify transformations
Example Fix:
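The divergence fix as settings, with target_accept raised as the first line of attack (illustrative values; keys mirror the settings named above):

```python
# Before: settings producing divergent transitions
settings = {"chains": 4, "tune": 1000, "draws": 2000, "target_accept": 0.95}

# After: smaller, more careful steps plus longer adaptation
settings = {"chains": 4, "tune": 2000, "draws": 2000, "target_accept": 0.98}

print(settings["target_accept"])  # 0.98
```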
Issue 4: Chains Haven't Mixed
Symptom: Trace plots show chains in different regions
Diagnosis: Chains stuck in different modes or starting positions
Solutions:
Increase draws dramatically: 2,000 → 5,000+ draws
Give chains time to find and converge to correct distribution
Increase tuning: 1,000 → 2,000+ steps
More adaptation time helps chains escape local regions
Check for multimodality:
Posterior might have multiple modes
Review model specification
May indicate identification problems
Verify priors are reasonable:
Check if priors push chains to different regions
Ensure priors don't create artificial modes
Issue 5: All Parameters Show Issues
Symptom: Everything has high R-hat and low ESS
Diagnosis: Fundamental model or setting problem
Solutions:
Start fresh with simpler model:
Begin with fewer variables
Remove transformations temporarily
Use default priors
Check data quality:
Verify no missing values or NaNs
Ensure reasonable variable scales
Look for extreme outliers
Review MCMC settings:
Ensure settings weren't accidentally reduced
Verify target_accept is at least 0.90
Confirm draws and tuning are adequate
Try Fast Inference mode first:
Run with SVI to see if model structure is viable
If SVI fails, likely a fundamental model issue
Interpreting Trace Plots
Reading Trace Plots
Trace plots show parameter values across MCMC iterations for each chain (typically color-coded).
Horizontal Axis: MCMC iteration number
Vertical Axis: Parameter value
Colors: Different chains
Good Traces
Characteristics of well-converged chains:
✓ Stationary: No trends, drifts, or patterns - just random fluctuation around a stable mean
✓ Overlapping: All chains explore the same region, completely overlapping
✓ Fat Hairy Caterpillar: Random, noisy appearance - looks like dense hair
✓ Forgetful: Can't distinguish where each chain started
Example Description: "Chains quickly converge and mix, appearing as an indistinguishable fuzzy band across iterations."
Bad Traces
Warning signs of non-convergence:
✗ Separated Chains: Chains exploring different regions without overlap
✗ Trends: Chains drifting up or down over iterations
✗ Slow Mixing: Chains changing slowly, showing long-term patterns
✗ Stuck Chains: One or more chains not moving or stuck in region
✗ Late Convergence: Chains only converging near the end
Example Description: "Chains remain separated throughout, one exploring higher values than others."
Taking Action
If Most Traces Look Good: A few problematic parameters often improve with more draws. Focus on fixing just those parameters.
If Many Traces Look Bad: Fundamental issue with model or settings. Review model specification, increase tuning, or simplify model.
Best Practices
Always Check Diagnostics: Never interpret Bayesian results without first reviewing convergence diagnostics. This is non-negotiable.
Use Multiple Diagnostics: Don't rely on just R-hat. Consider ESS, divergences, and visual inspection together.
Set Clear Thresholds: Establish R-hat < 1.01 and ESS > 400 as your standards. Don't compromise without documented justification.
Start Conservative: Use standard or high-quality MCMC settings from the start. It's faster to sample well once than to iterate multiple times.
Investigate, Don't Ignore: If diagnostics flag a parameter, understand why before proceeding. Don't just rerun hoping it improves.
Document Issues: Record any convergence problems and how you resolved them. This helps with reproducibility and learning.
Visual Confirmation: Even with good numeric diagnostics, always glance at trace plots for visual confirmation.
Rerun When Needed: Don't be afraid to rerun models. It's better to spend an extra 5 minutes sampling than to make business decisions on unreliable estimates.
Common Misconceptions
Myth 1: "If R-hat < 1.05, results are reliable" Reality: While 1.05 is often cited, aim for R-hat < 1.01 for high-confidence results. Use 1.05 only as a bare minimum.
Myth 2: "More samples always improve convergence" Reality: More samples help if chains are already converging. If chains are stuck or diverging, more samples won't fix fundamental issues.
Myth 3: "Divergences just mean the model is complex" Reality: While complexity can cause divergences, they always indicate the posterior isn't being fully explored. Address them, don't ignore them.
Myth 4: "Low ESS means the estimates are wrong" Reality: Low ESS means less precision, not necessarily bias. Increase draws to improve ESS while maintaining accurate central estimates.
Myth 5: "Convergence diagnostics are optional for simple models" Reality: Even simple models can have convergence issues. Always check diagnostics regardless of model complexity.
Advanced Diagnostics
Monte Carlo Standard Error (MCSE)
Standard error of the posterior mean estimate due to using finite MCMC samples.
Rule of Thumb: MCSE should be < 5% of posterior standard deviation
If Too Large: Increase draws to reduce Monte Carlo error
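The MCSE rule of thumb connects directly to the ESS threshold: since MCSE = posterior SD / √ESS, requiring MCSE below 5% of the SD is the same as requiring ESS above 400.

```python
import math

def mcse(posterior_sd, ess):
    """Monte Carlo standard error of the posterior mean."""
    return posterior_sd / math.sqrt(ess)

# At exactly ESS = 400, MCSE is exactly 5% of the posterior SD
print(mcse(1.0, 400))   # 0.05
# Quadrupling ESS halves the Monte Carlo error
print(mcse(1.0, 1600))  # 0.025
```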
Autocorrelation
Correlation between samples at different lags.
Ideal: Rapid decay to zero within 10-20 lags
Problem: High autocorrelation persisting across many lags