Convergence Diagnostics
Overview
Convergence diagnostics assess whether MCMC sampling has successfully explored the posterior distribution. Before trusting Bayesian results, you must verify that chains have converged to a stable distribution. Poor convergence means unreliable estimates and invalid credible intervals.
MixModeler automatically calculates comprehensive diagnostics and provides clear guidance on whether results are trustworthy.
Why Convergence Matters
The Problem
MCMC chains start at random locations and gradually explore the parameter space. Early samples may not represent the true posterior distribution. Convergence occurs when chains:
Stabilize at the correct distribution
Forget their starting positions
Produce consistent estimates regardless of initialization
The Risk
Using non-converged results leads to:
Incorrect coefficient estimates
Invalid credible intervals
Wrong business decisions
Misleading uncertainty quantification
The Solution
Convergence diagnostics detect these issues before you interpret results, ensuring your Bayesian inference is reliable.
Key Diagnostic Metrics
R-hat (Gelman-Rubin Statistic)
The primary convergence diagnostic comparing between-chain and within-chain variance.
What It Measures: Whether multiple chains have converged to the same distribution
Interpretation:
R-hat ≈ 1.00: Perfect convergence, chains agree
R-hat < 1.01: Excellent convergence, safe to proceed
R-hat 1.01-1.05: Acceptable convergence, proceed with caution
R-hat > 1.05: Poor convergence, do not trust results
How It Works: Compares variance between chains to variance within chains. If chains haven't converged, between-chain variance will be larger than within-chain variance.
Rule of Thumb: R-hat must be < 1.01 for all parameters before interpreting results.
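The between/within comparison can be sketched in a few lines of NumPy. This is the basic (non-split) Gelman-Rubin formula, shown for illustration only; production samplers typically report the more robust rank-normalized split-R-hat.

```python
import numpy as np

def rhat(chains):
    """Basic Gelman-Rubin R-hat for an (n_chains, n_draws) array.

    Simplified sketch: tools like ArviZ use the rank-normalized
    split-R-hat variant, which is more robust in practice.
    """
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    # W: average within-chain variance
    W = chains.var(axis=1, ddof=1).mean()
    # B: between-chain variance, scaled by draws per chain
    B = n * chain_means.var(ddof=1)
    # Pooled estimate of the posterior variance
    var_hat = (n - 1) / n * W + B / n
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 2000))                   # 4 well-mixed chains
stuck = mixed + np.array([[0.], [0.], [0.], [3.]])   # one chain offset by 3

print(round(rhat(mixed), 3))  # very close to 1.00
print(round(rhat(stuck), 3))  # well above 1.05
```

When one chain sits in a different region, the between-chain variance B dwarfs the within-chain variance W and R-hat rises well above 1.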
Effective Sample Size (ESS)
Number of independent samples providing equivalent information to your correlated MCMC samples.
What It Measures: How much independent information your samples contain after accounting for autocorrelation
Types:
ESS Bulk: Effective sample size for the central distribution (mean, median)
ESS Tail: Effective sample size for the distribution tails (quantiles, extremes)
Interpretation:
ESS > 400: Sufficient for reliable estimates
ESS 200-400: Adequate but consider more samples
ESS 100-200: Minimal, increase draws
ESS < 100: Insufficient, results unreliable
Total vs Effective Samples: You might have 8,000 total samples (4 chains × 2,000 draws) but ESS of 800 due to autocorrelation. This means your samples contain equivalent information to 800 independent draws.
Rule of Thumb: ESS > 400 for both bulk and tail ensures reliable credible intervals and probability estimates.
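A crude ESS estimate shows where the gap between total and effective samples comes from: the sample count is divided by an integrated autocorrelation factor. This sketch sums empirical autocorrelations until they first turn negative; real implementations (e.g. ArviZ's bulk/tail ESS) use more careful truncation and rank normalization.

```python
import numpy as np

def ess(samples, max_lag=200):
    """Crude effective sample size from the autocorrelation function."""
    x = samples - samples.mean()
    n = len(x)
    # Empirical autocovariance at lags 0..n-1
    acov = np.correlate(x, x, mode="full")[n - 1:] / n
    rho = acov / acov[0]
    tau = 1.0
    for lag in range(1, min(max_lag, n)):
        if rho[lag] < 0:           # stop at the first negative lag
            break
        tau += 2 * rho[lag]        # integrated autocorrelation time
    return n / tau

rng = np.random.default_rng(1)
iid = rng.normal(size=4000)        # independent draws
ar = np.zeros(4000)                # strongly autocorrelated AR(1) draws
for t in range(1, 4000):
    ar[t] = 0.9 * ar[t - 1] + rng.normal()

print(int(ess(iid)))  # recovers most of the 4,000 draws
print(int(ess(ar)))   # a small fraction of 4,000
```

The AR(1) series illustrates the "8,000 total but 800 effective" situation above: highly correlated draws carry far less independent information than their raw count suggests.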
ESS Percentage
ESS as a percentage of total samples.
Calculation: (ESS / Total Samples) × 100
Interpretation:
>50%: Excellent mixing, very efficient sampling
25-50%: Good mixing, efficient sampling
10-25%: Moderate mixing, acceptable
<10%: Poor mixing, samples highly autocorrelated
Ideal: ESS percentage above 25% indicates efficient exploration without excessive autocorrelation.
Divergent Transitions
Failed sampling steps where the algorithm couldn't accurately explore the posterior.
What It Measures: Regions of the posterior that are difficult to explore, often high-curvature areas
Interpretation:
0 divergences: Perfect, no exploration failures
1-10 divergences (<1% of samples): Minimal, usually acceptable
10-100 divergences (1-10%): Concerning, investigate
>100 divergences (>10%): Severe problem, results unreliable
Causes:
Complex posterior geometries
Highly correlated parameters
Poor model specification
Conflicting priors and data
Rule of Thumb: Aim for zero divergences. Any divergences warrant investigation, though <1% is often acceptable.
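Translated into code, the divergence bands above look like this (the rate cutoffs follow the percentage bands in the list; the labels are shortened from it):

```python
def divergence_severity(n_divergent, n_samples):
    """Classify a divergence count using the percentage bands above."""
    rate = n_divergent / n_samples
    if n_divergent == 0:
        return "perfect"       # no exploration failures
    if rate < 0.01:
        return "minimal"       # usually acceptable, still worth a look
    if rate <= 0.10:
        return "concerning"    # investigate before interpreting
    return "severe"            # results unreliable

print(divergence_severity(0, 8000))     # perfect
print(divergence_severity(40, 8000))    # 0.5% -> minimal
print(divergence_severity(400, 8000))   # 5%   -> concerning
print(divergence_severity(1200, 8000))  # 15%  -> severe
```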
Accessing Diagnostics in MixModeler
Automatic Calculation
After Bayesian model runs, MixModeler automatically:
Calculates all convergence diagnostics
Flags any parameters with issues
Provides an overall convergence assessment
Offers specific recommendations
Diagnostics Panel
To view detailed diagnostics:
Open your Bayesian model results
Click Convergence Diagnostics tab
Review overall assessment banner
Examine parameter-by-parameter details
Overall Assessment
MixModeler provides a summary judgment:
Excellent: All diagnostics pass strict thresholds
R-hat < 1.01 for all parameters
ESS > 400 for all parameters
Zero or negligible divergences
Action: Safe to interpret and use results
Good: Diagnostics meet acceptable thresholds
R-hat < 1.05 for all parameters
ESS > 200 for most parameters
Few divergences (<1%)
Action: Proceed but note any flagged parameters
Acceptable: Some diagnostic concerns
R-hat < 1.1 for most parameters
ESS > 100 for most parameters
Moderate divergences (1-5%)
Action: Interpret cautiously, consider rerunning with adjusted settings
Poor: Serious convergence issues
R-hat > 1.1 for some parameters
ESS < 100 for some parameters
Many divergences (>5%)
Action: Do not trust results, must rerun with different settings
Parameter-Level Diagnostics
Diagnostic Table
For each model parameter, view:
Parameter          | R-hat | ESS Bulk | ESS Tail | ESS %  | Status
TV_Advertising     | 1.00  | 1,250    | 1,180    | 15.6%  | ✓ Pass
Digital_Marketing  | 1.01  | 980      | 920      | 12.3%  | ✓ Pass
Print_Media        | 1.08  | 320      | 280      | 4.0%   | ⚠ Warning
Radio              | 1.00  | 1,450    | 1,380    | 18.1%  | ✓ Pass
Interpretation: Print_Media shows poor convergence (high R-hat, low ESS), while other parameters converged well.
Flagged Parameters
Parameters failing diagnostic thresholds are automatically highlighted:
Red Flag (Critical): R-hat > 1.05 or ESS < 200
Do not interpret this parameter
Rerun model with adjusted settings
Yellow Flag (Warning): R-hat 1.01-1.05 or ESS 200-400
Interpret cautiously
Consider increasing draws
Check for model specification issues
Visual Diagnostics
Trace Plots: Show parameter values over MCMC iterations for each chain
Good Trace:
Chains overlap completely
Random scatter around mean (white noise)
No trends or patterns
Chains indistinguishable from each other
Bad Trace:
Chains separated or drifting
Trends up or down
Chains don't overlap
One chain exploring different region than others
Troubleshooting Non-Convergence
Issue 1: High R-hat Values
Symptom: R-hat > 1.05 for some parameters
Diagnosis: Chains haven't converged to same distribution
Solutions (in order of preference):
Increase draws: Double from 2,000 to 4,000
Allows more time for convergence
Often solves mild convergence issues
Increase tuning: Increase from 1,000 to 2,000
Better step size adaptation
Improves exploration efficiency
Add more chains: Increase from 4 to 6 or 8
More chains make R-hat more sensitive
Better diagnosis of convergence
Check model specification:
Remove highly correlated variables
Simplify adstock transformations
Review prior specifications
Example Fix:
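A sketch of the adjustment as sampler settings. The option names (chains, tune, draws) mirror the settings discussed above; treat the exact keys as illustrative for your own configuration.

```python
# Before: mild non-convergence (some R-hat values in the 1.05-1.10 range)
settings = {"chains": 4, "tune": 1000, "draws": 2000}

# After: double the draws first (cheapest fix), then tuning if R-hat persists
settings = {"chains": 4, "tune": 2000, "draws": 4000}

print(settings["chains"] * settings["draws"])  # 16000 post-warmup samples
```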
Issue 2: Low Effective Sample Size
Symptom: ESS < 400 (especially ESS < 200)
Diagnosis: High autocorrelation in samples, inefficient exploration
Solutions:
Increase draws: More samples compensate for correlation
Double draws from 2,000 to 4,000
ESS typically increases proportionally
Increase tuning: Better adaptation reduces autocorrelation
Increase from 1,000 to 1,500+
Allows algorithm to learn better step sizes
Simplify model: Remove redundant variables
Highly correlated predictors increase autocorrelation
Simpler models mix better
Check priors: Very strong priors can restrict movement
Loosen prior standard deviations
Ensure priors don't conflict with data
Example Fix:
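A sketch of the low-ESS fix, with the back-of-envelope check that more draws raise ESS roughly in proportion (illustrative numbers):

```python
# Before: 4 chains x 2,000 draws = 8,000 samples, but ESS only 250
total_before, ess_before = 8000, 250
efficiency = ess_before / total_before   # ~3.1% -- poor mixing

# After: double the draws; sampling efficiency stays roughly constant,
# so ESS should roughly double as well
total_after = 16000
projected_ess = efficiency * total_after
print(projected_ess)  # 500.0 -- now above the 400 threshold
```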
Issue 3: Divergent Transitions
Symptom: Warning about divergences, diagnostic shows >10 divergences
Diagnosis: Sampler struggling with posterior geometry
Solutions:
Increase target_accept: From 0.95 to 0.98 or 0.99
Takes smaller, more careful steps
Better handles complex geometries
Most effective solution for divergences
Increase tuning: From 1,000 to 2,000
More time to adapt to posterior shape
Often reduces divergences significantly
Check prior-data conflicts:
Review if strong priors conflict with data
Loosen priors if appropriate
Ensure priors are on correct scale
Reparameterize model:
Remove highly correlated variables
Use standardized variables
Simplify transformations
Example Fix:
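The divergence fix as settings, with target_accept raised as the first line of attack (illustrative values; keys mirror the settings named above):

```python
# Before: settings producing divergent transitions
settings = {"chains": 4, "tune": 1000, "draws": 2000, "target_accept": 0.95}

# After: smaller, more careful steps plus longer adaptation
settings = {"chains": 4, "tune": 2000, "draws": 2000, "target_accept": 0.98}

print(settings["target_accept"])  # 0.98
```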
Issue 4: Chains Haven't Mixed
Symptom: Trace plots show chains in different regions
Diagnosis: Chains stuck in different modes or starting positions
Solutions:
Increase draws dramatically: 2,000 → 5,000+ draws
Give chains time to find and converge to correct distribution
Increase tuning: 1,000 → 2,000+ steps
More adaptation time helps chains escape local regions
Check for multimodality:
Posterior might have multiple modes
Review model specification
May indicate identification problems
Verify priors are reasonable:
Check if priors push chains to different regions
Ensure priors don't create artificial modes
Issue 5: All Parameters Show Issues
Symptom: Everything has high R-hat and low ESS
Diagnosis: Fundamental model or setting problem
Solutions:
Start fresh with simpler model:
Begin with fewer variables
Remove transformations temporarily
Use default priors
Check data quality:
Verify no missing values or NaNs
Ensure reasonable variable scales
Look for extreme outliers
Review MCMC settings:
Ensure settings weren't accidentally reduced
Verify target_accept is at least 0.90
Confirm draws and tuning are adequate
Try Fast Inference mode first:
Run with SVI to see if model structure is viable
If SVI fails, likely a fundamental model issue
Interpreting Trace Plots
Reading Trace Plots
Trace plots show parameter values across MCMC iterations for each chain (typically color-coded).
Horizontal Axis: MCMC iteration number
Vertical Axis: Parameter value
Colors: Different chains
Good Traces
Characteristics of well-converged chains:
✓ Stationary: No trends, drifts, or patterns - just random fluctuation around a stable mean
✓ Overlapping: All chains explore the same region, completely overlapping
✓ Fat Hairy Caterpillar: Random, noisy appearance - looks like dense hair
✓ Forgetful: Can't distinguish where each chain started
Example Description: "Chains quickly converge and mix, appearing as an indistinguishable fuzzy band across iterations."
Bad Traces
Warning signs of non-convergence:
✗ Separated Chains: Chains exploring different regions without overlap
✗ Trends: Chains drifting up or down over iterations
✗ Slow Mixing: Chains changing slowly, showing long-term patterns
✗ Stuck Chains: One or more chains not moving or stuck in region
✗ Late Convergence: Chains only converging near the end
Example Description: "Chains remain separated throughout, one exploring higher values than others."
Taking Action
If Most Traces Look Good: A few problematic parameters often improve with more draws. Focus on fixing just those parameters.
If Many Traces Look Bad: Fundamental issue with model or settings. Review model specification, increase tuning, or simplify model.
Best Practices
Always Check Diagnostics: Never interpret Bayesian results without first reviewing convergence diagnostics. This is non-negotiable.
Use Multiple Diagnostics: Don't rely on just R-hat. Consider ESS, divergences, and visual inspection together.
Set Clear Thresholds: Establish R-hat < 1.01 and ESS > 400 as your standards. Don't compromise without documented justification.
Start Conservative: Use standard or high-quality MCMC settings from the start. It's faster to sample well once than to iterate multiple times.
Investigate, Don't Ignore: If diagnostics flag a parameter, understand why before proceeding. Don't just rerun hoping it improves.
Document Issues: Record any convergence problems and how you resolved them. This helps with reproducibility and learning.
Visual Confirmation: Even with good numeric diagnostics, always glance at trace plots for visual confirmation.
Rerun When Needed: Don't be afraid to rerun models. It's better to spend an extra 5 minutes sampling than to make business decisions on unreliable estimates.
Common Misconceptions
Myth 1: "If R-hat < 1.05, results are reliable" Reality: While 1.05 is often cited, aim for R-hat < 1.01 for high-confidence results. Use 1.05 only as a bare minimum.
Myth 2: "More samples always improve convergence" Reality: More samples help if chains are already converging. If chains are stuck or diverging, more samples won't fix fundamental issues.
Myth 3: "Divergences just mean the model is complex" Reality: While complexity can cause divergences, they always indicate the posterior isn't being fully explored. Address them, don't ignore them.
Myth 4: "Low ESS means the estimates are wrong" Reality: Low ESS means less precision, not necessarily bias. Increase draws to improve ESS while maintaining accurate central estimates.
Myth 5: "Convergence diagnostics are optional for simple models" Reality: Even simple models can have convergence issues. Always check diagnostics regardless of model complexity.
Advanced Diagnostics
Monte Carlo Standard Error (MCSE)
Standard error of the posterior mean estimate due to using finite MCMC samples.
Rule of Thumb: MCSE should be < 5% of posterior standard deviation
If Too Large: Increase draws to reduce Monte Carlo error
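The MCSE rule of thumb connects directly to the ESS threshold: since MCSE = posterior SD / √ESS, requiring MCSE below 5% of the SD is the same as requiring ESS above 400.

```python
import math

def mcse(posterior_sd, ess):
    """Monte Carlo standard error of the posterior mean."""
    return posterior_sd / math.sqrt(ess)

# At exactly ESS = 400, MCSE is exactly 5% of the posterior SD
print(mcse(1.0, 400))   # 0.05
# Quadrupling ESS halves the Monte Carlo error
print(mcse(1.0, 1600))  # 0.025
```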
Autocorrelation
Correlation between samples at different lags.
Ideal: Rapid decay to zero within 10-20 lags
Problem: High autocorrelation persisting across many lags