Convergence Diagnostics
Overview
Convergence diagnostics assess whether MCMC sampling has successfully explored the posterior distribution. Before trusting Bayesian results, you must verify that chains have converged to a stable distribution. Poor convergence means unreliable estimates and invalid credible intervals.
MixModeler automatically calculates comprehensive diagnostics and provides clear guidance on whether results are trustworthy.
Why Convergence Matters
The Problem
MCMC chains start at random locations and gradually explore the parameter space. Early samples may not represent the true posterior distribution. Convergence occurs when chains:
- Stabilize at the correct distribution 
- Forget their starting positions 
- Produce consistent estimates regardless of initialization 
The Risk
Using non-converged results leads to:
- Incorrect coefficient estimates 
- Invalid credible intervals 
- Wrong business decisions 
- Misleading uncertainty quantification 
The Solution
Convergence diagnostics detect these issues before you interpret results, ensuring your Bayesian inference is reliable.
Key Diagnostic Metrics
R-hat (Gelman-Rubin Statistic)
The primary convergence diagnostic comparing between-chain and within-chain variance.
What It Measures: Whether multiple chains have converged to the same distribution
Interpretation:
- R-hat ≈ 1.00: Perfect convergence, chains agree 
- R-hat < 1.01: Excellent convergence, safe to proceed 
- R-hat 1.01-1.05: Acceptable convergence, proceed with caution 
- R-hat > 1.05: Poor convergence, do not trust results 
How It Works: Compares variance between chains to variance within chains. If chains haven't converged, between-chain variance will be larger than within-chain variance.
Rule of Thumb: R-hat must be < 1.01 for all parameters before interpreting results.
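The between/within comparison can be sketched in a few lines of NumPy. This is a simplified split-R-hat for illustration only; library implementations such as ArviZ additionally rank-normalize the draws:

```python
import numpy as np

def split_rhat(chains: np.ndarray) -> float:
    """Simplified split R-hat for one parameter.

    chains: array of shape (n_chains, n_draws). Each chain is split in
    half so that within-chain drift also inflates the statistic.
    """
    n_chains, n_draws = chains.shape
    half = n_draws // 2
    split = chains[:, : 2 * half].reshape(n_chains * 2, half)
    n = split.shape[1]
    between = n * split.mean(axis=1).var(ddof=1)   # between-chain variance B
    within = split.var(axis=1, ddof=1).mean()      # within-chain variance W
    var_hat = (n - 1) / n * within + between / n   # pooled variance estimate
    return float(np.sqrt(var_hat / within))
```

If every chain samples the same distribution, B and W agree and the ratio stays near 1; a chain stuck in a different region inflates B and pushes R-hat above 1.05.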
Effective Sample Size (ESS)
The number of independent samples that would provide the same information as your correlated MCMC samples.
What It Measures: How much independent information your samples contain after accounting for autocorrelation
Types:
- ESS Bulk: Effective sample size for the central distribution (mean, median) 
- ESS Tail: Effective sample size for the distribution tails (quantiles, extremes) 
Interpretation:
- ESS > 400: Sufficient for reliable estimates 
- ESS 200-400: Adequate but consider more samples 
- ESS 100-200: Minimal, increase draws 
- ESS < 100: Insufficient, results unreliable 
Total vs Effective Samples: You might have 8,000 total samples (4 chains × 2,000 draws) but ESS of 800 due to autocorrelation. This means your samples contain equivalent information to 800 independent draws.
Rule of Thumb: ESS > 400 for both bulk and tail ensures reliable credible intervals and probability estimates.
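A toy single-chain estimator shows where ESS comes from: the sum of autocorrelations. This sketch simply truncates the sum at the first negative autocorrelation, a crude cutoff; real implementations such as ArviZ combine all chains and use Geyer's paired-sum rule.

```python
import numpy as np

def ess_sketch(chain: np.ndarray) -> float:
    """Rough effective sample size of a single chain.

    ESS = n / (1 + 2 * sum of autocorrelations), with the sum
    truncated at the first negative autocorrelation.
    """
    x = chain - chain.mean()
    n = len(x)
    acov = np.correlate(x, x, mode="full")[n - 1 :] / n  # lags 0..n-1
    rho = acov / acov[0]
    tau = 1.0
    for k in range(1, n):
        if rho[k] < 0:
            break
        tau += 2.0 * rho[k]
    return n / tau
```

Independent draws give an ESS near n; a sticky chain (e.g. an AR(1) process with coefficient 0.9) retains only a small fraction of n.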
ESS Percentage
ESS as a percentage of total samples.
Calculation: (ESS / Total Samples) × 100
Interpretation:
- >50%: Excellent mixing, very efficient sampling 
- 25-50%: Good mixing, efficient sampling 
- 10-25%: Moderate mixing, acceptable 
- <10%: Poor mixing, samples highly autocorrelated 
Ideal: ESS percentage above 25% indicates efficient exploration without excessive autocorrelation.
Divergent Transitions
Failed sampling steps where the algorithm couldn't accurately explore the posterior.
What It Measures: Regions of the posterior that are difficult to explore, often high-curvature areas
Interpretation:
- 0 divergences: Perfect, no exploration failures 
- 1-10 divergences (<1% of samples): Minimal, usually acceptable 
- 10-100 divergences (1-10%): Concerning, investigate 
- >100 divergences (>10%): Severe problem, results unreliable 
Causes:
- Complex posterior geometries 
- Highly correlated parameters 
- Poor model specification 
- Conflicting priors and data 
Rule of Thumb: Aim for zero divergences. Any divergences warrant investigation, though <1% is often acceptable.
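The severity bands above can be expressed as a small helper. The function name and return labels are illustrative, taken from this section rather than from MixModeler's API:

```python
def classify_divergences(n_divergent: int, n_samples: int) -> str:
    """Map a divergence count to the severity bands described above."""
    pct = 100.0 * n_divergent / n_samples
    if n_divergent == 0:
        return "perfect"
    if pct < 1:
        return "minimal"
    if pct <= 10:
        return "concerning"
    return "severe"
```

For example, 45 divergences out of 8,000 samples is about 0.6% of draws: "minimal", but still worth investigating.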
Accessing Diagnostics in MixModeler
Automatic Calculation
After Bayesian model runs, MixModeler automatically:
- Calculates all convergence diagnostics 
- Flags any parameters with issues 
- Provides an overall convergence assessment 
- Offers specific recommendations 
Diagnostics Panel
To view detailed diagnostics:
- Open your Bayesian model results 
- Click Convergence Diagnostics tab 
- Review overall assessment banner 
- Examine parameter-by-parameter details 
Overall Assessment
MixModeler provides a summary judgment:
Excellent: All diagnostics pass strict thresholds
- R-hat < 1.01 for all parameters 
- ESS > 400 for all parameters 
- Zero or negligible divergences 
- Action: Safe to interpret and use results 
Good: Diagnostics meet acceptable thresholds
- R-hat < 1.05 for all parameters 
- ESS > 200 for most parameters 
- Few divergences (<1%) 
- Action: Proceed but note any flagged parameters 
Acceptable: Some diagnostic concerns
- R-hat < 1.1 for most parameters 
- ESS > 100 for most parameters 
- Moderate divergences (1-5%) 
- Action: Interpret cautiously, consider rerunning with adjusted settings 
Poor: Serious convergence issues
- R-hat > 1.1 for some parameters 
- ESS < 100 for some parameters 
- Many divergences (>5%) 
- Action: Do not trust results, must rerun with different settings 
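MixModeler computes this banner internally; as a sketch, the four tiers above translate into a simple decision function (names and inputs are illustrative, not MixModeler's actual API):

```python
def assess_convergence(rhat_max: float, ess_min: float,
                       divergence_pct: float) -> str:
    """Overall assessment from worst-case R-hat, worst-case ESS,
    and divergences as a percentage of total samples."""
    if rhat_max < 1.01 and ess_min > 400 and divergence_pct == 0:
        return "Excellent"
    if rhat_max < 1.05 and ess_min > 200 and divergence_pct < 1:
        return "Good"
    if rhat_max < 1.1 and ess_min > 100 and divergence_pct <= 5:
        return "Acceptable"
    return "Poor"
```

Note the use of the worst-case values across parameters: a single badly behaved coefficient is enough to downgrade the whole model.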
Parameter-Level Diagnostics
Diagnostic Table
For each model parameter, view:
| Parameter | R-hat | ESS Bulk | ESS Tail | ESS % | Status |
|---|---|---|---|---|---|
| TV_Advertising | 1.00 | 1,250 | 1,180 | 15.6% | ✓ Pass |
| Digital_Marketing | 1.01 | 980 | 920 | 12.3% | ✓ Pass |
| Print_Media | 1.08 | 320 | 280 | 4.0% | ⚠ Warning |
| Radio | 1.00 | 1,450 | 1,380 | 18.1% | ✓ Pass |
Interpretation: Print_Media shows poor convergence (high R-hat, low ESS), while other parameters converged well.
Flagged Parameters
Parameters failing diagnostic thresholds are automatically highlighted:
Red Flag (Critical): R-hat > 1.05 or ESS < 200
- Do not interpret this parameter 
- Rerun model with adjusted settings 
Yellow Flag (Warning): R-hat 1.01-1.05 or ESS 200-400
- Interpret cautiously 
- Consider increasing draws 
- Check for model specification issues 
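A sketch of the flagging rule, with red taking precedence over yellow (thresholds from this section; the function itself is illustrative):

```python
def flag_parameter(rhat: float, ess: float) -> str:
    """Red if a critical threshold is violated, yellow if only a
    warning threshold is, otherwise pass."""
    if rhat > 1.05 or ess < 200:
        return "red"
    if rhat > 1.01 or ess < 400:
        return "yellow"
    return "pass"
```

Applied to the table above, Print_Media (R-hat 1.08, ESS 320) would be flagged red while the remaining parameters pass.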
Visual Diagnostics
Trace Plots: Show parameter values over MCMC iterations for each chain
Good Trace:
- Chains overlap completely 
- Random scatter around mean (white noise) 
- No trends or patterns 
- Chains indistinguishable from each other 
Bad Trace:
- Chains separated or drifting 
- Trends up or down 
- Chains don't overlap 
- One chain exploring different region than others 
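Outside MixModeler, a quick trace plot of raw draws takes only a few lines of Matplotlib (a sketch for working with exported samples; MixModeler renders its own trace plots):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted use
import matplotlib.pyplot as plt

def trace_plot(chains: np.ndarray, name: str = "parameter"):
    """Overlay each chain's draws; well-mixed chains form one fuzzy band."""
    fig, ax = plt.subplots(figsize=(8, 3))
    for i, draws in enumerate(chains):
        ax.plot(draws, linewidth=0.5, label=f"chain {i}")
    ax.set_xlabel("MCMC iteration")
    ax.set_ylabel(name)
    ax.legend(loc="upper right", fontsize="small")
    return fig, ax
```

Good traces overlap into a single band; separated or drifting lines reproduce the warning signs listed above.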
Troubleshooting Non-Convergence
Issue 1: High R-hat Values
Symptom: R-hat > 1.05 for some parameters
Diagnosis: Chains haven't converged to same distribution
Solutions (in order of preference):
1. Increase draws: double from 2,000 to 4,000
   - Allows more time for convergence
   - Often solves mild convergence issues
2. Increase tuning: increase from 1,000 to 2,000
   - Better step size adaptation
   - Improves exploration efficiency
3. Add more chains: increase from 4 to 6 or 8
   - More chains make R-hat more sensitive
   - Better diagnosis of convergence
4. Check model specification:
   - Remove highly correlated variables
   - Simplify adstock transformations
   - Review prior specifications
Example Fix:
Before: Chains=4, Draws=2,000, R-hat=1.08
After: Chains=4, Draws=4,000, R-hat=1.00
Issue 2: Low Effective Sample Size
Symptom: ESS < 400 (especially ESS < 200)
Diagnosis: High autocorrelation in samples, inefficient exploration
Solutions:
1. Increase draws: more samples compensate for correlation
   - Double draws from 2,000 to 4,000
   - ESS typically increases proportionally
2. Increase tuning: better adaptation reduces autocorrelation
   - Increase from 1,000 to 1,500+
   - Allows the algorithm to learn better step sizes
3. Simplify the model: remove redundant variables
   - Highly correlated predictors increase autocorrelation
   - Simpler models mix better
4. Check priors: very strong priors can restrict movement
   - Loosen prior standard deviations
   - Ensure priors don't conflict with the data
Example Fix:
Before: Draws=2,000, ESS Bulk=180
After: Draws=4,000, ESS Bulk=420
Issue 3: Divergent Transitions
Symptom: Warning about divergences, diagnostic shows >10 divergences
Diagnosis: Sampler struggling with posterior geometry
Solutions:
1. Increase target_accept: from 0.95 to 0.98 or 0.99
   - Takes smaller, more careful steps
   - Better handles complex geometries
   - Most effective solution for divergences
2. Increase tuning: from 1,000 to 2,000
   - More time to adapt to the posterior shape
   - Often reduces divergences significantly
3. Check for prior-data conflicts:
   - Review whether strong priors conflict with the data
   - Loosen priors if appropriate
   - Ensure priors are on the correct scale
4. Reparameterize the model:
   - Remove highly correlated variables
   - Use standardized variables
   - Simplify transformations
Example Fix:
Before: Target Accept=0.95, Divergences=45
After: Target Accept=0.99, Divergences=0
Issue 4: Chains Haven't Mixed
Symptom: Trace plots show chains in different regions
Diagnosis: Chains stuck in different modes or starting positions
Solutions:
1. Increase draws dramatically: 2,000 → 5,000+ draws
   - Gives chains time to find and converge to the correct distribution
2. Increase tuning: 1,000 → 2,000+ steps
   - More adaptation time helps chains escape local regions
3. Check for multimodality:
   - The posterior might have multiple modes
   - Review the model specification
   - May indicate identification problems
4. Verify priors are reasonable:
   - Check whether priors push chains to different regions
   - Ensure priors don't create artificial modes
Issue 5: All Parameters Show Issues
Symptom: Everything has high R-hat and low ESS
Diagnosis: Fundamental model or setting problem
Solutions:
1. Start fresh with a simpler model:
   - Begin with fewer variables
   - Remove transformations temporarily
   - Use default priors
2. Check data quality:
   - Verify no missing values or NaNs
   - Ensure reasonable variable scales
   - Look for extreme outliers
3. Review MCMC settings:
   - Ensure settings weren't accidentally reduced
   - Verify target_accept is at least 0.90
   - Confirm draws and tuning are adequate
4. Try Fast Inference mode first:
   - Run with SVI to see if the model structure is viable
   - If SVI fails, there is likely a fundamental model issue
Interpreting Trace Plots
Reading Trace Plots
Trace plots show parameter values across MCMC iterations for each chain (typically color-coded).
Horizontal Axis: MCMC iteration number
Vertical Axis: Parameter value
Colors: Different chains
Good Traces
Characteristics of well-converged chains:
✓ Stationary: No trends, drifts, or patterns - just random fluctuation around a stable mean
✓ Overlapping: All chains explore the same region, completely overlapping
✓ Fat Hairy Caterpillar: Random, noisy appearance - looks like dense hair
✓ Forgetful: Can't distinguish where each chain started
Example Description: "Chains quickly converge and mix, appearing as an indistinguishable fuzzy band across iterations."
Bad Traces
Warning signs of non-convergence:
✗ Separated Chains: Chains exploring different regions without overlap
✗ Trends: Chains drifting up or down over iterations
✗ Slow Mixing: Chains changing slowly, showing long-term patterns
✗ Stuck Chains: One or more chains not moving or stuck in region
✗ Late Convergence: Chains only converging near the end
Example Description: "Chains remain separated throughout, one exploring higher values than others."
Taking Action
If Most Traces Look Good: A few problematic parameters often improve with more draws. Focus on fixing just those parameters.
If Many Traces Look Bad: Fundamental issue with model or settings. Review model specification, increase tuning, or simplify model.
Best Practices
Always Check Diagnostics: Never interpret Bayesian results without first reviewing convergence diagnostics. This is non-negotiable.
Use Multiple Diagnostics: Don't rely on just R-hat. Consider ESS, divergences, and visual inspection together.
Set Clear Thresholds: Establish R-hat < 1.01 and ESS > 400 as your standards. Don't compromise without documented justification.
Start Conservative: Use standard or high-quality MCMC settings from the start. It's faster to sample well once than to iterate multiple times.
Investigate, Don't Ignore: If diagnostics flag a parameter, understand why before proceeding. Don't just rerun hoping it improves.
Document Issues: Record any convergence problems and how you resolved them. This helps with reproducibility and learning.
Visual Confirmation: Even with good numeric diagnostics, always glance at trace plots for visual confirmation.
Rerun When Needed: Don't be afraid to rerun models. It's better to spend an extra 5 minutes sampling than to make business decisions on unreliable estimates.
Common Misconceptions
Myth 1: "If R-hat < 1.05, results are reliable" Reality: While 1.05 is often cited, aim for R-hat < 1.01 for high-confidence results. Use 1.05 only as a bare minimum.
Myth 2: "More samples always improve convergence" Reality: More samples help if chains are already converging. If chains are stuck or diverging, more samples won't fix fundamental issues.
Myth 3: "Divergences just mean the model is complex" Reality: While complexity can cause divergences, they always indicate the posterior isn't being fully explored. Address them, don't ignore them.
Myth 4: "Low ESS means the estimates are wrong" Reality: Low ESS means less precision, not necessarily bias. Increase draws to improve ESS while maintaining accurate central estimates.
Myth 5: "Convergence diagnostics are optional for simple models" Reality: Even simple models can have convergence issues. Always check diagnostics regardless of model complexity.
Advanced Diagnostics
Monte Carlo Standard Error (MCSE)
Standard error of the posterior mean estimate due to using finite MCMC samples.
Rule of Thumb: MCSE should be < 5% of posterior standard deviation
If Too Large: Increase draws to reduce Monte Carlo error
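The 5% rule connects directly to the ESS threshold: the MCSE of the mean is roughly sd/√ESS, so requiring MCSE below 5% of the posterior sd is equivalent to requiring ESS above 400. A minimal check (helper names are illustrative):

```python
import numpy as np

def mcse_of_mean(draws: np.ndarray, ess: float) -> float:
    """Monte Carlo standard error of the posterior mean: sd / sqrt(ESS)."""
    return draws.std(ddof=1) / np.sqrt(ess)

def mcse_ok(draws: np.ndarray, ess: float) -> bool:
    """True when MCSE is below 5% of the posterior standard deviation."""
    return mcse_of_mean(draws, ess) < 0.05 * draws.std(ddof=1)
```

Since 1/√500 ≈ 0.045 and 1/√300 ≈ 0.058, an ESS of 500 passes the rule while an ESS of 300 fails it, regardless of the posterior's scale.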
Autocorrelation
Correlation between samples at different lags.
Ideal: Rapid decay to zero within 10-20 lags
Problem: High autocorrelation persisting across many lags