OLS vs Bayesian Selection

Understanding and Switching Between Modeling Approaches

Overview

MixModeler supports two statistical approaches: OLS (Ordinary Least Squares) and Bayesian modeling. You can switch between them anytime, and both use the same variables and model structure. The difference lies in how coefficients are estimated and what additional information you get.

Key Concept: Same model specification, different statistical frameworks

The Two Approaches

OLS (Ordinary Least Squares)

Statistical Framework: Frequentist

What it does:

  • Estimates single "best" coefficient for each variable

  • Minimizes sum of squared residuals

  • Provides point estimates with standard errors (no full distribution)

  • Classical regression approach

Output:

  • Coefficient (β)

  • Standard error

  • T-statistic

  • P-value

  • 95% confidence interval

Computation:

  • Fast (milliseconds)

  • Deterministic (same result every time)

  • No sampling needed

When to use:

  • Default starting point

  • Exploratory analysis

  • Quick iterations

  • When you don't have prior knowledge

  • Stakeholders expect traditional statistics

Bayesian Modeling

Statistical Framework: Bayesian

What it does:

  • Incorporates prior beliefs about coefficients

  • Uses MCMC sampling to estimate posterior distributions

  • Provides full probability distributions

  • Quantifies uncertainty

Output:

  • Posterior mean (similar to coefficient)

  • Posterior standard deviation

  • 95% credible interval

  • Full posterior distribution

  • R-hat (convergence diagnostic)

  • Effective sample size

Computation:

  • Slower (seconds to minutes)

  • Stochastic (small variations each run)

  • Requires MCMC sampling

When to use:

  • You have expert priors

  • Need uncertainty quantification

  • Want probabilistic statements

  • Final production models

  • Stakeholders understand Bayesian inference

Switching Between OLS and Bayesian

How to Switch

In Model Library:

  1. Locate model in table

  2. Click the Type button (shows "OLS" or "BAY")

  3. Type toggles immediately

  4. All pages update to reflect new type

In Model Builder:

  1. Use the model type toggle at top

  2. Interface updates immediately

  3. Statistics shown change based on type

Effect of switching:

  • Model specification unchanged (same variables)

  • Results recalculated in new framework

  • Interface adapts to show relevant statistics

  • All other pages (Diagnostics, Decomposition) use selected type

What Changes When You Switch

Interface changes:

OLS mode shows:

  • Coefficient Type (Floating/Fixed)

  • Coefficient Value input

  • Standard error

  • T-statistic

  • P-value

Bayesian mode shows:

  • Prior Distribution dropdown

  • Prior Mean input

  • Prior Std Dev input

  • Posterior Mean

  • Posterior Std Dev

Statistics reported:

OLS:

  • Point estimates

  • Frequentist confidence intervals

  • P-values

  • F-statistics

Bayesian:

  • Posterior distributions

  • Credible intervals

  • Posterior probabilities

  • WAIC, LOO (model comparison metrics)

OLS Mode Details

Coefficient Estimation

How it works:

  1. Minimizes sum of squared errors

  2. Solves normal equations

  3. Returns single best estimate per variable
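
The three steps above can be sketched in plain Python. This is a minimal illustration of least squares via the normal equations, not MixModeler's actual implementation; the function names `solve_linear_system` and `ols_fit` and the toy TV/radio data are assumptions made for the example.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def ols_fit(rows, y):
    """Return [intercept, coef_1, ...] minimizing the sum of squared residuals."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical toy data generated exactly from y = 100 + 5*tv + 2*radio
rows = [(10, 3), (20, 1), (30, 4), (40, 1), (50, 5)]
y = [100 + 5 * tv + 2 * radio for tv, radio in rows]
beta = ols_fit(rows, y)
print([round(b, 6) for b in beta])
```

Because the toy data contain no noise, the fit recovers the generating coefficients up to floating-point error, and rerunning it always gives the same answer, which is the deterministic behavior described earlier.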

Assumptions:

  • Linear relationship

  • Normally distributed errors

  • Homoscedastic errors

  • Independent observations

  • No perfect multicollinearity

Advantages:

  • Fast computation

  • Familiar to stakeholders

  • Standard in econometrics

  • Easy interpretation

Limitations:

  • No uncertainty quantification beyond std error

  • Sensitive to outliers

  • Can't incorporate prior knowledge

  • Point estimates only

Fixed vs Floating Coefficients

Floating (Default):

  • Coefficient estimated by regression

  • Model determines best value

  • Normal use case

Fixed:

  • You specify exact coefficient value

  • Useful for sensitivity analysis

  • Advanced feature

  • Example: "What if TV coefficient was exactly 1000?"

How to use:

  1. Select "Fixed" in Coefficient Type

  2. Enter desired value

  3. Add variable

  4. Regression estimates others, given fixed value
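
One common way to estimate the remaining coefficients given a fixed one is to subtract the fixed variable's contribution from the KPI and regress what is left on the floating variables. The sketch below assumes that approach; whether MixModeler does this internally is not stated in this page, and `fit_with_fixed` and the data are hypothetical.

```python
def fit_with_fixed(x_float, x_fixed, y, fixed_coef):
    """Estimate intercept and one floating slope while a second slope is held fixed."""
    # Remove the fixed variable's contribution from the target.
    adjusted = [yi - fixed_coef * xf for yi, xf in zip(y, x_fixed)]
    n = len(adjusted)
    mean_x = sum(x_float) / n
    mean_y = sum(adjusted) / n
    # Closed-form simple regression of the adjusted target on the floating variable.
    sxy = sum((x - mean_x) * (a - mean_y) for x, a in zip(x_float, adjusted))
    sxx = sum((x - mean_x) ** 2 for x in x_float)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data generated from sales = 100 + 2*radio + 5*tv
radio = [3, 1, 4, 1, 5]
tv = [10, 20, 30, 40, 50]
sales = [100 + 2 * r + 5 * t for r, t in zip(radio, tv)]

# Hold the TV coefficient at exactly 5 and let radio float.
intercept, radio_coef = fit_with_fixed(radio, tv, sales, fixed_coef=5.0)
print(round(intercept, 6), round(radio_coef, 6))
```

Changing `fixed_coef` and refitting is the sensitivity analysis described above: the floating coefficients shift to absorb whatever the fixed value leaves unexplained.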

Interpreting OLS Results

Coefficient:

  • Units: Change in KPI per unit change in variable

  • Sign: Positive (increases KPI) or Negative (decreases KPI)

  • Magnitude: Strength of relationship

T-statistic:

  • Measures how many standard errors coefficient is from zero

  • |t| > 1.96: Significant at 95% confidence

  • |t| > 2.58: Significant at 99% confidence

  • Target: |t| > 2.0

P-value:

  • Probability of observing an estimate at least this far from zero if the true coefficient were zero

  • < 0.05: Significant

  • < 0.01: Highly significant

  • > 0.10: Not significant, consider removing

Confidence Interval:

  • Range of coefficient values consistent with the data at the 95% confidence level

  • Narrow interval: Precise estimate

  • Wide interval: Uncertain estimate

  • Excludes zero: Variable is significant
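
The relationship between these four statistics can be shown with a short calculation. The sketch below uses a large-sample normal approximation rather than the exact t distribution, so its p-values and intervals will differ slightly from reported values in small samples; `summarize_coefficient` and the input numbers are hypothetical.

```python
from statistics import NormalDist


def summarize_coefficient(coef, std_err, level=0.95):
    """Derive t-statistic, two-sided p-value, and confidence interval.

    Assumption: normal approximation to the t distribution (large sample).
    """
    std_normal = NormalDist()
    t_stat = coef / std_err
    p_value = 2 * (1 - std_normal.cdf(abs(t_stat)))
    z = std_normal.inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    ci = (coef - z * std_err, coef + z * std_err)
    return {
        "t_stat": t_stat,
        "p_value": p_value,
        "ci": ci,
        "excludes_zero": ci[0] > 0 or ci[1] < 0,
    }


# Hypothetical TV coefficient of 500 with a standard error of 200
summary = summarize_coefficient(coef=500.0, std_err=200.0)
print(summary)
```

With these inputs the t-statistic is 2.5, which clears the |t| > 1.96 threshold, so the interval excludes zero and the p-value falls below 0.05. The three checks are different views of the same test.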

Bayesian Mode Details

Prior Distributions

What are priors: Your belief about coefficient value BEFORE seeing the data

Why use priors:

  • Incorporate expert knowledge

  • Regularize estimates (prevent overfitting)

  • Handle collinearity better

  • More realistic in small samples

Available distributions:

Normal (Default):

  • Symmetric around mean

  • Most common choice

  • Parameters: mean, std dev

  • Use when: No strong directional belief

Student-t:

  • Heavier tails than Normal

  • More robust to outliers

  • Parameters: mean, std dev, degrees of freedom

  • Use when: Expect occasional extreme values

Laplace (Double Exponential):

  • Sharper peak, heavier tails

  • Promotes sparsity

  • Parameters: mean, scale

  • Use when: Some coefficients should be near zero

Horseshoe:

  • Strong sparsity inducing

  • Shrinks small coefficients strongly toward zero

  • Keeps large coefficients

  • Use when: Many variables, few truly important

Uniform:

  • All values in range equally likely

  • Non-informative

  • Parameters: lower bound, upper bound

  • Use when: Know range but nothing else

Half-Normal (Positive Only):

  • Only positive values allowed

  • Normal distribution truncated at zero

  • Use when: Coefficient must be positive (e.g., marketing spend effect)

Exponential (Positive/Negative):

  • Decaying probability

  • Favors values near zero

  • Use when: Small effects expected

Gamma/Inverse Gamma:

  • Positive values only

  • Flexible shapes

  • Use when: Positive coefficients, specific shape needed
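
To build intuition for how some of these priors differ, it can help to draw samples from them. The sketch below uses only Python's `random` module and covers four of the distributions listed above; the `draw_prior` helper and its parameter names are assumptions for illustration and do not mirror MixModeler's settings.

```python
import random


def draw_prior(name, rng, **params):
    """Draw one value from a named prior (illustrative subset only)."""
    if name == "normal":
        return rng.gauss(params["mean"], params["std"])
    if name == "half_normal":
        # Absolute value of a zero-mean Normal: positive values only.
        return abs(rng.gauss(0.0, params["std"]))
    if name == "laplace":
        # The difference of two unit exponentials is a standard Laplace variate.
        noise = rng.expovariate(1.0) - rng.expovariate(1.0)
        return params["mean"] + params["scale"] * noise
    if name == "uniform":
        return rng.uniform(params["lower"], params["upper"])
    raise ValueError(f"unsupported prior: {name}")


rng = random.Random(0)  # seeded so the draws are reproducible
normal_draws = [draw_prior("normal", rng, mean=0.0, std=200.0) for _ in range(1000)]
half_normal_draws = [draw_prior("half_normal", rng, std=200.0) for _ in range(1000)]
laplace_draws = [draw_prior("laplace", rng, mean=0.0, scale=200.0) for _ in range(1000)]
uniform_draws = [draw_prior("uniform", rng, lower=0.0, upper=1000.0) for _ in range(1000)]

print("negative Normal draws:", sum(d < 0 for d in normal_draws))
print("negative Half-Normal draws:", sum(d < 0 for d in half_normal_draws))
```

The Half-Normal never produces a negative value, which is why it works as a sign constraint, while the Normal places roughly half of its mass below zero when centered there.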

Setting Prior Parameters

Prior Mean:

  • Your best guess for coefficient value

  • Example: "I think TV coefficient is around 500"

  • Set to 0 if no strong belief

Prior Std Dev:

  • Your uncertainty about the mean

  • Small std dev (e.g., 50): Strong belief, narrow prior

  • Large std dev (e.g., 1000): Weak belief, diffuse prior

  • Very large (e.g., 10000): Nearly non-informative
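
The effect of the prior standard deviation can be seen in a simplified setting where the posterior has a closed form. The sketch below assumes a Normal prior and treats the data as a single Normal estimate with a known standard error; this is a textbook simplification rather than the MCMC procedure MixModeler runs, and `normal_posterior` and the numbers are hypothetical.

```python
def normal_posterior(prior_mean, prior_std, estimate, std_err):
    """Combine a Normal prior with a Normal data estimate (known variance)."""
    prior_precision = 1.0 / prior_std ** 2
    data_precision = 1.0 / std_err ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    # Posterior mean is a precision-weighted average of prior mean and estimate.
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    return post_mean, post_var ** 0.5


# Data suggest a coefficient of 500 (standard error 100); prior mean is 0.
results = {}
for prior_std in (50, 1000, 10000):
    results[prior_std] = normal_posterior(0.0, prior_std, estimate=500.0, std_err=100.0)
    mean, std = results[prior_std]
    print(f"prior std {prior_std:>5}: posterior mean {mean:7.2f}, std {std:6.2f}")
```

A prior standard deviation of 50 pulls the posterior mean most of the way back to the prior mean of 0, while 1000 and 10000 leave it close to the data estimate. That is the sense in which a large prior standard deviation lets the data dominate.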

Common approaches:

Weakly informative (recommended default):

  • Prior mean = 0

  • Prior std dev = 1000

  • Allows data to dominate

  • Mild regularization

Informative:

  • Prior mean = expert estimate

  • Prior std dev = reasonable uncertainty

  • Use when you have strong domain knowledge

  • Example: Mean=500, Std=200 for TV based on previous studies

Sign constraints:

  • Use Half-Normal for positive-only

  • Use Exponential (Negative) for negative-only

  • Prevents nonsensical estimates

Running Bayesian Inference

Important: Switching to Bayesian mode doesn't automatically run inference

Process:

  1. Switch model to Bayesian in Model Library

  2. Configure priors for variables in Model Builder

  3. Navigate to Bayesian Model Interface

  4. Click "Run Inference"

  5. Wait for MCMC sampling (30 seconds to 5 minutes)

  6. Review convergence diagnostics

  7. Results now available throughout MixModeler

MCMC Settings:

  • Chains: 4 (default, recommended)

  • Iterations: 2000 (default)

  • Warmup: 1000 (discarded)

  • Thinning: 1 (keep every sample)
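
These settings determine how many posterior draws are kept. The sketch below assumes the iteration count includes the warmup draws (the convention used by some samplers); this page does not say which convention MixModeler follows, so treat the arithmetic as illustrative. `retained_draws` is a hypothetical helper.

```python
def retained_draws(chains=4, iterations=2000, warmup=1000, thinning=1):
    """Count posterior draws kept after discarding warmup and thinning.

    Assumption: `iterations` is the total per chain, warmup included.
    """
    if warmup >= iterations:
        raise ValueError("warmup must be smaller than iterations")
    if thinning < 1:
        raise ValueError("thinning must be at least 1")
    # Keep every `thinning`-th draw among the post-warmup iterations.
    kept_per_chain = len(range(warmup, iterations, thinning))
    return chains * kept_per_chain


default_total = retained_draws()
thinned_total = retained_draws(thinning=2)
print(default_total, thinned_total)
```

Under that assumption the defaults keep 1000 draws per chain, 4000 in total, and a thinning value of 2 would halve that.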

Interpreting Bayesian Results

Posterior Mean:

  • Average of posterior distribution

  • Similar interpretation to OLS coefficient

  • "Best estimate" given data and priors

Posterior Std Dev:

  • Uncertainty in coefficient estimate

  • Similar to standard error in OLS

  • Smaller = more certain

95% Credible Interval:

  • Interpretation: "95% probability true coefficient is in this range"

  • Different from confidence interval (frequentist concept)

  • Excludes zero: Strong evidence variable matters

R-hat (Gelman-Rubin):

  • Convergence diagnostic

  • < 1.01: Excellent convergence

  • < 1.05: Acceptable convergence

  • > 1.10: Poor convergence, rerun with more iterations
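
R-hat compares the variance between chains with the variance within chains. The sketch below implements the classic Gelman-Rubin formula; many current tools report a rank-normalized split variant instead, and this page does not say which one MixModeler uses. The simulated chains are stand-ins for real MCMC output.

```python
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic R-hat for equal-length chains of draws for one parameter."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])  # W: average within-chain variance
    between = n * variance(chain_means)           # B: scaled variance of chain means
    pooled = (n - 1) / n * within + between / n   # pooled variance estimate
    return (pooled / within) ** 0.5


rng = random.Random(1)
# Four chains exploring the same distribution: R-hat should be close to 1.
mixed = [[rng.gauss(500, 200) for _ in range(500)] for _ in range(4)]
# Two chains stuck around a different value: R-hat should be well above 1.1.
stuck = [[rng.gauss(center, 200) for _ in range(500)] for center in (500, 500, 1500, 1500)]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
print(round(rhat_mixed, 3), round(rhat_stuck, 3))
```

When chains disagree about where the posterior sits, the between-chain term inflates the pooled variance and R-hat rises, which is the signal to rerun with more iterations.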

Effective Sample Size (ESS):

  • Approximate number of independent samples that the correlated MCMC draws are worth

  • > 1000: Good

  • > 400: Acceptable

  • < 100: Poor, rerun with more iterations

Posterior Probability:

  • P(coefficient > 0) for positive effect

  • P(coefficient < 0) for negative effect

  • > 95%: Strong evidence

  • > 99%: Very strong evidence
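
Given a set of posterior draws for one coefficient, the summaries above are simple to compute. The sketch below uses simulated draws in place of real MCMC output and an equal-tailed percentile interval; `summarize_draws` is a hypothetical helper, and MixModeler may compute its interval differently.

```python
import random
from statistics import mean, quantiles, stdev


def summarize_draws(draws):
    """Posterior mean, std dev, 95% equal-tailed interval, and P(coefficient > 0)."""
    # quantiles(..., n=40) returns cut points at 2.5%, 5%, ..., 97.5%.
    cuts = quantiles(draws, n=40)
    return {
        "mean": mean(draws),
        "std": stdev(draws),
        "interval": (cuts[0], cuts[-1]),
        "prob_positive": sum(d > 0 for d in draws) / len(draws),
    }


# Simulated stand-in for 4000 posterior draws of a TV coefficient
rng = random.Random(42)
draws = [rng.gauss(500.0, 200.0) for _ in range(4000)]
summary = summarize_draws(draws)
print(summary)
```

The posterior probability is just the share of draws above zero, which is why Bayesian output supports direct statements such as "the probability that TV has a positive effect" without a hypothesis test.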

Comparison Table

| Aspect | OLS | Bayesian |
|---|---|---|
| Speed | Fast (milliseconds) | Slower (seconds to minutes) |
| Output | Point estimates | Full distributions |
| Uncertainty | Confidence intervals | Credible intervals |
| Priors | None | Incorporated |
| Interpretation | Coefficients, p-values | Posterior probabilities |
| Computation | Deterministic | Stochastic (MCMC) |
| Small samples | Can be unstable | More robust with priors |
| Multicollinearity | Problematic | Better handling with priors |
| Default choice | Yes | No (requires more setup) |
| Stakeholder familiarity | High | Low to moderate |

When to Use Each

Use OLS When:

Starting model development

  • Quick iterations needed

  • Exploring variable combinations

  • Testing hypotheses rapidly

Simple models

  • Few variables

  • Large sample size

  • Low multicollinearity

Stakeholder requirements

  • Expect traditional statistics

  • Unfamiliar with Bayesian methods

  • P-values and t-stats are standard

No prior knowledge

  • First time modeling this problem

  • No historical data or expert input

  • Want data to speak for itself

Use Bayesian When:

You have prior knowledge

  • Historical models

  • Expert domain knowledge

  • Theoretical constraints (e.g., positive effects)

Need uncertainty quantification

  • Risk assessment required

  • Confidence bounds for forecasts

  • Probabilistic statements needed

Complex models

  • Many variables

  • High multicollinearity

  • Small sample size (priors provide regularization)

Final production models

  • After exploratory OLS phase

  • For optimization and decision-making

  • When full uncertainty assessment valuable

Workflow: OLS to Bayesian

Recommended approach for most projects:

Phase 1: OLS Exploration (Days 1-2)

  1. Build models with OLS

  2. Test variable combinations rapidly

  3. Find best specification

  4. Identify stable, significant variables

  5. Settle on a final OLS model (for example, R² = 78% with all variables significant)

Phase 2: Bayesian Refinement (Day 3)

  1. Switch final model to Bayesian

  2. Set weakly informative priors (mean=0, std=1000)

  3. Add sign constraints for marketing (Half-Normal positive)

  4. Run Bayesian inference

  5. Check convergence (R-hat < 1.01)

Phase 3: Bayesian Analysis (Day 4)

  1. Review posterior distributions

  2. Calculate posterior probabilities

  3. Generate uncertainty-aware forecasts

  4. Run optimization with uncertainty

  5. Present results with credible intervals

Benefits of this workflow:

  • Fast exploration with OLS

  • Robust final estimates with Bayesian

  • Best of both worlds

  • Stakeholder-friendly progression

Common Questions

Can I switch back and forth?

Yes! Switch anytime without losing work.

  • OLS results stored separately

  • Bayesian results stored separately

  • Switch to compare approaches

  • No data loss

Do I need to rerun inference when I switch?

Switching TO Bayesian: Yes, run inference in Bayesian Model Interface

Switching TO OLS: No, OLS results already available

After adding/removing variables: Bayesian models need inference rerun; OLS models refit automatically

Will my model export with both results?

Yes, if you've run both:

  • Export captures current model type

  • Both OLS and Bayesian results included in Excel

  • Can reimport and switch between them

Which is "better"?

No universal answer. Depends on:

OLS advantages:

  • Faster

  • Simpler

  • More familiar

  • Standard in industry

Bayesian advantages:

  • More flexible

  • Better uncertainty quantification

  • Handles complexity better

  • More theoretically principled

Practical answer: Start with OLS, switch to Bayesian for final model if needed

Troubleshooting

"Bayesian feature not available"

Cause: Free tier doesn't include Bayesian

Solution:

  • Upgrade to Professional or Enterprise

  • Click upgrade link in dialog

  • Or continue with OLS

Bayesian results missing after switch

Cause: Haven't run MCMC inference yet

Solution:

  1. Navigate to Bayesian Model Interface

  2. Click "Run Inference"

  3. Wait for completion

  4. Results now available

OLS and Bayesian give very different results

Possible causes:

  • Strong priors pulling estimates

  • Convergence issues in Bayesian

  • Small sample size

Solutions:

  • Check R-hat (should be < 1.05)

  • Use weaker priors (larger std dev)

  • Increase MCMC iterations

  • Compare with prior predictive checks

Can't switch to Bayesian

Cause: Model has fixed coefficients (OLS feature)

Solution:

  • Remove fixed coefficients

  • Set all to "Floating"

  • Then switch to Bayesian

Key Takeaways

  • OLS is faster and simpler - great for exploration

  • Bayesian provides uncertainty quantification and prior incorporation

  • Switch anytime without losing work

  • Recommended workflow: OLS exploration → Bayesian refinement

  • Must run MCMC inference after switching to Bayesian

  • Both use same model specification (variables)

  • Export includes results from both methods if run

  • Choose based on project needs and stakeholder requirements
