OLS vs Bayesian Selection
Understanding and Switching Between Modeling Approaches
Overview
MixModeler supports two statistical approaches: OLS (Ordinary Least Squares) and Bayesian modeling. You can switch between them anytime, and both use the same variables and model structure. The difference lies in how coefficients are estimated and what additional information you get.
Key Concept: Same model specification, different statistical frameworks
The Two Approaches
OLS (Ordinary Least Squares)
Statistical Framework: Frequentist
What it does:
Estimates single "best" coefficient for each variable
Minimizes sum of squared residuals
Provides point estimates only
Classical regression approach
Output:
Coefficient (β)
Standard error
T-statistic
P-value
95% confidence interval
R²
Computation:
Fast (milliseconds)
Deterministic (same result every time)
No sampling needed
When to use:
Default starting point
Exploratory analysis
Quick iterations
When you don't have prior knowledge
Stakeholders expect traditional statistics
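For readers who want to see what these outputs look like outside the interface, here is a minimal sketch of an OLS fit using the open-source statsmodels library as a stand-in for MixModeler's engine. The data and variable names (sales, tv_spend, radio) are hypothetical.

```python
# Illustrative only: MixModeler runs OLS internally; this statsmodels sketch
# shows the same outputs (coefficients, std errors, t-stats, p-values, CIs, R²).
import pandas as pd
import statsmodels.api as sm

# Hypothetical weekly data: a KPI and two media variables
data = pd.DataFrame({
    "sales":    [120, 135, 128, 150, 160, 142, 155, 170],
    "tv_spend": [10, 12, 11, 15, 16, 13, 14, 18],
    "radio":    [5, 6, 5, 7, 8, 6, 7, 9],
})

X = sm.add_constant(data[["tv_spend", "radio"]])  # add intercept
model = sm.OLS(data["sales"], X).fit()            # deterministic, effectively instant

print(model.params)        # point estimates (coefficients)
print(model.bse)           # standard errors
print(model.tvalues)       # t-statistics
print(model.pvalues)       # p-values
print(model.conf_int())    # 95% confidence intervals
print(model.rsquared)      # R²
```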
Bayesian Modeling
Statistical Framework: Bayesian
What it does:
Incorporates prior beliefs about coefficients
Uses MCMC sampling to estimate posterior distributions
Provides full probability distributions
Quantifies uncertainty
Output:
Posterior mean (similar to coefficient)
Posterior standard deviation
95% credible interval
Full posterior distribution
R-hat (convergence diagnostic)
Effective sample size
Computation:
Slower (seconds to minutes)
Stochastic (small variations each run)
Requires MCMC sampling
When to use:
You have expert priors
Need uncertainty quantification
Want probabilistic statements
Final production models
Stakeholders understand Bayesian inference
Switching Between OLS and Bayesian
How to Switch
In Model Library:
Locate model in table
Click the Type button (shows "OLS" or "BAY")
Type toggles immediately
All pages update to reflect new type
In Model Builder:
Use the model type toggle at top
Interface updates immediately
Statistics shown change based on type
Effect of switching:
Model specification unchanged (same variables)
Results recalculated in new framework
Interface adapts to show relevant statistics
All other pages (Diagnostics, Decomposition) use selected type
What Changes When You Switch
Interface changes:
OLS mode shows:
Coefficient Type (Floating/Fixed)
Coefficient Value input
Standard error
T-statistic
P-value
Bayesian mode shows:
Prior Distribution dropdown
Prior Mean input
Prior Std Dev input
Posterior Mean
Posterior Std Dev
Statistics reported:
OLS:
Point estimates
Frequentist confidence intervals
P-values
F-statistics
Bayesian:
Posterior distributions
Credible intervals
Posterior probabilities
WAIC, LOO (model comparison metrics)
OLS Mode Details
Coefficient Estimation
How it works:
Minimizes sum of squared errors
Solves normal equations
Returns single best estimate per variable
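As a rough sketch of what "solves normal equations" means, the NumPy snippet below computes the OLS coefficients directly from (X'X)β = X'y. Production solvers typically use QR or SVD decompositions for numerical stability; this is purely conceptual and not MixModeler's code.

```python
import numpy as np

# Solve the normal equations (X'X)β = X'y for the OLS coefficients.
def ols_coefficients(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    return np.linalg.solve(X.T @ X, X.T @ y)

# Tiny illustrative example: an intercept column plus one regressor
X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 5.0])])
y = np.array([2.1, 4.0, 6.2, 8.1, 9.9])
print(ols_coefficients(X, y))  # approximately [0.15, 1.97]
```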
Assumptions:
Linear relationship
Normally distributed errors
Homoscedastic errors
Independent observations
No perfect multicollinearity
Advantages:
Fast computation
Familiar to stakeholders
Standard in econometrics
Easy interpretation
Limitations:
No uncertainty quantification beyond std error
Sensitive to outliers
Can't incorporate prior knowledge
Point estimates only
Fixed vs Floating Coefficients
Floating (Default):
Coefficient estimated by regression
Model determines best value
Normal use case
Fixed:
You specify exact coefficient value
Useful for sensitivity analysis
Advanced feature
Example: "What if TV coefficient was exactly 1000?"
How to use:
Select "Fixed" in Coefficient Type
Enter desired value
Add variable
Regression estimates the other coefficients, given the fixed value
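One common way to realize a fixed coefficient, shown below as a hypothetical sketch, is to subtract the fixed variable's contribution from the KPI and then estimate the remaining coefficients by OLS. This illustrates the idea only; MixModeler's internal handling may differ.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Sketch of holding one coefficient fixed: remove the pinned variable's
# contribution from the KPI, then fit the remaining variables by OLS.
def fit_with_fixed(y, X_free, x_fixed, fixed_value):
    y_adjusted = y - fixed_value * x_fixed                    # remove the pinned contribution
    return sm.OLS(y_adjusted, sm.add_constant(X_free)).fit()  # estimate the rest

# Hypothetical example: pin the TV coefficient at exactly 1000
rng = np.random.default_rng(0)
tv = rng.uniform(0, 1, 52)
radio = rng.uniform(0, 1, 52)
sales = 500 + 1000 * tv + 300 * radio + rng.normal(0, 20, 52)

result = fit_with_fixed(sales, pd.DataFrame({"radio": radio}), tv, fixed_value=1000)
print(result.params)  # intercept and radio coefficient, given TV pinned at 1000
```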
Interpreting OLS Results
Coefficient:
Units: Change in KPI per unit change in variable
Sign: Positive (increases KPI) or Negative (decreases KPI)
Magnitude: Strength of relationship
T-statistic:
Measures how many standard errors the coefficient is from zero
|t| > 1.96: Significant at 95% confidence
|t| > 2.58: Significant at 99% confidence
Target: |t| > 2.0
P-value:
Probability of observing an estimate at least this extreme if the true coefficient were zero
< 0.05: Significant
< 0.01: Highly significant
> 0.10: Not significant, consider removing
Confidence Interval:
Range where true coefficient likely falls
Narrow interval: Precise estimate
Wide interval: Uncertain estimate
Excludes zero: Variable is significant
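A small worked example, with made-up numbers, showing how the t-statistic, p-value, and confidence interval relate to each other:

```python
from scipy import stats

# Made-up numbers: coefficient 500, standard error 200, 40 residual degrees of freedom
coef, se, df = 500.0, 200.0, 40

t_stat = coef / se                              # 2.5, clears the |t| > 2 target
p_value = 2 * stats.t.sf(abs(t_stat), df)       # two-sided p-value, roughly 0.017
t_crit = stats.t.ppf(0.975, df)                 # about 2.02 for 40 df
ci = (coef - t_crit * se, coef + t_crit * se)   # roughly (96, 904): excludes zero
print(t_stat, p_value, ci)
```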
Bayesian Mode Details
Prior Distributions
What are priors: Your belief about coefficient value BEFORE seeing the data
Why use priors:
Incorporate expert knowledge
Regularize estimates (prevent overfitting)
Handle collinearity better
More realistic in small samples
Available distributions:
Normal (Default):
Symmetric around mean
Most common choice
Parameters: mean, std dev
Use when: No strong directional belief
Student-t:
Heavier tails than Normal
More robust to outliers
Parameters: mean, std dev, degrees of freedom
Use when: Expect occasional extreme values
Laplace (Double Exponential):
Sharper peak, heavier tails
Promotes sparsity
Parameters: mean, scale
Use when: Some coefficients should be near zero
Horseshoe:
Strong sparsity inducing
Shrinks small coefficients to zero
Keeps large coefficients
Use when: Many variables, few truly important
Uniform:
All values in range equally likely
Non-informative
Parameters: lower bound, upper bound
Use when: Know range but nothing else
Half-Normal (Positive Only):
Only positive values allowed
Normal distribution truncated at zero
Use when: Coefficient must be positive (e.g., marketing spend effect)
Exponential (Positive/Negative):
Decaying probability
Favors values near zero
Use when: Small effects expected
Gamma/Inverse Gamma:
Positive values only
Flexible shapes
Use when: Positive coefficients, specific shape needed
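For orientation, here is how these prior families map onto distributions in an open-source probabilistic library such as PyMC. The parameter values are placeholders, not MixModeler defaults, and the Horseshoe is omitted because it is built from a hierarchy of scale parameters rather than a single distribution.

```python
import pymc as pm

# Conceptual mapping of the prior families above to PyMC distributions.
with pm.Model():
    pm.Normal("beta_normal", mu=0, sigma=1000)           # symmetric, most common choice
    pm.StudentT("beta_student", nu=4, mu=0, sigma=500)   # heavier tails, outlier-robust
    pm.Laplace("beta_laplace", mu=0, b=100)              # sharp peak, promotes sparsity
    pm.Uniform("beta_uniform", lower=0, upper=2000)      # non-informative within a known range
    pm.HalfNormal("beta_halfnormal", sigma=500)          # positive-only coefficient
    pm.Exponential("beta_exponential", lam=0.01)         # decaying mass, favors small values
    pm.Gamma("beta_gamma", alpha=2, beta=0.01)           # positive values, flexible shape
```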
Setting Prior Parameters
Prior Mean:
Your best guess for coefficient value
Example: "I think TV coefficient is around 500"
Set to 0 if no strong belief
Prior Std Dev:
Your uncertainty about the mean
Small std dev (e.g., 50): Strong belief, narrow prior
Large std dev (e.g., 1000): Weak belief, diffuse prior
Very large (e.g., 10000): Nearly non-informative
Common approaches:
Weakly informative (recommended default):
Prior mean = 0
Prior std dev = 1000
Allows data to dominate
Mild regularization
Informative:
Prior mean = expert estimate
Prior std dev = reasonable uncertainty
Use when you have strong domain knowledge
Example: Mean=500, Std=200 for TV based on previous studies
Sign constraints:
Use Half-Normal for positive-only
Use Exponential (negative) for negative-only
Prevents nonsensical estimates
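The three strategies above, written as priors for a hypothetical TV coefficient (again using PyMC syntax as a stand-in for MixModeler's prior inputs):

```python
import pymc as pm

# Illustrative parameter choices for a hypothetical TV coefficient.
with pm.Model():
    pm.Normal("tv_weak", mu=0, sigma=1000)     # weakly informative: lets the data dominate
    pm.Normal("tv_strong", mu=500, sigma=200)  # informative: centered on an expert estimate
    pm.HalfNormal("tv_positive", sigma=1000)   # sign constraint: spend effect must be positive
```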
Running Bayesian Inference
Important: Switching to Bayesian mode doesn't automatically run inference
Process:
Switch model to Bayesian in Model Library
Configure priors for variables in Model Builder
Navigate to Bayesian Model Interface
Click "Run Inference"
Wait for MCMC sampling (30 seconds to 5 minutes)
Review convergence diagnostics
Results now available throughout MixModeler
MCMC Settings:
Chains: 4 (default, recommended)
Iterations: 2000 (default)
Warmup: 1000 (discarded)
Thinning: 1 (keep every sample)
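Conceptually, clicking "Run Inference" performs an MCMC run along the lines of the following PyMC sketch, using the default settings above. The model, data, and variable names are illustrative placeholders, not MixModeler's internals.

```python
import numpy as np
import pymc as pm

# Hypothetical weekly data: one media variable and the KPI
tv_spend = np.array([10., 12., 11., 15., 16., 13., 14., 18.])
sales = np.array([120., 135., 128., 150., 160., 142., 155., 170.])

with pm.Model():
    intercept = pm.Normal("intercept", mu=0, sigma=1000)   # weakly informative priors
    beta_tv = pm.Normal("beta_tv", mu=0, sigma=1000)
    noise = pm.HalfNormal("noise", sigma=100)
    mu = intercept + beta_tv * tv_spend
    pm.Normal("sales", mu=mu, sigma=noise, observed=sales)

    # Mirrors the defaults above: 4 chains, 1000 warmup draws discarded,
    # 2000 retained iterations per chain, no thinning.
    idata = pm.sample(draws=2000, tune=1000, chains=4, random_seed=1)
```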
Interpreting Bayesian Results
Posterior Mean:
Average of posterior distribution
Similar interpretation to OLS coefficient
"Best estimate" given data and priors
Posterior Std Dev:
Uncertainty in coefficient estimate
Similar to standard error in OLS
Smaller = more certain
95% Credible Interval:
Interpretation: "95% probability true coefficient is in this range"
Different from confidence interval (frequentist concept)
Excludes zero: Strong evidence variable matters
R-hat (Gelman-Rubin):
Convergence diagnostic
< 1.01: Excellent convergence
< 1.05: Acceptable convergence
> 1.10: Poor convergence, rerun with more iterations
Effective Sample Size (ESS):
Number of independent samples
> 1000: Good
> 400: Acceptable
< 100: Poor, rerun with more iterations
Posterior Probability:
P(coefficient > 0) for positive effect
P(coefficient < 0) for negative effect
> 95%: Strong evidence
> 99%: Very strong evidence
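The same quantities can be reproduced outside the tool with ArviZ if you have the posterior draws. The sketch below reuses the idata object from the inference sketch in the previous section; beta_tv is the hypothetical TV coefficient from that example.

```python
import arviz as az

# Posterior mean, sd, 95% credible interval, R-hat, and ESS for beta_tv
print(az.summary(idata, var_names=["beta_tv"], hdi_prob=0.95))

# P(coefficient > 0): share of posterior draws above zero
posterior_tv = idata.posterior["beta_tv"].values.ravel()
print((posterior_tv > 0).mean())
```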
Comparison Table
Speed: OLS is fast (milliseconds); Bayesian is slower (seconds to minutes)
Output: OLS gives point estimates; Bayesian gives full distributions
Uncertainty: OLS reports confidence intervals; Bayesian reports credible intervals
Priors: OLS uses none; Bayesian incorporates them
Interpretation: OLS relies on coefficients and p-values; Bayesian on posterior probabilities
Computation: OLS is deterministic; Bayesian is stochastic (MCMC)
Small samples: OLS can be unstable; Bayesian is more robust with priors
Multicollinearity: problematic for OLS; Bayesian handles it better with priors
Default choice: OLS yes; Bayesian no (requires more setup)
Stakeholder familiarity: OLS high; Bayesian low to moderate
When to Use Each
Use OLS When:
✅ Starting model development
Quick iterations needed
Exploring variable combinations
Testing hypotheses rapidly
✅ Simple models
Few variables
Large sample size
Low multicollinearity
✅ Stakeholder requirements
Expect traditional statistics
Unfamiliar with Bayesian methods
P-values and t-stats are standard
✅ No prior knowledge
First time modeling this problem
No historical data or expert input
Want data to speak for itself
Use Bayesian When:
✅ You have prior knowledge
Historical models
Expert domain knowledge
Theoretical constraints (e.g., positive effects)
✅ Need uncertainty quantification
Risk assessment required
Confidence bounds for forecasts
Probabilistic statements needed
✅ Complex models
Many variables
High multicollinearity
Small sample size (priors provide regularization)
✅ Final production models
After exploratory OLS phase
For optimization and decision-making
When full uncertainty assessment valuable
Workflow: OLS to Bayesian
Recommended approach for most projects:
Phase 1: OLS Exploration (Days 1-2)
Build models with OLS
Test variable combinations rapidly
Find best specification
Identify stable, significant variables
Final OLS model: R² = 78%, all variables significant
Phase 2: Bayesian Refinement (Day 3)
Switch final model to Bayesian
Set weakly informative priors (mean=0, std=1000)
Add sign constraints for marketing (Half-Normal positive)
Run Bayesian inference
Check convergence (R-hat < 1.01)
Phase 3: Bayesian Analysis (Day 4)
Review posterior distributions
Calculate posterior probabilities
Generate uncertainty-aware forecasts
Run optimization with uncertainty
Present results with credible intervals
Benefits of this workflow:
Fast exploration with OLS
Robust final estimates with Bayesian
Best of both worlds
Stakeholder-friendly progression
Common Questions
Can I switch back and forth?
Yes! Switch anytime without losing work.
OLS results stored separately
Bayesian results stored separately
Switch to compare approaches
No data loss
Do I need to rerun inference when I switch?
Switching TO Bayesian: Yes, run inference in Bayesian Model Interface
Switching TO OLS: No, OLS results already available
After adding/removing variables: Bayesian requires rerunning inference; OLS refits automatically
Will my model export with both results?
Yes, if you've run both:
Export captures current model type
Both OLS and Bayesian results included in Excel
Can reimport and switch between them
Which is "better"?
No universal answer. Depends on:
OLS advantages:
Faster
Simpler
More familiar
Standard in industry
Bayesian advantages:
More flexible
Better uncertainty quantification
Handles complexity better
More theoretically principled
Practical answer: Start with OLS, switch to Bayesian for final model if needed
Troubleshooting
"Bayesian feature not available"
Cause: Free tier doesn't include Bayesian
Solution:
Upgrade to Professional or Enterprise
Click upgrade link in dialog
Or continue with OLS
Bayesian results missing after switch
Cause: Haven't run MCMC inference yet
Solution:
Navigate to Bayesian Model Interface
Click "Run Inference"
Wait for completion
Results now available
OLS and Bayesian give very different results
Possible causes:
Strong priors pulling estimates
Convergence issues in Bayesian
Small sample size
Solutions:
Check R-hat (should be < 1.05)
Use weaker priors (larger std dev)
Increase MCMC iterations
Compare with prior predictive checks
Can't switch to Bayesian
Cause: Model has fixed coefficients (OLS feature)
Solution:
Remove fixed coefficients
Set all to "Floating"
Then switch to Bayesian
Key Takeaways
OLS is faster and simpler - great for exploration
Bayesian provides uncertainty quantification and prior incorporation
Switch anytime without losing work
Recommended workflow: OLS exploration → Bayesian refinement
Must run MCMC inference after switching to Bayesian
Both use same model specification (variables)
Export includes results from both methods if run
Choose based on project needs and stakeholder requirements