Priors in Bayesian MMM

Incorporating Business Knowledge into Your Models

Bayesian inference allows you to encode business knowledge, domain expertise, and past learnings directly into your MMM through prior distributions. This page explains what priors are, when to use them, how to set them effectively, and how they improve model quality in MixModeler.


What Are Priors?

The Bayesian Framework

Bayesian inference updates prior beliefs with observed data to produce posterior beliefs:

P(β|data) ∝ P(data|β) × P(β)

Components:

Prior: P(β) What you believe about parameters before seeing the data

  • Example: "I expect TV coefficient to be positive and around 0.5"

Likelihood: P(data|β) How well the model fits the observed data for given parameter values

Posterior: P(β|data) Updated belief about parameters after seeing the data

  • Combines prior knowledge + data evidence

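The update rule above can be sketched numerically with a conjugate Normal-Normal model, where the posterior has a closed form. This is a minimal illustration (all numbers are made up for the example), not MixModeler's actual sampler:

```python
import numpy as np

def posterior_normal(prior_mu, prior_sigma, data, noise_sigma):
    """Conjugate Normal-Normal update: combine a Normal prior on a
    coefficient with Normal-likelihood observations of it."""
    n = len(data)
    prior_prec = 1.0 / prior_sigma**2          # precision of the prior
    data_prec = n / noise_sigma**2             # precision contributed by the data
    post_var = 1.0 / (prior_prec + data_prec)  # precisions add
    post_mu = post_var * (prior_prec * prior_mu + data_prec * np.mean(data))
    return post_mu, np.sqrt(post_var)

# Prior belief: coefficient around 0.5 (sd 0.2); four noisy observations near 0.6
observations = [0.62, 0.55, 0.70, 0.58]
mu_post, sd_post = posterior_normal(0.5, 0.2, observations, noise_sigma=0.3)
print(mu_post, sd_post)  # posterior mean sits between prior mean 0.5 and data mean
```

The posterior mean is a precision-weighted compromise between the prior mean and the data mean, and the posterior standard deviation is smaller than the prior's: exactly the "prior + data → updated belief" behavior described above.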

Why Use Priors?

Regularization Priors prevent overfitting by constraining estimates to reasonable ranges

Stability With limited data (< 52 weeks), priors stabilize coefficient estimates

Domain Knowledge Incorporate business expertise: "TV should have positive effect based on past studies"

Constraints Enforce requirements: "Marketing coefficients must be non-negative"

Better Uncertainty Priors lead to more realistic credible intervals (not overconfident)


Types of Priors in MixModeler

Weakly Informative Priors (Default)

Definition: Broad priors that guide estimates without strongly constraining them

Formula:

β ~ Normal(μ = 0, σ = 10)

Use Case:

  • No strong prior knowledge

  • Want data to dominate

  • Exploratory analysis

Effect:

  • Prevents extreme coefficient values

  • Allows data to drive estimates

  • Minimal influence on final results


Informative Priors

Definition: Specific beliefs based on business knowledge or past studies

Formula:

β_TV ~ Normal(μ = 0.5, σ = 0.2)

Use Case:

  • Have domain expertise or historical data

  • Past MMM studies inform expectations

  • Need to constrain estimates for business reasons

Effect:

  • Shrinks estimates toward prior mean

  • Reduces uncertainty (narrower credible intervals)

  • Balances data + expertise


Strongly Informative Priors

Definition: Very specific beliefs with high confidence

Formula:

β_TV ~ Normal(μ = 0.6, σ = 0.05)

Use Case (Rare):

  • Very strong evidence from rigorous past studies

  • Experimental data (RCTs) inform priors

  • Regulatory/business requirements demand specific ranges

Warning:

  • Can overwhelm data if prior is wrong

  • Use cautiously - only when confidence is justified

  • Always run sensitivity analysis


Non-Negative Priors

Definition: Forces coefficients to be positive (marketing should increase KPI)

Formula:

β_TV ~ HalfNormal(σ = 1.0)
β_TV ~ Exponential(λ = 1.0)
β_TV ~ LogNormal(μ = 0, σ = 1)

Use Case:

  • Marketing channels (positive effect expected)

  • Business logic demands non-negative

  • Prevent nonsensical negative coefficients

Note: MixModeler uses Normal priors by default. Setting μ > 0 with a moderate σ discourages negative coefficients but, unlike the distributions above, does not strictly rule them out.
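The three non-negative families listed above all place zero probability mass below zero. A quick check with scipy (parameter values are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Three common non-negative prior families
half_normal = stats.halfnorm(scale=1.0).rvs(10_000, random_state=rng)
exponential = stats.expon(scale=1.0).rvs(10_000, random_state=rng)   # scale = 1/lambda
log_normal = stats.lognorm(s=1.0, scale=np.exp(0)).rvs(10_000, random_state=rng)

# Every draw from each family is non-negative
for draws in (half_normal, exponential, log_normal):
    assert (draws >= 0).all()
```

Note the scipy parameterizations: `expon` takes `scale = 1/λ`, and `lognorm` takes `s = σ` with `scale = exp(μ)`.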


Setting Priors in MixModeler

Prior Parameters

For each variable, you specify:

Prior Mean (μ) Expected coefficient value

Prior Standard Deviation (σ) Uncertainty around expectation

Example:

μ = 0.5, σ = 0.3

Means: "I expect the coefficient to be around 0.5, give or take about 0.3 (one standard deviation)"


Practical Prior Setting Strategies

Strategy 1: Start from OLS Estimates

Process:

  1. Build OLS model first (quick baseline)

  2. Note OLS coefficient: β̂ = 0.52, SE = 0.08

  3. Set Bayesian prior:

    • μ = 0.52 (use OLS estimate)

    • σ = 0.16 (2× OLS standard error)

Rationale: OLS provides a data-driven starting point; using a σ wider than the OLS standard error leaves room for Bayesian updating.
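The steps above can be sketched with plain numpy on simulated data (the data-generating numbers are illustrative; in practice you would fit your actual KPI and media series):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative weekly data: KPI driven by one media variable plus noise
n_weeks = 80
tv = rng.gamma(shape=2.0, scale=1.0, size=n_weeks)
kpi = 10.0 + 0.5 * tv + rng.normal(0, 1.0, size=n_weeks)

# Step 1-2: OLS fit (design matrix with intercept) and standard errors
X = np.column_stack([np.ones(n_weeks), tv])
beta_hat, *_ = np.linalg.lstsq(X, kpi, rcond=None)
resid = kpi - X @ beta_hat
sigma2 = resid @ resid / (n_weeks - X.shape[1])          # residual variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))   # coefficient SEs

# Step 3: Bayesian prior = OLS estimate, with 2x the SE as prior sd
prior_mu, prior_sigma = beta_hat[1], 2 * se[1]
print(prior_mu, prior_sigma)
```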


Strategy 2: Business Judgment

Process:

  1. Ask: "What ROI do I expect from this channel?"

  2. Translate to coefficient:

    • Expected ROI: $2 return per $1 spent

    • Prior mean: μ = 2.0

  3. Set uncertainty:

    • Fairly confident: σ = 0.5

    • Very uncertain: σ = 1.5


Strategy 3: Industry Benchmarks

Process:

  1. Research typical MMM coefficients for your industry

  2. Use published studies or consulting benchmarks

  3. Example: "Retail MMM studies show TV ROI of 1.5-2.5"

  4. Set prior: μ = 2.0, σ = 0.5


Strategy 4: Past MMM Results

Process:

  1. Review previous MMM models (last year's analysis)

  2. Use historical coefficients as priors for new model

  3. Example: Last year β_TV = 0.58

  4. Set prior: μ = 0.58, σ = 0.15

Benefit: Continuity and comparability across time periods


How Strong Should Your Priors Be?

Weak Priors (Large σ)

When to Use:

  • Little prior knowledge

  • Exploratory analysis

  • First-time modeling

  • Want data to dominate

Example:

μ = 0, σ = 10 (very wide)

Effect: Posterior ≈ Likelihood (data drives results)


Moderate Priors (Medium σ)

When to Use:

  • Some domain knowledge

  • Past studies available

  • Balancing expertise + data

Example:

μ = 0.5, σ = 0.3

Effect: Posterior balances prior + data


Strong Priors (Small σ)

When to Use (Rare):

  • Very high confidence in prior

  • Experimental evidence

  • Regulatory constraints

Example:

μ = 0.6, σ = 0.05

Effect: Posterior strongly influenced by prior

Warning: Only use when justified - can distort results if prior is wrong


Prior Strength and Data Size

Limited Data (< 52 weeks)

Use stronger priors (smaller σ) to stabilize estimates

Rationale: Limited data means high uncertainty; priors supplement the limited information.

Example:

30 weeks of data → μ = 0.5, σ = 0.2

Adequate Data (52-104 weeks)

Use moderate priors (medium σ)

Rationale: The data provide good information; priors guide but don't dominate.

Example:

80 weeks of data → μ = 0.5, σ = 0.3

Abundant Data (104+ weeks)

Use weak priors (large σ) or let data dominate

Rationale: Abundant data speaks for itself; priors provide only minimal regularization.

Example:

150 weeks of data → μ = 0, σ = 5

Common Prior Scenarios

Scenario 1: New Channel, No History

Situation: Testing a new marketing channel (e.g., TikTok) with no past data

Prior Strategy: Weakly Informative

μ = 0 (neutral)
σ = 2 (wide range)

Rationale: With no strong beliefs, let the data reveal effectiveness.


Scenario 2: Established Channel, Known Positive Effect

Situation: TV advertising, known to work from past studies

Prior Strategy: Informative Positive

μ = 0.5 (expect positive ROI)
σ = 0.3 (moderate uncertainty)

Rationale: Guide the model toward a positive coefficient and prevent spurious negative results.


Scenario 3: Past MMM Available

Situation: Last year's model estimated β_Radio = 0.42

Prior Strategy: Based on Historical

μ = 0.42 (last year's estimate)
σ = 0.20 (allow for changes)

Rationale: Leverage past learnings and maintain consistency year-over-year.


Scenario 4: Business Requirement

Situation: CFO requires non-negative marketing ROI for budget approval

Prior Strategy: Constrained Positive

μ = 0.3 (modest positive effect)
σ = 0.25 (but allow data to inform magnitude)

Rationale: Satisfies the business constraint in expectation while staying data-driven; note that a Normal prior centered above zero discourages, but does not strictly prevent, a negative estimate.


Scenario 5: Conflicting Stakeholder Beliefs

Situation: Marketing team says TV ROI is 3.0; Finance says 1.5

Prior Strategy: Middle Ground

μ = 2.0 (average of beliefs)
σ = 0.75 (wide to accommodate both views)

Rationale: Encode the disagreement as uncertainty and let the data adjudicate.


Priors by Variable Type

Marketing Channels

Default Approach:

μ = 0.5 (expect positive ROI)
σ = 0.5 (moderate uncertainty)

Reasoning:

  • Marketing should drive incremental KPI

  • Magnitude varies by channel efficiency


Seasonality Variables

Default Approach:

μ = 0 (no directional expectation)
σ = 1 (moderate range)

Reasoning:

  • Some months above average, others below

  • Effects can be positive or negative


External Factors (Price, Weather, etc.)

Default Approach:

μ = 0 (neutral)
σ = 1 (moderate)

Alternative if Direction Known:

Price: μ = -0.5, σ = 0.3 (higher price → lower sales)

Baseline/Intercept

Default Approach:

μ = mean(KPI) (around average KPI)
σ = std(KPI) (wide range)

Reasoning: The baseline represents organic performance and should sit close to the average KPI level.
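A small sketch of deriving the baseline prior from the KPI series itself (the KPI values here are made up):

```python
import numpy as np

# Illustrative weekly KPI series
kpi = np.array([102.0, 98.5, 110.2, 95.4, 101.1, 99.8, 104.6, 97.3])

# Center the intercept prior on the typical KPI level, with a wide sd
baseline_mu = float(np.mean(kpi))
baseline_sigma = float(np.std(kpi, ddof=1))
print(baseline_mu, baseline_sigma)  # prior: Normal(mean(KPI), std(KPI))
```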


Sensitivity Analysis: Testing Prior Impact

Always test how sensitive your results are to prior choices:

Process

Step 1: Base Model Build model with your chosen priors

Step 2: Weak Priors Rebuild with very wide priors (σ = 10)

Step 3: Strong Priors Rebuild with tighter priors (σ = 0.1)

Step 4: Different Means Test μ ± 50%

Step 5: Compare Results

Interpretation:

  • If results vary minimally → priors have little influence (data dominates)

  • If results change substantially → priors strongly influence (weak data)
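The five-step process above can be mimicked with the conjugate Normal-Normal formula, rerunning the same data summary under the base, weak, strong, and shifted-mean configurations (all numbers illustrative; a real sensitivity check would re-run the full MixModeler model):

```python
import numpy as np

def posterior_normal(prior_mu, prior_sigma, y_bar, n, noise_sigma):
    """Conjugate Normal-Normal posterior for a coefficient (illustrative)."""
    prior_prec = 1.0 / prior_sigma**2
    data_prec = n / noise_sigma**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mu = post_var * (prior_prec * prior_mu + data_prec * y_bar)
    return post_mu, np.sqrt(post_var)

# Same data summary (mean 0.6 across 80 weeks) under four prior configurations
y_bar, n, noise_sigma = 0.6, 80, 1.0
configs = {
    "base": (0.5, 0.3),
    "weak": (0.5, 10.0),          # Step 2: very wide prior
    "strong": (0.5, 0.1),         # Step 3: tight prior
    "shifted_mean": (0.75, 0.3),  # Step 4: mu + 50%
}
for name, (mu, sigma) in configs.items():
    post_mu, post_sd = posterior_normal(mu, sigma, y_bar, n, noise_sigma)
    print(f"{name:>12}: posterior mean = {post_mu:.3f}, sd = {post_sd:.3f}")
```

If the posterior means cluster tightly across configurations, the data dominate; if they spread out, your conclusions depend on the prior and you should say so in your report.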


Prior-Posterior Plots

Visualize how priors are updated by data:

Good Outcome: Data Overwhelms Prior

Prior:     ----wide bell curve----
Posterior:      --narrow peak--

The data were informative; the posterior differs from the prior

Warning: Prior Dominates Data

Prior:        --narrow peak--
Posterior:     --same peak--

The data didn't change the belief (either the data are weak or the prior is too strong)

Ideal Balance:

Prior:      ----moderate bell----
Posterior:     ---shifted peak---

Prior + data combine for optimal estimate
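A simple numeric stand-in for these plots is the fraction of prior variance removed by the data. The `prior_influence` helper below is hypothetical (not a MixModeler function), but the diagnostic it computes matches the three pictures above:

```python
def prior_influence(prior_sigma, post_sigma):
    """Fraction of prior variance removed by the data:
    near 1 -> data overwhelms the prior; near 0 -> prior dominates."""
    return 1.0 - post_sigma**2 / prior_sigma**2

# Good outcome: wide prior, narrow posterior -> data was informative
print(prior_influence(prior_sigma=2.0, post_sigma=0.2))   # close to 1

# Warning sign: posterior barely narrower than the prior
print(prior_influence(prior_sigma=0.1, post_sigma=0.095)) # close to 0
```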


Common Prior Mistakes

❌ Mistake 1: Too Strong Priors with Limited Evidence

Problem: σ = 0.05 when you don't really know
Result: The model ignores the data and simply returns your prior
Fix: Use wider priors (σ = 0.3+) unless a tight prior is strongly justified


❌ Mistake 2: Ignoring Priors Entirely

Problem: Using default flat priors without thought
Result: Missed opportunity to stabilize estimates, especially with limited data
Fix: Always specify informative priors for key marketing variables


❌ Mistake 3: Conflicting Priors

Problem:

  • μ_TV = 2.0 (strong positive)

  • μ_Radio = 2.0 (strong positive)

  • But historically TV >> Radio in effectiveness

Result: The model struggles to reconcile the priors with reality
Fix: Use relative magnitudes that match business understanding


❌ Mistake 4: No Sensitivity Testing

Problem: Building one model with one prior choice and assuming it's correct
Result: Unknown sensitivity to prior assumptions
Fix: Always test 2-3 prior configurations


❌ Mistake 5: Using Priors to Force Desired Results

Problem: "CFO wants TV ROI = 3.0, so I'll set μ = 3.0, σ = 0.01"
Result: The model is no longer data-driven; it just confirms predetermined beliefs
Fix: Priors should guide, not dictate. Use data to inform decisions, not justify them.


Hierarchical Priors (Advanced)

For related variables, use hierarchical structure:

Concept: All TV variables share a common prior distribution:

β_TV_National ~ Normal(μ_TV, σ_TV)
β_TV_Local ~ Normal(μ_TV, σ_TV)

μ_TV ~ Normal(0.5, 0.3)  # Hyperprior
σ_TV ~ HalfNormal(0.2)   # Hyperprior

Benefit:

  • Partial pooling of information

  • TV variables inform each other

  • More stable estimates with limited data per variable

Note: MixModeler uses independent priors by default (simpler). Hierarchical priors require custom configuration.
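A simplified sketch of the partial-pooling idea: shrink each per-variable estimate toward the shared TV mean, weighting by precision. This is an empirical-Bayes approximation with the hyperparameters held fixed (full hierarchical inference would also learn μ_TV and σ_TV from the data); all numbers are illustrative:

```python
import numpy as np

# Per-variable estimates and SEs for two related TV variables (illustrative)
estimates = np.array([0.70, 0.35])   # TV_National, TV_Local
ses = np.array([0.15, 0.25])         # TV_Local is measured more noisily

# Shared TV-level prior: mu_TV ~ Normal(0.5, 0.3), held fixed here
hyper_mu, hyper_sigma = 0.5, 0.3

# Partial pooling: precision-weighted shrinkage of each estimate toward hyper_mu
prec_data = 1.0 / ses**2
prec_prior = 1.0 / hyper_sigma**2
pooled = (prec_data * estimates + prec_prior * hyper_mu) / (prec_data + prec_prior)
print(pooled)  # each estimate moves toward 0.5; the noisier one moves more
```

This is why hierarchical priors stabilize estimates: the variable with less information (larger SE) borrows more strength from its siblings.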


Communicating Priors to Stakeholders

Why Transparency Matters

Non-statisticians may be skeptical: "You're just putting your assumptions in!"

Response:

  1. Acknowledge: "Yes, priors encode assumptions - that's a feature, not a bug"

  2. Justify: "We're using industry benchmarks / past studies / business knowledge"

  3. Validate: "We tested sensitivity - results hold across reasonable prior choices"

  4. Benefit: "Priors prevent nonsensical results and stabilize estimates with limited data"


Presenting Priors in Reports

Include:

  • Prior specifications (μ, σ) for each variable

  • Rationale for prior choices (business knowledge, past studies, etc.)

  • Sensitivity analysis showing robustness

  • Prior-posterior comparison (show data updated beliefs)

Example Narrative: "We set a prior for TV coefficient at μ=0.5 (σ=0.3) based on last year's MMM which estimated 0.52. The posterior estimate is 0.54 [0.42-0.66], indicating consistency with historical performance and validating our prior choice."


Best Practices for Priors

✅ Do's

Start with OLS Use OLS estimates to inform Bayesian priors (data-driven starting point)

Document Rationale Record why each prior was chosen (reproducibility + transparency)

Test Sensitivity Always check if results hold with different prior choices

Use Domain Knowledge Leverage business expertise, past studies, industry benchmarks

Match Data Strength Weak data → stronger priors; Strong data → weaker priors

Communicate Clearly Explain priors to stakeholders in non-technical terms


❌ Don'ts

Don't Use Priors to Force Results Priors guide, they don't dictate. Data should still inform.

Don't Ignore Sensitivity One prior choice ≠ robust analysis

Don't Over-Constrain Extremely narrow priors (σ < 0.05) rarely justified

Don't Forget Constraints If marketing must be positive, encode that (non-negative priors)

Don't Blindly Accept Defaults Default priors may not suit your business context


Prior Selection Workflow

Step 1: Understand Your Data How many observations? How much variance?

Step 2: Review Past Studies What do historical MMMs or industry benchmarks suggest?

Step 3: Consult Stakeholders What do marketing/finance teams believe?

Step 4: Build OLS Baseline Get data-driven point estimates

Step 5: Set Informative Priors Combine OLS + business knowledge

Step 6: Run Bayesian Model Estimate posterior distributions

Step 7: Check Convergence R-hat < 1.1, adequate ESS

Step 8: Sensitivity Analysis Test alternative prior specifications

Step 9: Validate & Iterate Results make business sense? Adjust if needed

Step 10: Document & Present Transparent reporting of prior choices and sensitivity


Summary

Key Takeaways:

🎯 Priors encode business knowledge - leverage expertise and past learnings

📊 Bayesian = Prior + Data - posterior balances both sources of information

🛡️ Regularization benefit - priors prevent overfitting and nonsensical estimates

💪 Stronger priors for limited data - compensate for weak likelihood

🔬 Always test sensitivity - ensure results are robust to prior choices

Use OLS to inform priors - data-driven starting point for Bayesian refinement

📈 Non-negative constraints - encode business logic (marketing should help, not hurt)

🎓 Transparency is key - document and communicate prior rationale to stakeholders

Properly specified priors transform Bayesian MMM from a black box into a transparent, principled framework that combines the best of data and domain expertise.
