Priors in Bayesian MMM
Incorporating Business Knowledge into Your Models
Bayesian inference allows you to encode business knowledge, domain expertise, and past learnings directly into your MMM through prior distributions. This page explains what priors are, when to use them, how to set them effectively, and how they improve model quality in MixModeler.
What Are Priors?
The Bayesian Framework
Bayesian inference updates prior beliefs with observed data to produce posterior beliefs:
P(β|data) ∝ P(data|β) × P(β)
Components:
Prior, P(β): What you believe about the parameters before seeing the data
- Example: "I expect the TV coefficient to be positive and around 0.5"
Likelihood, P(data|β): How well the model fits the observed data for given parameter values
Posterior, P(β|data): Updated belief about the parameters after seeing the data
- Combines prior knowledge + data evidence
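To make the update concrete, here is a minimal sketch of the simplest case: a Normal prior combined with a Normal summary of the data (a point estimate and its standard error). In that case the posterior is also Normal and has a closed form. The function name `normal_update` and the data summary (estimate 0.8, SE 0.2) are assumptions made for illustration; MixModeler's own sampler is not shown here.

```python
from statistics import NormalDist

def normal_update(prior, estimate, se):
    """Combine a Normal prior with a Normal data summary (estimate, se).

    Precisions (1 / variance) add, and the posterior mean is the
    precision-weighted average of the prior mean and the data estimate.
    """
    prior_prec = 1.0 / prior.variance
    data_prec = 1.0 / se ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior.mean + data_prec * estimate)
    return NormalDist(post_mean, post_var ** 0.5)

# Prior: "I expect the TV coefficient to be positive and around 0.5"
prior = NormalDist(mu=0.5, sigma=0.2)

# Hypothetical data-only estimate and standard error (not from a real model)
posterior = normal_update(prior, estimate=0.8, se=0.2)

print(f"posterior mean = {posterior.mean:.3f}, sd = {posterior.stdev:.3f}")
```

Because the prior and the data are equally precise in this example, the posterior mean lands halfway between them, and the posterior standard deviation is smaller than either input.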
Why Use Priors?
Regularization: Priors prevent overfitting by constraining estimates to reasonable ranges
Stability: With limited data (< 52 weeks), priors stabilize coefficient estimates
Domain Knowledge: Incorporate business expertise, e.g. "TV should have a positive effect based on past studies"
Constraints: Enforce requirements, e.g. "Marketing coefficients must be non-negative"
Better Uncertainty: Priors lead to more realistic credible intervals (not overconfident)
Types of Priors in MixModeler
Weakly Informative Priors (Default)
Definition: Broad priors that guide estimates without strongly constraining them
Formula:
β ~ Normal(μ = 0, σ = 10)
Use Case:
- No strong prior knowledge 
- Want data to dominate 
- Exploratory analysis 
Effect:
- Prevents extreme coefficient values 
- Allows data to drive estimates 
- Minimal influence on final results 
Informative Priors
Definition: Specific beliefs based on business knowledge or past studies
Formula:
β_TV ~ Normal(μ = 0.5, σ = 0.2)
Use Case:
- Have domain expertise or historical data 
- Past MMM studies inform expectations 
- Need to constrain estimates for business reasons 
Effect:
- Shrinks estimates toward prior mean 
- Reduces uncertainty (narrower credible intervals) 
- Balances data + expertise 
Strongly Informative Priors
Definition: Very specific beliefs with high confidence
Formula:
β_TV ~ Normal(μ = 0.6, σ = 0.05)
Use Case (Rare):
- Very strong evidence from rigorous past studies 
- Experimental data (RCTs) inform priors 
- Regulatory/business requirements demand specific ranges 
Warning:
- Can overwhelm data if prior is wrong 
- Use cautiously - only when confidence is justified 
- Always run sensitivity analysis 
Non-Negative Priors
Definition: Forces coefficients to be positive (marketing should increase KPI)
Formula:
β_TV ~ HalfNormal(σ = 1.0)
β_TV ~ Exponential(λ = 1.0)
β_TV ~ LogNormal(μ = 0, σ = 1)
Use Case:
- Marketing channels (positive effect expected) 
- Business logic demands non-negative 
- Prevent nonsensical negative coefficients 
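Note: MixModeler uses Normal priors by default. A Normal prior cannot strictly enforce non-negativity, but setting μ > 0 with a moderate σ makes negative coefficients unlikely (for example, μ = 0.5 with σ = 0.2 places under 1% of the prior mass below zero).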
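How much a Normal prior with μ > 0 actually discourages negative coefficients depends on the ratio μ/σ. The sketch below computes the prior probability of a negative coefficient for a few (μ, σ) pairs. The helper name `prob_negative` and the chosen pairs are illustrative assumptions.

```python
from statistics import NormalDist

def prob_negative(mu, sigma):
    """Prior probability that a Normal(mu, sigma) coefficient is below zero."""
    return NormalDist(mu, sigma).cdf(0.0)

# Same prior mean, progressively tighter standard deviation
settings = [(0.5, 0.5), (0.5, 0.3), (0.5, 0.2)]
negative_mass = {s: prob_negative(*s) for s in settings}

for (mu, sigma), p in negative_mass.items():
    print(f"Normal(mu={mu}, sigma={sigma}): P(beta < 0) = {p:.3f}")
```

A Normal prior never drives this probability to exactly zero; a distribution with non-negative support (HalfNormal, Exponential, LogNormal) is needed for a hard constraint.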
Setting Priors in MixModeler
Prior Parameters
For each variable, you specify:
Prior Mean (μ): Expected coefficient value
Prior Standard Deviation (σ): Uncertainty around that expectation
Example:
μ = 0.5, σ = 0.3
Means: "I expect the coefficient to be around 0.5, but I am fairly uncertain: most likely between 0.2 and 0.8 (μ ± σ, about 68% of the prior probability)"
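Before committing to a (μ, σ) pair, it can help to check which coefficient range it actually implies. The sketch below computes central intervals of a Normal prior; `prior_interval` is a hypothetical helper written for this page, not a MixModeler function.

```python
from statistics import NormalDist

def prior_interval(mu, sigma, mass=0.95):
    """Central interval containing `mass` of a Normal(mu, sigma) prior."""
    if not 0.0 < mass < 1.0:
        raise ValueError("mass must be strictly between 0 and 1")
    tail = (1.0 - mass) / 2.0
    dist = NormalDist(mu, sigma)
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)

low95, high95 = prior_interval(0.5, 0.3, mass=0.95)
low68, high68 = prior_interval(0.5, 0.3, mass=0.68)

print(f"95% prior interval: [{low95:.2f}, {high95:.2f}]")
print(f"68% prior interval: [{low68:.2f}, {high68:.2f}]")
```

If the 95% interval includes values you consider implausible (for example, clearly negative coefficients for a channel you trust), that is a sign to revisit μ or σ.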
Practical Prior Setting Strategies
Strategy 1: Start from OLS Estimates
Process:
- Build OLS model first (quick baseline) 
- Note OLS coefficient: β̂ = 0.52, SE = 0.08 
- Set Bayesian prior:
  - μ = 0.52 (use OLS estimate)
  - σ = 0.16 (2× OLS standard error)
Rationale: OLS provides a data-driven starting point. A wider σ than the OLS SE leaves room for Bayesian updating.
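The rule above (center on the OLS estimate, widen the standard error) is simple enough to capture in a small helper. This is a sketch only: the name `prior_from_ols` and the `widen` parameter are assumptions, and the factor of 2 is the one used in the example above. Note that if the OLS fit uses the same data as the Bayesian model, the data informs the result twice, which is one more reason to keep σ wider than the OLS standard error.

```python
def prior_from_ols(beta_hat, std_err, widen=2.0):
    """Return (mu, sigma) for a Normal prior centered on an OLS estimate.

    sigma is the OLS standard error multiplied by `widen`, so the prior
    is deliberately less certain than the OLS fit itself.
    """
    if std_err <= 0 or widen <= 0:
        raise ValueError("std_err and widen must be positive")
    return beta_hat, widen * std_err

# Values from the example above: OLS coefficient 0.52, SE 0.08
mu, sigma = prior_from_ols(0.52, 0.08)
print(f"prior: mu = {mu:.2f}, sigma = {sigma:.2f}")
```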
Strategy 2: Business Judgment
Process:
- Ask: "What ROI do I expect from this channel?" 
- Translate to coefficient:
  - Expected ROI: $2 return per $1 spent
  - Prior mean: μ = 2.0 (assuming the coefficient is on the ROI scale, i.e. KPI dollars per dollar of spend)
- Set uncertainty:
  - Fairly confident: σ = 0.5
  - Very uncertain: σ = 1.5
Strategy 3: Industry Benchmarks
Process:
- Research typical MMM coefficients for your industry 
- Use published studies or consulting benchmarks 
- Example: "Retail MMM studies show TV ROI of 1.5-2.5" 
- Set prior: μ = 2.0, σ = 0.5 
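Turning a published range into (μ, σ) requires deciding how much prior probability the range should cover. The sketch below makes that choice explicit: treating 1.5-2.5 as roughly a ±1σ band (about 68% coverage) reproduces σ ≈ 0.5, while treating it as a 95% interval gives a much tighter prior. The helper `prior_from_range` and its `mass` parameter are assumptions for illustration.

```python
from statistics import NormalDist

def prior_from_range(low, high, mass=0.6827):
    """Normal prior (mu, sigma) whose central `mass` interval is [low, high]."""
    if not low < high:
        raise ValueError("low must be less than high")
    if not 0.0 < mass < 1.0:
        raise ValueError("mass must be strictly between 0 and 1")
    mu = (low + high) / 2.0
    # z-score that leaves (1 - mass) / 2 in each tail of a standard Normal
    z = NormalDist().inv_cdf(0.5 + mass / 2.0)
    sigma = (high - low) / (2.0 * z)
    return mu, sigma

# Benchmark range from the example above: TV ROI of 1.5-2.5
mu_68, sigma_68 = prior_from_range(1.5, 2.5)             # range read as about ±1σ
mu_95, sigma_95 = prior_from_range(1.5, 2.5, mass=0.95)  # range read as a 95% interval

print(f"range as ±1σ band:     mu = {mu_68:.2f}, sigma = {sigma_68:.2f}")
print(f"range as 95% interval: mu = {mu_95:.2f}, sigma = {sigma_95:.2f}")
```

Which reading is appropriate depends on how the benchmark was reported; when that is unknown, the wider (±1σ) reading is the more cautious choice.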
Strategy 4: Past MMM Results
Process:
- Review previous MMM models (last year's analysis) 
- Use historical coefficients as priors for new model 
- Example: Last year β_TV = 0.58 
- Set prior: μ = 0.58, σ = 0.15 
Benefit: Continuity and comparability across time periods
How Strong Should Your Priors Be?
Weak Priors (Large σ)
When to Use:
- Little prior knowledge 
- Exploratory analysis 
- First-time modeling 
- Want data to dominate 
Example:
μ = 0, σ = 10 (very wide)
Effect: Posterior ≈ Likelihood (data drives results)
Moderate Priors (Medium σ)
When to Use:
- Some domain knowledge 
- Past studies available 
- Balancing expertise + data 
Example:
μ = 0.5, σ = 0.3
Effect: Posterior balances prior + data
Strong Priors (Small σ)
When to Use (Rare):
- Very high confidence in prior 
- Experimental evidence 
- Regulatory constraints 
Example:
μ = 0.6, σ = 0.05
Effect: Posterior strongly influenced by prior
Warning: Only use when justified - can distort results if prior is wrong
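One way to quantify "weak", "moderate", and "strong" is the share of the posterior mean that comes from the prior mean. In the Normal-Normal case this share is the prior precision divided by the total precision. The sketch below applies that formula to the three σ values above; the data standard error of 0.15 is a hypothetical value chosen for illustration.

```python
def prior_weight(prior_sigma, data_se):
    """Share of the posterior mean contributed by the prior mean.

    Valid for a Normal prior combined with a Normal data summary:
    weight = prior precision / (prior precision + data precision).
    """
    prior_prec = 1.0 / prior_sigma ** 2
    data_prec = 1.0 / data_se ** 2
    return prior_prec / (prior_prec + data_prec)

data_se = 0.15  # hypothetical standard error of the data-only estimate

weights = {
    "weak (sigma = 10)": prior_weight(10.0, data_se),
    "moderate (sigma = 0.3)": prior_weight(0.3, data_se),
    "strong (sigma = 0.05)": prior_weight(0.05, data_se),
}
for label, w in weights.items():
    print(f"{label}: prior weight = {w:.1%}")
```

The same σ carries less weight when the data-only standard error is smaller, which is typically what happens as more weeks of data are added.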
Prior Strength and Data Size
Limited Data (< 52 weeks)
Use stronger priors (smaller σ) to stabilize estimates
Rationale: Limited data means high uncertainty. Priors supplement the limited information.
Example:
30 weeks of data → μ = 0.5, σ = 0.2
Adequate Data (52-104 weeks)
Use moderate priors (medium σ)
Rationale: Data provides good information. Priors guide but don't dominate.
Example:
80 weeks of data → μ = 0.5, σ = 0.3
Abundant Data (104+ weeks)
Use weak priors (large σ) or let data dominate
Rationale: Abundant data speaks for itself. Priors provide minimal regularization.
Example:
150 weeks of data → μ = 0, σ = 5
Common Prior Scenarios
Scenario 1: New Channel, No History
Situation: Testing a new marketing channel (e.g., TikTok) with no past data
Prior Strategy: Weakly Informative
μ = 0 (neutral)
σ = 2 (wide range)
Rationale: No strong beliefs. Let data reveal effectiveness.
Scenario 2: Established Channel, Known Positive Effect
Situation: TV advertising, known to work from past studies
Prior Strategy: Informative Positive
μ = 0.5 (expect positive ROI)
σ = 0.3 (moderate uncertainty)
Rationale: Guide the model toward a positive coefficient. Prevent spurious negative results.
Scenario 3: Past MMM Available
Situation: Last year's model estimated β_Radio = 0.42
Prior Strategy: Based on Historical
μ = 0.42 (last year's estimate)
σ = 0.20 (allow for changes)
Rationale: Leverage past learnings. Maintain consistency year-over-year.
Scenario 4: Business Requirement
Situation: CFO requires non-negative marketing ROI for budget approval
Prior Strategy: Constrained Positive
μ = 0.3 (modest positive effect)
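σ = 0.25 (but allow data to inform magnitude)
Rationale: Steers the estimate toward the business constraint while staying data-driven. Note that a Normal prior makes a negative coefficient unlikely, not impossible; a strict guarantee requires a non-negative prior such as HalfNormal.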
Scenario 5: Conflicting Stakeholder Beliefs
Situation: Marketing team says TV ROI is 3.0; Finance says 1.5
Prior Strategy: Middle Ground
μ = 2.25 (average of the two beliefs)
σ = 0.75 (wide enough that both views sit within one σ of the mean)
Rationale: Encode the uncertainty. Let data adjudicate.
Priors by Variable Type
Marketing Channels
Default Approach:
μ = 0.5 (expect positive ROI)
σ = 0.5 (moderate uncertainty)
Reasoning:
- Marketing should drive incremental KPI 
- Magnitude varies by channel efficiency 
Seasonality Variables
Default Approach:
μ = 0 (no directional expectation)
σ = 1 (moderate range)
Reasoning:
- Some months above average, others below 
- Effects can be positive or negative 
External Factors (Price, Weather, etc.)
Default Approach:
μ = 0 (neutral)
σ = 1 (moderate)
Alternative if Direction Known:
Price: μ = -0.5, σ = 0.3 (higher price → lower sales)
Baseline/Intercept
Default Approach:
μ = mean(KPI) (around average KPI)
σ = std(KPI) (wide range)
Reasoning: Baseline represents organic performance. It should be close to the average KPI level.
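The intercept prior above can be read straight off the KPI series. The sketch below does this with the standard library; the weekly KPI values are made up for illustration, and using the sample standard deviation (rather than the population one) is an assumption, since the text does not specify which is meant.

```python
from statistics import mean, stdev

def intercept_prior(kpi):
    """Return (mu, sigma) for the intercept prior: KPI mean and sample std."""
    if len(kpi) < 2:
        raise ValueError("need at least two KPI observations")
    return mean(kpi), stdev(kpi)

# Hypothetical weekly KPI values (not real data)
weekly_kpi = [100.0, 120.0, 110.0, 130.0, 90.0]

mu, sigma = intercept_prior(weekly_kpi)
print(f"intercept prior: mu = {mu:.1f}, sigma = {sigma:.2f}")
```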
Sensitivity Analysis: Testing Prior Impact
Always test how sensitive your results are to prior choices:
Process
Step 1: Base Model. Build the model with your chosen priors.
Step 2: Weak Priors. Rebuild with very wide priors (σ = 10).
Step 3: Strong Priors. Rebuild with tighter priors (σ = 0.1).
Step 4: Different Means. Test μ ± 50%.
Step 5: Compare Results.
Interpretation:
- If results vary minimally → priors have little influence (data dominates) 
- If results change substantially → priors strongly influence (weak data) 
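The comparison in Step 5 can be sketched with the closed-form Normal-Normal update standing in for a full model refit. In practice each configuration would be a separate Bayesian run; here the data-only estimate (0.40) and its standard error (0.10) are hypothetical, and the function name `posterior` is an assumption for this example.

```python
from statistics import NormalDist

def posterior(prior_mu, prior_sigma, estimate, se):
    """Normal posterior from a Normal prior and a Normal data summary."""
    prior_prec = 1.0 / prior_sigma ** 2
    data_prec = 1.0 / se ** 2
    var = 1.0 / (prior_prec + data_prec)
    mu = var * (prior_prec * prior_mu + data_prec * estimate)
    return NormalDist(mu, var ** 0.5)

estimate, se = 0.40, 0.10  # hypothetical data-only estimate and standard error

# Prior configurations following Steps 1-4: (mu, sigma)
configs = {
    "base": (0.50, 0.3),
    "weak": (0.50, 10.0),
    "strong": (0.50, 0.1),
    "mean -50%": (0.25, 0.3),
    "mean +50%": (0.75, 0.3),
}

results = {name: posterior(mu, sigma, estimate, se) for name, (mu, sigma) in configs.items()}
for name, post in results.items():
    print(f"{name:>10}: posterior mean = {post.mean:.3f}, sd = {post.stdev:.3f}")

means = [post.mean for post in results.values()]
spread = max(means) - min(means)
print(f"spread of posterior means across priors = {spread:.3f}")
```

A small spread relative to the posterior standard deviation suggests the data dominate; a large spread suggests the conclusions depend on the prior and should be reported with that caveat.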
Prior-Posterior Plots
Visualize how priors are updated by data:
Good Outcome: Data Overwhelms Prior
Prior:     ----wide bell curve----
Posterior:      --narrow peak--
Data was informative, posterior differs from prior
Warning: Prior Dominates Data
Prior:        --narrow peak--
Posterior:     --same peak--
Data didn't change belief (either data weak or prior too strong)
Ideal Balance:
Prior:      ----moderate bell----
Posterior:     ---shifted peak---
Prior + data combine for optimal estimate
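The three patterns above can also be summarized with a single number. One common choice is posterior contraction, defined as 1 - (posterior SD / prior SD)². The sketch below computes it for hypothetical standard deviations; the function name and the example values are assumptions, not MixModeler output.

```python
def posterior_contraction(prior_sd, posterior_sd):
    """Return 1 - posterior variance / prior variance.

    Values near 1 mean the data narrowed the prior substantially;
    values near 0 mean the posterior is about as wide as the prior.
    """
    if prior_sd <= 0 or posterior_sd <= 0:
        raise ValueError("standard deviations must be positive")
    return 1.0 - (posterior_sd / prior_sd) ** 2

# Hypothetical (prior SD, posterior SD) pairs for the three patterns
examples = {
    "data overwhelms prior": (1.00, 0.10),
    "prior dominates data": (0.05, 0.049),
    "balanced": (0.30, 0.15),
}

contractions = {name: posterior_contraction(p, q) for name, (p, q) in examples.items()}
for name, value in contractions.items():
    print(f"{name}: contraction = {value:.2f}")
```

Contraction only measures how much the distribution narrowed, not how far its center moved, so it complements the plots rather than replacing them.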
Common Prior Mistakes
❌ Mistake 1: Too Strong Priors with Limited Evidence
Problem: σ = 0.05 when you don't really know
Result: Model ignores data, just returns your prior
Fix: Use wider priors (σ = 0.3+) unless strongly justified
❌ Mistake 2: Ignoring Priors Entirely
Problem: Using the default weakly informative priors without thought
Result: Missed opportunity to stabilize estimates, especially with limited data
Fix: Specify informative priors for key marketing variables wherever you have credible knowledge
❌ Mistake 3: Conflicting Priors
Problem:
- μ_TV = 2.0 (strong positive) 
- μ_Radio = 2.0 (strong positive) 
- But historically TV >> Radio in effectiveness 
Result: Model struggles to reconcile priors with reality
Fix: Use relative magnitudes that match business understanding
❌ Mistake 4: No Sensitivity Testing
Problem: Build one model with one prior choice, assume it's correct
Result: Unknown sensitivity to prior assumptions
Fix: Always test 2-3 prior configurations
❌ Mistake 5: Using Priors to Force Desired Results
Problem: "CFO wants TV ROI = 3.0, so I'll set μ = 3.0, σ = 0.01"
Result: Model is no longer data-driven, just confirms predetermined beliefs
Fix: Priors should guide, not dictate. Use data to inform decisions, not justify them.
Hierarchical Priors (Advanced)
For related variables, use hierarchical structure:
Concept: All TV variables share a common prior distribution:
β_TV_National ~ Normal(μ_TV, σ_TV)
β_TV_Local ~ Normal(μ_TV, σ_TV)
μ_TV ~ Normal(0.5, 0.3)  # Hyperprior
σ_TV ~ HalfNormal(0.2)   # Hyperprior
Benefit:
- Partial pooling of information 
- TV variables inform each other 
- More stable estimates with limited data per variable 
Note: MixModeler uses independent priors by default (simpler). Hierarchical priors require custom configuration.
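To see what "TV variables inform each other" means before any data arrive, the hierarchical prior above can be simulated directly. The sketch below draws from it with the standard library and measures the prior correlation between the national and local coefficients. The function names, seed, and number of draws are assumptions for illustration; this is a prior simulation, not MixModeler's hierarchical configuration.

```python
import random
from statistics import mean

def draw_tv_coefficients(rng, n_draws):
    """Draw (national, local) coefficient pairs from the hierarchical prior."""
    pairs = []
    for _ in range(n_draws):
        mu_tv = rng.gauss(0.5, 0.3)          # hyperprior on the shared mean
        sigma_tv = abs(rng.gauss(0.0, 0.2))  # HalfNormal(0.2) as |Normal(0, 0.2)|
        national = rng.gauss(mu_tv, sigma_tv)
        local = rng.gauss(mu_tv, sigma_tv)
        pairs.append((national, local))
    return pairs

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is reproducible
pairs = draw_tv_coefficients(rng, 5000)
national = [p[0] for p in pairs]
local = [p[1] for p in pairs]

prior_corr = pearson(national, local)
print(f"prior correlation between national and local TV: {prior_corr:.2f}")
```

Under independent priors this correlation would be close to zero. The shared μ_TV makes the two coefficients positively correlated a priori, which is the mechanism behind partial pooling.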
Communicating Priors to Stakeholders
Why Transparency Matters
Non-statisticians may be skeptical: "You're just putting your assumptions in!"
Response:
- Acknowledge: "Yes, priors encode assumptions - that's a feature, not a bug" 
- Justify: "We're using industry benchmarks / past studies / business knowledge" 
- Validate: "We tested sensitivity - results hold across reasonable prior choices" 
- Benefit: "Priors prevent nonsensical results and stabilize estimates with limited data" 
Presenting Priors in Reports
Include:
- Prior specifications (μ, σ) for each variable 
- Rationale for prior choices (business knowledge, past studies, etc.) 
- Sensitivity analysis showing robustness 
- Prior-posterior comparison (show data updated beliefs) 
Example Narrative: "We set a prior for the TV coefficient at μ=0.5 (σ=0.3) based on last year's MMM, which estimated 0.52. The posterior estimate is 0.54 [0.42-0.66], which is consistent with historical performance and with our prior."
Best Practices for Priors
✅ Do's
Start with OLS: Use OLS estimates to inform Bayesian priors (data-driven starting point)
Document Rationale: Record why each prior was chosen (reproducibility + transparency)
Test Sensitivity: Always check if results hold with different prior choices
Use Domain Knowledge: Leverage business expertise, past studies, industry benchmarks
Match Data Strength: Weak data → stronger priors; Strong data → weaker priors
Communicate Clearly: Explain priors to stakeholders in non-technical terms
❌ Don'ts
Don't Use Priors to Force Results: Priors guide, they don't dictate. Data should still inform.
Don't Ignore Sensitivity: One prior choice ≠ robust analysis
Don't Over-Constrain: Extremely narrow priors (σ < 0.05) are rarely justified
Don't Forget Constraints: If marketing must be positive, encode that (non-negative priors)
Don't Blindly Accept Defaults: Default priors may not suit your business context
Prior Selection Workflow
Step 1: Understand Your Data. How many observations? How much variance?
Step 2: Review Past Studies. What do historical MMMs or industry benchmarks suggest?
Step 3: Consult Stakeholders. What do marketing/finance teams believe?
Step 4: Build OLS Baseline. Get data-driven point estimates.
Step 5: Set Informative Priors. Combine OLS + business knowledge.
Step 6: Run Bayesian Model. Estimate posterior distributions.
Step 7: Check Convergence. R-hat < 1.1, adequate ESS.
Step 8: Sensitivity Analysis. Test alternative prior specifications.
Step 9: Validate & Iterate. Do the results make business sense? Adjust if needed.
Step 10: Document & Present. Report prior choices and sensitivity transparently.
Summary
Key Takeaways:
🎯 Priors encode business knowledge - leverage expertise and past learnings
📊 Bayesian = Prior + Data - posterior balances both sources of information
🛡️ Regularization benefit - priors prevent overfitting and nonsensical estimates
💪 Stronger priors for limited data - compensate for weak likelihood
🔬 Always test sensitivity - ensure results are robust to prior choices
✅ Use OLS to inform priors - data-driven starting point for Bayesian refinement
📈 Non-negative constraints - encode business logic (marketing should help, not hurt)
🎓 Transparency is key - document and communicate prior rationale to stakeholders
Properly specified priors transform Bayesian MMM from a black box into a transparent, principled framework that combines the best of data and domain expertise.