Prior Configuration
Overview
Priors represent your beliefs about model parameters before observing the data. In Bayesian MMM, you can set priors for three types of parameters: the intercept, coefficients for marketing variables, and the model error term. Proper prior configuration allows you to encode business knowledge and improve model stability.
Prior Types
1. Intercept Prior
The intercept represents the baseline level of your KPI when all marketing variables are zero.
Default Setting:
Distribution: Normal
Mean: 0
Standard Deviation: 100 (weakly informative)
When to Customize:
You know the baseline KPI level from historical periods with no marketing
You want to enforce a positive or negative baseline
Your KPI is on a different scale (e.g., revenue in millions vs thousands)
2. Coefficient Priors
Coefficient priors can be set globally (for all variables) or individually (for specific variables).
Default Settings:
Distribution: Normal
Mean: 0
Standard Deviation: 10 (weakly informative)
Global Priors: Apply the same prior to all marketing variables. Useful when variables are on similar scales.
Variable-Specific Priors: Set custom priors for individual channels. Essential when you have strong beliefs about specific variables.
3. Error Prior
The error (sigma) term represents unexplained variance in your model.
Default Setting:
Distribution: Half-Normal
Scale: 5
When to Customize:
You have prior knowledge about typical model error from similar analyses
Your KPI has known measurement error
You want to allow more or less unexplained variation (a larger scale permits a looser fit, a smaller scale a tighter one)
Prior Distributions
Normal Distribution
The most common prior distribution for coefficients.
Parameters:
Mean (μ): Center of the distribution
Standard Deviation (σ): Spread of the distribution
Use Cases:
General-purpose prior for most marketing variables
Allows both positive and negative effects
Symmetric around the mean
Example: For a TV advertising variable where you expect a moderate positive effect around 3 with uncertainty:
Distribution: Normal
Mean: 3
Std: 2
This says "we believe the coefficient is around 3, but values between 1 and 5 (one standard deviation either side of the mean) are quite plausible."
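To make the statement above concrete, the following sketch computes how much prior probability a Normal(3, 2) prior places on a few ranges. It uses only Python's standard library; the helper name `prior_mass` is illustrative and not part of MixModeler.

```python
from statistics import NormalDist

# Prior from the TV example above: Normal(mean=3, std=2).
tv_prior = NormalDist(mu=3, sigma=2)

def prior_mass(prior, low, high):
    """Return the prior probability that the coefficient lies in [low, high]."""
    return prior.cdf(high) - prior.cdf(low)

within_one_sd = prior_mass(tv_prior, 1, 5)    # mean +/- 1 std
within_two_sd = prior_mass(tv_prior, -1, 7)   # mean +/- 2 std
negative_mass = tv_prior.cdf(0)               # probability of a negative effect

print(f"P(1 <= coef <= 5)  = {within_one_sd:.3f}")
print(f"P(-1 <= coef <= 7) = {within_two_sd:.3f}")
print(f"P(coef < 0)        = {negative_mass:.3f}")
```

Roughly 68% of the prior mass falls between 1 and 5, and about 7% falls below zero, so this prior still allows a small chance of a negative TV effect.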
Half-Normal Distribution
A normal distribution centered at zero and restricted to non-negative values.
Parameters:
Scale (σ): Spread of the distribution (the location is fixed at 0, so the density peaks at zero)
Use Cases:
Error term (always positive)
Coefficients that must be non-negative
Variables where negative effects don't make business sense
Example: For a brand awareness campaign that can only have positive or zero effect:
Distribution: Half-Normal
Scale: 5
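As a rough illustration of what Half-Normal(5) implies, the sketch below draws samples by taking the absolute value of Normal(0, 5) draws and compares the sample mean with the analytic mean, scale × √(2/π). The function names are illustrative only; this is not how MixModeler samples internally.

```python
import math
import random

def half_normal_mean(scale):
    """Analytic mean of a Half-Normal(scale) distribution: scale * sqrt(2 / pi)."""
    return scale * math.sqrt(2.0 / math.pi)

def sample_half_normal(scale, n, seed=0):
    """Draw n samples by taking the absolute value of Normal(0, scale) draws."""
    rng = random.Random(seed)
    return [abs(rng.gauss(0.0, scale)) for _ in range(n)]

scale = 5.0
draws = sample_half_normal(scale, n=20_000)
sample_mean = sum(draws) / len(draws)
analytic_mean = half_normal_mean(scale)

print(f"analytic mean: {analytic_mean:.2f}")
print(f"sample mean:   {sample_mean:.2f}")
print(f"smallest draw: {min(draws):.4f}")
```

The density is highest at zero, but the mean is roughly 4 for a scale of 5, so this prior expects a positive effect of moderate size rather than an effect near zero.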
Student-t Distribution
A distribution with heavier tails than normal, allowing for more extreme values.
Parameters:
Degrees of Freedom (ν): Controls tail heaviness (lower = heavier tails)
Mean (μ): Center of the distribution
Scale (σ): Spread of the distribution
Use Cases:
When you expect potential outliers
Robust estimation in the presence of unusual observations
When unsure about the exact scale of effects
Example: For a variable with uncertain behavior:
Distribution: Student-t
Degrees of Freedom: 3
Mean: 0
Scale: 5
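To see what "heavier tails" means in practice, this sketch compares the density of the Student-t prior above (ν = 3, mean 0, scale 5) with a Normal(0, 5) prior at several coefficient values. The standard library has no Student-t, so the density is written out from its textbook formula; `student_t_pdf` is a helper defined here, not a MixModeler function.

```python
import math
from statistics import NormalDist

def student_t_pdf(x, df, loc=0.0, scale=1.0):
    """Density of a location-scale Student-t distribution at x."""
    z = (x - loc) / scale
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi) - math.log(scale))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(z * z / df))

normal = NormalDist(mu=0, sigma=5)

for x in (0, 5, 15, 25):
    t_density = student_t_pdf(x, df=3, loc=0, scale=5)
    print(f"x={x:>2}: student-t={t_density:.6f}  normal={normal.pdf(x):.6f}")

# How much more plausible a coefficient of 25 is under the Student-t prior.
tail_ratio = student_t_pdf(25, df=3, scale=5) / normal.pdf(25)
print(f"density ratio at x=25: {tail_ratio:.0f}x")
```

Near zero the two priors are similar, but far from the center the Student-t assigns much more density, so an unexpectedly large effect is penalized far less.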
Laplace Distribution
A distribution with peaked center and exponential tails, providing implicit regularization.
Parameters:
Location (μ): Center of the distribution
Scale (b): Spread of the distribution
Use Cases:
Soft variable selection (shrinks small effects strongly toward zero, though posterior estimates are rarely exactly zero)
When many variables may have minimal impact
Lasso-like (L1) regularization
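The link between a Laplace prior and regularization can be made explicit by taking negative log densities. This is a standard result rather than anything specific to MixModeler: a Laplace prior adds an absolute-value (L1, lasso-style) penalty to the log posterior, whereas a Normal prior adds a squared (L2, ridge-style) penalty.

```latex
\begin{aligned}
\text{Laplace:}\quad p(\beta \mid \mu, b) &= \frac{1}{2b}\exp\!\left(-\frac{|\beta-\mu|}{b}\right)
&\;\Rightarrow\; -\log p(\beta) &= \frac{|\beta-\mu|}{b} + \log(2b) \\[4pt]
\text{Normal:}\quad p(\beta \mid \mu, \sigma) &= \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(\beta-\mu)^2}{2\sigma^2}\right)
&\;\Rightarrow\; -\log p(\beta) &= \frac{(\beta-\mu)^2}{2\sigma^2} + \log\!\left(\sqrt{2\pi}\,\sigma\right)
\end{aligned}
```

A smaller b (or σ) means a stronger penalty. The L1 penalty grows linearly, so it pulls small coefficients toward μ more firmly than the L2 penalty does while penalizing large coefficients less.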
Prior Strength
Weakly Informative Priors (Default)
Characteristics:
Large standard deviations (σ = 10 for coefficients, σ = 100 for intercept)
Center on reasonable values (typically 0 for coefficients)
Let data primarily drive estimates
Provide mild regularization
When to Use: Default choice for most analyses, especially when you don't have strong prior knowledge.
Informative Priors
Characteristics:
Smaller standard deviations (σ = 1-3)
Centered on expected values based on business knowledge
Strongly influence posterior estimates
Require justification and documentation
When to Use:
You have reliable historical data or benchmarks
Testing new channels similar to existing ones
Limited data requires incorporating external knowledge
Regulatory or business constraints on parameter values
Non-Informative Priors
Characteristics:
Very large standard deviations (σ = 100+)
Essentially uniform over reasonable parameter ranges
Minimal influence on estimates
Let data completely drive results
When to Use:
Exploratory analysis with no prior knowledge
Abundant data (100+ observations)
Objective, data-driven analysis requirement
Configuring Priors in MixModeler
Access Prior Settings
In Model Builder, select Bayesian as the modeling method
Click Advanced Settings or Configure Priors
The Prior Configuration panel opens
Set Global Coefficient Priors
For all marketing variables simultaneously:
Under "Global Coefficient Prior"
Select distribution type (Normal, Student-t, Laplace)
Set distribution parameters:
Normal: Mean, Standard Deviation
Student-t: Degrees of Freedom, Mean, Scale
Laplace: Location, Scale
Set Variable-Specific Priors
For individual marketing channels:
Click Add Variable Prior
Select the variable from the dropdown
Choose distribution type
Set parameters specific to this variable
Repeat for other variables requiring custom priors
Variable-specific priors override global priors for those variables.
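The override rule can be pictured as a simple lookup with a fallback. The sketch below is a hypothetical in-memory representation; the dictionary keys, field names, and variable names are assumptions for illustration and do not reflect MixModeler's actual storage format.

```python
# Hypothetical representation of a prior configuration;
# keys and field names are illustrative, not MixModeler's actual format.
GLOBAL_PRIOR = {"dist": "normal", "mean": 0.0, "std": 10.0}

VARIABLE_PRIORS = {
    "tv_spend": {"dist": "normal", "mean": 4.0, "std": 1.5},
    "brand_awareness": {"dist": "half_normal", "scale": 5.0},
}

def resolve_prior(variable, variable_priors=VARIABLE_PRIORS, global_prior=GLOBAL_PRIOR):
    """Return the variable-specific prior if one exists, else the global prior."""
    return variable_priors.get(variable, global_prior)

for name in ("tv_spend", "brand_awareness", "search_spend"):
    print(name, "->", resolve_prior(name))
```

Here `search_spend` has no entry of its own, so it falls back to the global Normal(0, 10) prior.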
Set Intercept Prior
Under "Intercept Prior"
Select distribution (usually Normal)
Set Mean and Standard Deviation
Set Error Prior
Under "Error Prior"
Select distribution (usually Half-Normal)
Set Scale parameter
Save Configuration
Click Save Prior Settings to store this configuration with your model. You can reuse these settings for similar models.
Practical Examples
Example 1: Default Configuration (Data-Driven)
Scenario: First Bayesian model, no prior knowledge, 18 months of data
Configuration:
Intercept: Normal(0, 100)
Coefficients: Normal(0, 10) for all variables
Error: Half-Normal(5)
Rationale: Weakly informative priors that let data drive estimates while providing minimal regularization.
Example 2: Informative Priors for TV Advertising
Scenario: Historical TV campaigns show effects between 2 and 6, with an average around 4
Configuration:
TV Variable: Normal(4, 1.5)
Other Variables: Normal(0, 10)
Rationale: Strong prior based on historical data, while keeping other variables weakly informative.
Example 3: New Digital Channel Similar to Existing
Scenario: Launching TikTok ads, have data from Instagram with coefficient around 2.5
Configuration:
TikTok Variable: Normal(2.5, 2)
Instagram Variable: Normal(0, 10)
Rationale: Use Instagram as reference point for TikTok prior, with wider uncertainty to account for platform differences.
Example 4: Non-Negative Constraints
Scenario: Brand awareness can only increase sales, never decrease
Configuration:
Brand Awareness: Half-Normal(5)
Other Variables: Normal(0, 10)
Rationale: Constrain the brand awareness coefficient to be non-negative using a Half-Normal prior.
Example 5: Regularization for Many Variables
Scenario: 50 marketing variables, expect many to have minimal impact
Configuration:
All Coefficients: Laplace(0, 3)
Error: Half-Normal(5)
Rationale: Laplace prior automatically shrinks small coefficients toward zero, effectively performing variable selection.
Common Prior Configurations
Conservative Business-Driven
Parameter | Distribution | Parameters | Rationale
Intercept | Normal | μ=baseline_sales, σ=0.2×baseline | Known baseline sales
TV Advertising | Normal | μ=3, σ=1.5 | Historical effect size
Digital Channels | Normal | μ=1.5, σ=1 | Typical digital performance
(not specified) | Normal | μ=0.5, σ=1 | Lower expected impact
Error | Half-Normal | σ=3 | Low model uncertainty
Exploratory Data-Driven
Parameter | Distribution | Parameters | Rationale
Intercept | Normal | μ=0, σ=100 | No prior knowledge
All Coefficients | Normal | μ=0, σ=10 | Let data decide
Error | Half-Normal | σ=5 | Standard uncertainty
Regularized Selection
Parameter | Distribution | Parameters | Rationale
Intercept | Normal | μ=0, σ=50 | Weakly informative
All Coefficients | Laplace | μ=0, b=2 | Automatic selection
Error | Half-Normal | σ=5 | Standard uncertainty
Evaluating Prior Choice
Prior Predictive Check
Before running the full model, examine what data your priors expect to generate:
In Prior Configuration, click Preview Prior Predictions
System generates synthetic data from priors alone
Compare to your actual data
If the simulated data differ wildly from your actual data in scale or range, the priors may be mis-scaled, too wide or too narrow, or otherwise misspecified
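The idea behind a prior predictive check can be sketched outside the tool. The example below assumes a plain additive model (KPI = intercept + sum of coefficient × variable + noise) and the default priors; the data are made up, and the actual Preview Prior Predictions feature may work differently.

```python
import random

def prior_predictive_draws(x_rows, n_draws=500, seed=0):
    """Simulate KPI values from the default priors alone (no fitting).

    x_rows: list of rows, each a list of marketing-variable values.
    Returns a flat list of simulated KPI values across all draws and rows.
    """
    rng = random.Random(seed)
    n_vars = len(x_rows[0])
    simulated = []
    for _ in range(n_draws):
        intercept = rng.gauss(0, 100)                      # Normal(0, 100)
        coefs = [rng.gauss(0, 10) for _ in range(n_vars)]  # Normal(0, 10)
        sigma = abs(rng.gauss(0, 5))                       # Half-Normal(5)
        for row in x_rows:
            mean = intercept + sum(c * x for c, x in zip(coefs, row))
            simulated.append(rng.gauss(mean, sigma))
    return simulated

# Made-up data for illustration: two variables, six periods.
x_rows = [[1.0, 0.5], [2.0, 0.0], [0.5, 1.5], [3.0, 1.0], [1.5, 2.0], [0.0, 0.5]]
observed_kpi = [120.0, 135.0, 118.0, 150.0, 140.0, 105.0]

sim = sorted(prior_predictive_draws(x_rows))
low, high = sim[int(0.05 * len(sim))], sim[int(0.95 * len(sim))]
print(f"prior predictive 90% range: [{low:.0f}, {high:.0f}]")
print(f"observed KPI range:         [{min(observed_kpi):.0f}, {max(observed_kpi):.0f}]")
```

If the observed range sits comfortably inside the simulated range, the priors are at least compatible with the data's scale. If the simulated range is orders of magnitude wider or narrower, revisit the prior scales.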
Posterior Sensitivity
After model runs, assess prior influence:
Run model with your chosen priors
Run again with non-informative priors
Compare posterior distributions
Large differences indicate strong prior influence
If posteriors are vastly different, either:
Your priors were too strong (reduce by increasing σ)
Your data is limited (priors appropriately influential)
Prior-data conflict exists (investigate discrepancy)
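The three explanations above can be explored with a deliberately simplified stand-in: a single coefficient treated as a Normal mean with known noise, where the posterior has a closed form. This is not the MMM itself and the numbers are illustrative, but it shows how prior influence shrinks as data accumulate.

```python
def normal_posterior(prior_mean, prior_sd, data_mean, data_sd, n):
    """Posterior for a Normal mean with known noise sd (conjugate update).

    Returns (posterior_mean, posterior_sd).
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / data_sd ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * data_mean)
    return post_mean, post_var ** 0.5

# Illustrative numbers only: the data suggest an effect near 2.0.
data_mean, data_sd = 2.0, 4.0

for n in (10, 200):
    informative = normal_posterior(4.0, 1.5, data_mean, data_sd, n)
    weak = normal_posterior(0.0, 100.0, data_mean, data_sd, n)
    gap = abs(informative[0] - weak[0])
    print(f"n={n:>3}: informative={informative[0]:.2f}, weak={weak[0]:.2f}, gap={gap:.2f}")
```

With 10 observations the informative prior pulls the estimate noticeably toward 4. With 200 observations both priors give nearly the same answer, which is the pattern to look for when comparing model runs.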
Prior-Posterior Overlap
Check how much the posterior differs from the prior:
High Overlap: The data added little information beyond the prior, either because the data are weak (prior dominated) or because they agree with the prior
Moderate Overlap: Healthy combination of prior knowledge and data
No Overlap: Data completely overrode prior, prior may have been misspecified
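One way to put a number on overlap, assuming both the prior and the posterior are summarized as Normal distributions, is the overlapping area of the two densities. The sketch below uses `NormalDist.overlap` from Python's standard library; the posterior summaries are hypothetical, and real posteriors may not be well approximated by a Normal.

```python
from statistics import NormalDist

prior = NormalDist(mu=4.0, sigma=1.5)

# Hypothetical posterior summaries (mean, sd) approximated as Normal.
posteriors = {
    "prior dominated": NormalDist(mu=3.9, sigma=1.4),
    "healthy update": NormalDist(mu=3.0, sigma=0.6),
    "possible conflict": NormalDist(mu=-2.0, sigma=0.4),
}

for label, posterior in posteriors.items():
    # overlap() returns the shared area under the two densities, in [0, 1].
    print(f"{label}: overlap = {prior.overlap(posterior):.2f}")
```

Values near 1 correspond to the high-overlap case and values near 0 to the no-overlap case. Any cut-offs between the categories are a judgment call rather than a fixed rule.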
Best Practices
Document Your Priors: Record the business rationale for any informative priors. This is essential for reproducibility and stakeholder communication.
Start Weak: Begin with weakly informative priors. Only strengthen them if you have strong justification and poor convergence with weak priors.
Check Scale: Ensure prior standard deviations are appropriate for your data scale. If KPI is in millions, priors should reflect that scale.
Test Sensitivity: Always run at least one sensitivity analysis with different priors to ensure conclusions are robust.
Communicate Clearly: When presenting results, clearly state which priors were used and why. This builds trust and transparency.
Avoid Conflicts: If your prior and data strongly disagree, investigate. Don't just increase prior strength to force an outcome.
Use Domain Expertise: Involve business stakeholders when setting informative priors. Their knowledge is valuable input.
Version Control: Save different prior configurations and label them clearly (e.g., "Conservative_2024Q1", "Data_Driven_Baseline").
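The Check Scale practice can be illustrated with a small calculation: the same real-world effect produces very different coefficient values depending on the units of the KPI, so a fixed prior standard deviation means different things at different scales. The effect size below is an assumption for illustration, and `prior_z_score` is a helper defined here rather than a MixModeler function.

```python
def prior_z_score(coef, prior_mean=0.0, prior_sd=10.0):
    """How many prior standard deviations a coefficient sits from the prior mean."""
    return abs(coef - prior_mean) / prior_sd

# Illustrative assumption: one extra unit of spend adds 2,500 currency units of revenue.
effect_in_units = 2500.0
effect_in_millions = effect_in_units / 1_000_000  # same effect, KPI recorded in millions

for label, coef in (("KPI in raw units", effect_in_units),
                    ("KPI in millions", effect_in_millions)):
    z = prior_z_score(coef)
    print(f"{label}: coefficient={coef:g}, distance from prior mean={z:g} sd")
```

Under the default Normal(0, 10) prior, the raw-unit coefficient sits 250 standard deviations from the prior mean (the prior is far too tight), while the coefficient in millions sits almost exactly at the mean (the prior is effectively flat). Rescaling either the data or the prior resolves the mismatch.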
Common Mistakes
Over-Confident Priors: Setting σ too small makes estimates overly dependent on potentially incorrect prior beliefs.
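Wrong Scale: Reusing σ=10 regardless of data scale. If the KPI is expressed in millions (values like 2.5), plausible coefficients are small and σ=10 is virtually non-informative, so it does not regularize; if the KPI is in raw units (values like 2,500,000), the same σ=10 can be far too tight.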
Ignoring Business Logic: Allowing negative coefficients for variables that can only have positive effects.
Not Checking: Failing to examine prior-posterior relationships and sensitivity.
Inconsistency: Using strong priors for some variables but not others without clear rationale.
Next Steps: After configuring priors, learn about MCMC Settings to optimize the sampling process, or explore Credible Intervals to interpret your Bayesian results.