Prior Configuration

Overview

Priors represent your beliefs about model parameters before observing the data. In Bayesian MMM, you can set priors for three types of parameters: the intercept, coefficients for marketing variables, and the model error term. Proper prior configuration allows you to encode business knowledge and improve model stability.

Prior Types

1. Intercept Prior

The intercept represents the baseline level of your KPI when all marketing variables are zero.

Default Setting:

Distribution: Normal
Mean: 0
Standard Deviation: 100 (weakly informative)

When to Customize:

You know the baseline KPI level from historical periods with no marketing
You want to enforce a positive or negative baseline
Your KPI is on a different scale (e.g., revenue in millions vs thousands)
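
One way to turn a known baseline into an intercept prior is to summarize the KPI from periods with no marketing activity. The following is a minimal sketch using only the Python standard library. The weekly figures, the variable names, and the widening factor of 3 are illustrative assumptions, not MixModeler defaults.

```python
from statistics import mean, stdev

# Hypothetical weekly KPI values from periods with no marketing activity.
baseline_weeks = [118.0, 124.5, 121.0, 119.5, 126.0, 122.0]

# Center the intercept prior on the observed baseline level.
intercept_mean = mean(baseline_weeks)

# Widen the observed spread so the prior stays weakly informative
# (the factor of 3 is an arbitrary illustrative choice).
intercept_sd = 3 * stdev(baseline_weeks)

print(f"Intercept prior: Normal({intercept_mean:.1f}, {intercept_sd:.1f})")
```

The resulting mean and standard deviation are on the same scale as the KPI, which is the main reason to move away from the default Normal(0, 100).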

2. Coefficient Priors

Coefficient priors can be set globally (for all variables) or individually (for specific variables).

Default Settings:

Distribution: Normal
Mean: 0
Standard Deviation: 10 (weakly informative)

Global Priors: Apply the same prior to all marketing variables. Useful when variables are on similar scales.

Variable-Specific Priors: Set custom priors for individual channels. Essential when you have strong beliefs about specific variables.

3. Error Prior

The error (sigma) term represents unexplained variance in your model.

Default Setting:

Distribution: Half-Normal
Scale: 5

When to Customize:

You have prior knowledge about typical model error from similar analyses
Your KPI has known measurement error
You want to enforce different levels of model flexibility

Prior Distributions

Normal Distribution

The most common prior distribution for coefficients.

Parameters:

Mean (μ): Center of the distribution
Standard Deviation (σ): Spread of the distribution

Use Cases:

General-purpose prior for most marketing variables
Allows both positive and negative effects
Symmetric around the mean

Example: For a TV advertising variable where you expect a moderate positive effect around 3 with uncertainty:

Distribution: Normal
Mean: 3
Std: 2

This says "we believe the coefficient is around 3, but values between 1 and 5 are quite plausible."
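
To check what a Normal(3, 2) prior actually implies, the prior probabilities can be computed directly. This sketch uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

tv_prior = NormalDist(mu=3, sigma=2)

# Prior probability that the coefficient lies within one SD of the mean.
p_1_to_5 = tv_prior.cdf(5) - tv_prior.cdf(1)

# Prior probability that the TV effect is negative.
p_negative = tv_prior.cdf(0)

print(f"P(1 < beta < 5) = {p_1_to_5:.3f}")
print(f"P(beta < 0)     = {p_negative:.3f}")
```

About 68% of the prior mass falls between 1 and 5, and roughly 7% sits below zero, so this prior still allows a small chance of a negative TV effect.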

Half-Normal Distribution

A zero-centered normal distribution restricted to non-negative values.

Parameters:

Scale (σ): Spread of the distribution (the location is fixed at 0, so the density peaks at 0 and the mean is σ√(2/π))

Use Cases:

Error term (always positive)
Coefficients that must be non-negative
Variables where negative effects don't make business sense

Example: For a brand awareness campaign that can only have positive or zero effect:

Distribution: Half-Normal
Scale: 5
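
The sketch below illustrates what a Half-Normal prior with scale 5 implies, using only the Python standard library. The helper `half_normal_cdf` is written for this example and is not a MixModeler function.

```python
import math
import random

scale = 5.0

def half_normal_cdf(x, sigma):
    """P(X <= x) for a Half-Normal distribution with the given scale."""
    return math.erf(x / (sigma * math.sqrt(2))) if x > 0 else 0.0

# The prior mean is sigma * sqrt(2 / pi); the density itself peaks at 0.
prior_mean = scale * math.sqrt(2 / math.pi)

# Draw samples by taking the absolute value of zero-centered normal draws.
rng = random.Random(42)
draws = [abs(rng.gauss(0, scale)) for _ in range(10_000)]

print(f"Prior mean: {prior_mean:.2f}")
print(f"P(beta <= 10): {half_normal_cdf(10, scale):.3f}")
print(f"Smallest draw: {min(draws):.4f}")
```

No draw is negative, and about 95% of the prior mass lies below twice the scale (10 in this case).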

Student-t Distribution

A distribution with heavier tails than normal, allowing for more extreme values.

Parameters:

Degrees of Freedom (ν): Controls tail heaviness (lower = heavier tails)
Mean (μ): Center of the distribution
Scale (σ): Spread of the distribution

Use Cases:

When you expect potential outliers
Robust estimation in the presence of unusual observations
When unsure about the exact scale of effects

Example: For a variable with uncertain behavior:

Distribution: Student-t
Degrees of Freedom: 3
Mean: 0
Scale: 5
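
To see how much heavier the tails are, the sketch below compares the prior probability of a large effect under Student-t(3, 0, 5) and Normal(0, 5). It relies on the closed-form CDF that exists for exactly 3 degrees of freedom, so the helper `student_t3_cdf` (a name invented for this example) does not generalize to other values of ν.

```python
import math
from statistics import NormalDist

def student_t3_cdf(x, loc=0.0, scale=1.0):
    """Closed-form CDF of a Student-t distribution with 3 degrees of freedom."""
    u = (x - loc) / (scale * math.sqrt(3))
    return 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi

threshold = 15.0  # three scale units from the center

# Two-sided prior probability of an effect larger than the threshold.
t_tail = 2 * (1 - student_t3_cdf(threshold, loc=0.0, scale=5.0))
normal_tail = 2 * (1 - NormalDist(0, 5).cdf(threshold))

print(f"Student-t(3, 0, 5): P(|beta| > 15) = {t_tail:.4f}")
print(f"Normal(0, 5):       P(|beta| > 15) = {normal_tail:.4f}")
```

The Student-t prior assigns close to 6% probability to effects beyond 15 in absolute value, against roughly 0.3% for the Normal prior with the same scale.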

Laplace Distribution

A distribution with peaked center and exponential tails, providing implicit regularization.

Parameters:

Location (μ): Center of the distribution
Scale (b): Spread of the distribution

Use Cases:

Automatic variable selection (pushes small effects toward zero)
When many variables may have minimal impact
Lasso-like (L1) regularization
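
The "pushes small effects toward zero" behavior can be illustrated in a deliberately simplified setting: a single coefficient with a normally distributed data estimate and a known noise level. Under those assumptions the posterior mode with a Laplace prior is a soft-thresholding rule, while a Normal prior shrinks every estimate by the same proportion. The function names and numbers are illustrative, and this is a sketch of the posterior mode only, not of the full posterior MixModeler samples.

```python
def map_laplace(estimate, noise_sd, b):
    """Posterior mode under a Laplace(0, b) prior: soft-thresholding."""
    threshold = noise_sd ** 2 / b
    magnitude = max(abs(estimate) - threshold, 0.0)
    return magnitude if estimate >= 0 else -magnitude

def map_normal(estimate, noise_sd, prior_sd):
    """Posterior mode under a Normal(0, prior_sd) prior: proportional shrinkage."""
    weight = prior_sd ** 2 / (prior_sd ** 2 + noise_sd ** 2)
    return weight * estimate

for raw in (0.2, 1.0, 4.0):
    laplace = map_laplace(raw, noise_sd=1.0, b=3.0)
    normal = map_normal(raw, noise_sd=1.0, prior_sd=3.0)
    print(f"raw={raw:.1f}  laplace={laplace:.3f}  normal={normal:.3f}")
```

With these settings the smallest estimate (0.2) is set exactly to zero under the Laplace prior, whereas the Normal prior only scales it down to 0.18.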

Prior Strength

Weakly Informative Priors (Default)

Characteristics:

Large standard deviations (σ = 10 for coefficients, σ = 100 for intercept)
Center on reasonable values (typically 0 for coefficients)
Let data primarily drive estimates
Provide mild regularization

When to Use: Default choice for most analyses, especially when you don't have strong prior knowledge.

Informative Priors

Characteristics:

Smaller standard deviations (σ = 1-3)
Centered on expected values based on business knowledge
Strongly influence posterior estimates
Require justification and documentation

When to Use:

You have reliable historical data or benchmarks
Testing new channels similar to existing ones
Limited data requires incorporating external knowledge
Regulatory or business constraints on parameter values

Non-Informative Priors

Characteristics:

Very large standard deviations (σ = 100+)
Essentially uniform over reasonable parameter ranges
Minimal influence on estimates
Let data completely drive results

When to Use:

Exploratory analysis with no prior knowledge
Abundant data (100+ observations)
Objective, data-driven analysis requirement
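
The difference between the three strength levels can be made concrete with the textbook Normal-Normal update for a single coefficient. This is a simplified sketch: it assumes the data alone would give a normally distributed estimate with a known standard error, and all numbers are hypothetical.

```python
def normal_posterior(prior_mean, prior_sd, estimate, std_error):
    """Combine a Normal prior with a Normal-distributed data estimate."""
    prior_precision = 1 / prior_sd ** 2
    data_precision = 1 / std_error ** 2
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    prior_weight = prior_precision * post_var  # share of the posterior mean from the prior
    return post_mean, post_var ** 0.5, prior_weight

# Hypothetical data-only estimate of a coefficient and its standard error.
estimate, std_error = 6.0, 2.0

levels = [("informative", 1.5), ("weakly informative", 10.0), ("non-informative", 100.0)]
for label, prior_sd in levels:
    post_mean, post_sd, weight = normal_posterior(3.0, prior_sd, estimate, std_error)
    print(f"{label:>18}: posterior mean {post_mean:.2f}, sd {post_sd:.2f}, prior weight {weight:.1%}")
```

With a prior centered on 3, the informative setting (σ = 1.5) pulls the posterior mean to about 4.1, while the two wider priors leave it close to the data estimate of 6.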

Configuring Priors in MixModeler

Access Prior Settings

1. In Model Builder, select Bayesian as the modeling method.
2. Click Advanced Settings or Configure Priors.
3. The Prior Configuration panel opens.

Set Global Coefficient Priors

For all marketing variables simultaneously:

1. Locate the "Global Coefficient Prior" section.
2. Select a distribution type (Normal, Student-t, or Laplace).
3. Set the distribution parameters:
- Normal: Mean, Standard Deviation
- Student-t: Degrees of Freedom, Mean, Scale
- Laplace: Location, Scale

Set Variable-Specific Priors

For individual marketing channels:

1. Click Add Variable Prior.
2. Select the variable from the dropdown.
3. Choose a distribution type.
4. Set the parameters specific to this variable.
5. Repeat for other variables requiring custom priors.

Variable-specific priors override global priors for those variables.
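
The override rule can be pictured as a simple lookup with a fallback. The dictionaries below are a hypothetical representation chosen for illustration; they are not MixModeler's storage format or API.

```python
# Hypothetical representation of prior settings; not MixModeler's file format.
global_prior = {"dist": "normal", "mean": 0.0, "sd": 10.0}

variable_priors = {
    "tv": {"dist": "normal", "mean": 4.0, "sd": 1.5},
    "brand_awareness": {"dist": "half_normal", "scale": 5.0},
}

def resolve_prior(variable):
    """Return the variable-specific prior if one exists, else the global prior."""
    return variable_priors.get(variable, global_prior)

for name in ("tv", "radio", "brand_awareness"):
    print(name, resolve_prior(name))
```

Here `radio` has no entry of its own, so it falls back to the global Normal(0, 10) prior.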

Set Intercept Prior

1. Locate the "Intercept Prior" section.
2. Select a distribution (usually Normal).
3. Set the Mean and Standard Deviation.

Set Error Prior

1. Locate the "Error Prior" section.
2. Select a distribution (usually Half-Normal).
3. Set the Scale parameter.

Save Configuration

Click Save Prior Settings to store this configuration with your model. You can reuse these settings for similar models.

Practical Examples

Example 1: Default Configuration (Data-Driven)

Scenario: First Bayesian model, no prior knowledge, 18 months of data

Configuration:

Intercept: Normal(0, 100)
Coefficients: Normal(0, 10) for all variables
Error: Half-Normal(5)

Rationale: Weakly informative priors that let data drive estimates while providing minimal regularization.

Example 2: Informative Priors for TV Advertising

Scenario: Historical TV campaigns show effects between 2 and 6, averaging around 4

Configuration:

TV Variable: Normal(4, 1.5)
Other Variables: Normal(0, 10)

Rationale: Strong prior based on historical data, while keeping other variables weakly informative.
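
A quick way to sanity-check a prior against a historical range is to ask how much prior mass falls inside that range. The sketch below does this for Normal(4, 1.5) and the range 2 to 6, and also shows which standard deviation would treat that range as a central 95% interval. Reading the historical range as a 95% interval is an assumption made for illustration only.

```python
from statistics import NormalDist

tv_prior = NormalDist(mu=4, sigma=1.5)

# Share of prior mass inside the historically observed range of 2 to 6.
coverage = tv_prior.cdf(6) - tv_prior.cdf(2)

# Standard deviation that would make 2 to 6 a central 95% interval instead.
z_95 = NormalDist().inv_cdf(0.975)
sd_for_95 = (6 - 4) / z_95

print(f"Normal(4, 1.5) puts {coverage:.1%} of its mass in [2, 6]")
print(f"A 95% interval of [2, 6] would need sd = {sd_for_95:.2f}")
```

Normal(4, 1.5) places roughly 82% of its mass in the historical range, which leaves some room for effects outside what past campaigns showed.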

Example 3: New Digital Channel Similar to Existing

Scenario: Launching TikTok ads, have data from Instagram with coefficient around 2.5

Configuration:

TikTok Variable: Normal(2.5, 2)
Instagram Variable: Normal(0, 10)

Rationale: Use Instagram as reference point for TikTok prior, with wider uncertainty to account for platform differences.

Example 4: Non-Negative Constraints

Scenario: Brand awareness can only increase sales, never decrease

Configuration:

Brand Awareness: Half-Normal(5)
Other Variables: Normal(0, 10)

Rationale: Constrain brand awareness coefficient to be positive using Half-Normal prior.

Example 5: Regularization for Many Variables

Scenario: 50 marketing variables, expect many to have minimal impact

Configuration:

All Coefficients: Laplace(0, 3)
Error: Half-Normal(5)

Rationale: The Laplace prior shrinks small coefficients strongly toward zero, which acts as a soft form of variable selection.

Common Prior Configurations

Conservative Business-Driven

| Parameter | Distribution | Settings | Use Case |
| --- | --- | --- | --- |
| Intercept | Normal | μ=baseline_sales, σ=0.2×baseline | Known baseline sales |
| TV Advertising | Normal | μ=3, σ=1.5 | Historical effect size |
| Digital Channels | Normal | μ=1.5, σ=1 | Typical digital performance |
|  | Normal | μ=0.5, σ=1 | Lower expected impact |
| Error | Half-Normal | σ=3 | Low model uncertainty |

Exploratory Data-Driven

| Parameter | Distribution | Settings | Use Case |
| --- | --- | --- | --- |
| Intercept | Normal | μ=0, σ=100 | No prior knowledge |
| All Coefficients | Normal | μ=0, σ=10 | Let data decide |
| Error | Half-Normal | σ=5 | Standard uncertainty |

Regularized Selection

| Parameter | Distribution | Settings | Use Case |
| --- | --- | --- | --- |
| Intercept | Normal | μ=0, σ=50 | Weakly informative |
| All Coefficients | Laplace | μ=0, b=2 | Automatic selection |
| Error | Half-Normal | σ=5 | Standard uncertainty |

Evaluating Prior Choice

Prior Predictive Check

Before running the full model, examine what data your priors alone would generate:

1. In Prior Configuration, click Preview Prior Predictions.
2. The system generates synthetic data from the priors alone.
3. Compare the synthetic data to your actual data.
4. If the two are wildly different, the priors may be too strong or misspecified.
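
The idea behind a prior predictive check can be sketched outside the tool as follows. The code draws parameters from the default priors, simulates a KPI series for each draw, and compares the simulated range to the observed KPI. The spend values, the observed KPI, and the simple linear model are hypothetical stand-ins; this is not how MixModeler's preview is implemented.

```python
import random

rng = random.Random(0)

# Hypothetical weekly spend for two channels (already scaled) and observed KPI.
spend = [(1.0, 0.5), (2.0, 0.0), (0.5, 1.5), (1.5, 1.0)]
observed_kpi = [120.0, 135.0, 110.0, 128.0]

def simulate_kpi_from_priors():
    """Draw parameters from the default priors, then simulate one KPI series."""
    intercept = rng.gauss(0, 100)                # Normal(0, 100)
    coefs = [rng.gauss(0, 10) for _ in range(2)]  # Normal(0, 10) per channel
    sigma = abs(rng.gauss(0, 5))                  # Half-Normal(5)
    return [
        intercept + sum(c * x for c, x in zip(coefs, week)) + rng.gauss(0, sigma)
        for week in spend
    ]

# Pool simulated KPI values across many prior draws and take a central 95% range.
simulated = sorted(y for _ in range(2000) for y in simulate_kpi_from_priors())
low = simulated[int(0.025 * len(simulated))]
high = simulated[int(0.975 * len(simulated))]

print(f"95% prior predictive range: {low:.0f} to {high:.0f}")
print(f"Observed KPI range: {min(observed_kpi):.0f} to {max(observed_kpi):.0f}")
```

If the observed KPI sits far outside the simulated range, or the simulated range is orders of magnitude wider than anything plausible, the priors are worth revisiting before fitting.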

Posterior Sensitivity

After model runs, assess prior influence:

1. Run the model with your chosen priors.
2. Run it again with non-informative priors.
3. Compare the posterior distributions.
4. Large differences indicate strong prior influence.

If posteriors are vastly different, one of the following applies:

Your priors were too strong (weaken them by increasing σ)
Your data is limited (priors appropriately influential)
Prior-data conflict exists (investigate discrepancy)
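
One possible way to quantify "large differences" is to express the gap between the two posterior means in units of posterior standard deviation. The sketch below assumes you can export posterior draws for a coefficient from both runs; here the draws are simulated with `random.gauss` purely as stand-ins, and `standardized_shift` is a helper invented for this example.

```python
import random
from statistics import mean, stdev

rng = random.Random(1)

# Stand-ins for posterior draws of one coefficient from two model runs.
# In practice these would come from the fitted models, not random.gauss.
draws_chosen_prior = [rng.gauss(4.1, 1.2) for _ in range(4000)]
draws_flat_prior = [rng.gauss(5.9, 2.0) for _ in range(4000)]

def standardized_shift(a, b):
    """Difference in posterior means, in units of the wider posterior SD."""
    return abs(mean(a) - mean(b)) / max(stdev(a), stdev(b))

shift = standardized_shift(draws_chosen_prior, draws_flat_prior)
print(f"Posterior mean shift: {shift:.2f} posterior SDs")
```

There is no universal cutoff for this number. A shift that is small relative to the posterior spread suggests the conclusions do not hinge on the prior, while a shift approaching or exceeding one standard deviation deserves a closer look.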

Prior-Posterior Overlap

Check how much the posterior differs from the prior:

High Overlap: Data didn't provide much information, prior dominated
Moderate Overlap: Healthy combination of prior knowledge and data
No Overlap: Data completely overrode prior, prior may have been misspecified
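
When both the prior and the posterior are approximately normal, their overlap can be computed with `NormalDist.overlap` from the Python standard library, which returns the shared area under the two density curves. The three posteriors below are hypothetical normal approximations chosen to mirror the three cases above.

```python
from statistics import NormalDist

def prior_posterior_overlap(prior, posterior):
    """Overlapping area of two normal densities, from 0 (disjoint) to 1 (identical)."""
    return prior.overlap(posterior)

prior = NormalDist(mu=3, sigma=2)

# Hypothetical normal approximations to three different posteriors.
cases = {
    "prior dominated": NormalDist(mu=3.1, sigma=1.9),
    "prior and data combined": NormalDist(mu=4.0, sigma=0.8),
    "prior overridden": NormalDist(mu=15.0, sigma=0.5),
}

for label, posterior in cases.items():
    print(f"{label}: overlap = {prior_posterior_overlap(prior, posterior):.2f}")
```

For posteriors that are clearly skewed or multimodal, a normal approximation is not adequate and the overlap would need to be estimated from the posterior draws instead.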

Best Practices

Document Your Priors: Record the business rationale for any informative priors. This is essential for reproducibility and stakeholder communication.

Start Weak: Begin with weakly informative priors. Only strengthen them if you have strong justification and poor convergence with weak priors.

Check Scale: Ensure prior standard deviations are appropriate for your data scale. If KPI is in millions, priors should reflect that scale.

Test Sensitivity: Always run at least one sensitivity analysis with different priors to ensure conclusions are robust.

Communicate Clearly: When presenting results, clearly state which priors were used and why. This builds trust and transparency.

Avoid Conflicts: If your prior and data strongly disagree, investigate. Don't just increase prior strength to force an outcome.

Use Domain Expertise: Involve business stakeholders when setting informative priors. Their knowledge is valuable input.

Version Control: Save different prior configurations and label them clearly (e.g., "Conservative_2024Q1", "Data_Driven_Baseline").

Common Mistakes

Over-Confident Priors: Setting σ too small makes estimates overly dependent on potentially incorrect prior beliefs.

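Wrong Scale: A fixed σ means different things at different data scales. With a KPI in millions, σ=10 can be far too restrictive for coefficients that need to be large, while with a KPI on a unit scale the same σ=10 is virtually non-informative and provides no regularization.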

Ignoring Business Logic: Allowing negative coefficients for variables that can only have positive effects.

Not Checking: Failing to examine prior-posterior relationships and sensitivity.

Inconsistency: Using strong priors for some variables but not others without clear rationale.
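
The scale issue described under "Wrong Scale" follows from how coefficients change when the KPI is re-expressed in different units: dividing the KPI by a constant divides every coefficient, and therefore the prior mean and standard deviation, by the same constant. The sketch below shows that conversion for a Normal coefficient prior; the function name and numbers are illustrative assumptions.

```python
def rescale_coefficient_prior(mean, sd, kpi_divisor):
    """Rescale a Normal coefficient prior when the KPI is divided by kpi_divisor."""
    return mean / kpi_divisor, sd / kpi_divisor

# Prior specified while the KPI was measured in raw currency units (hypothetical values).
mean_raw, sd_raw = 0.0, 10_000.0

# The same belief after re-expressing the KPI in thousands.
mean_k, sd_k = rescale_coefficient_prior(mean_raw, sd_raw, kpi_divisor=1_000)

print(f"Raw units: Normal({mean_raw}, {sd_raw})")
print(f"Thousands: Normal({mean_k}, {sd_k})")
```

The same reasoning applies in the other direction: keeping σ=10 while switching the KPI from thousands to raw units makes the prior a thousand times tighter relative to the data.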

Next Steps: After configuring priors, learn about MCMC Settings to optimize the sampling process, or explore Credible Intervals to interpret your Bayesian results.

