Common Pitfalls to Avoid
Overview
Learning from common mistakes accelerates your MMM success. This guide highlights frequent pitfalls in marketing mix modeling and provides practical solutions to avoid them.
Data Pitfalls
Pitfall 1: Insufficient Historical Data
Mistake: Trying to build an MMM with only 3-6 months of data
Why It's a Problem:
Not enough observations for reliable estimates
Can't capture seasonality
High risk of overfitting
Unstable coefficients
Example:
20 weeks of data, 15 variables
→ Ratio: 20/15 = 1.3 (way too low)
→ Result: Unreliable model, random noise
Solution:
Minimum: 26 weeks (6 months)
Recommended: 52+ weeks (1+ years)
Ideal: 104+ weeks (2+ years)
Rule: At least 5 observations per variable
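The observations-per-variable rule above is easy to automate as a pre-modeling check. A minimal Python sketch (pandas assumed; the dataset mirrors the 20-weeks, 15-variables example and the column names are hypothetical):

```python
import pandas as pd

# Hypothetical dataset matching the example above: 20 weeks, 15 candidate variables
df = pd.DataFrame({f"channel_{i}_spend": range(20) for i in range(15)})

n_obs, n_vars = df.shape
ratio = n_obs / n_vars
print(f"{n_obs} observations / {n_vars} variables = {ratio:.1f} per variable")
if ratio < 5:
    print("Below the 5-observations-per-variable rule: cut variables or extend history")
```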
If You Must Use Short Period:
Drastically reduce variables (5-8 max)
Use Bayesian with informative priors
Acknowledge high uncertainty
Validate with business judgment
Pitfall 2: Mixing Data Granularities
Mistake: Combining weekly and monthly data in same model
Example: TV spend reported monthly, digital spend weekly. Spreading the monthly TV figure evenly across weeks invents a weekly pattern that never happened.
Why It's a Problem:
Temporal misalignment
Spurious correlations appear, or genuine ones are masked
Incorrect attribution
Solution:
Standardize all data to same granularity
If monthly data only: aggregate weekly to monthly
If weekly preferred: properly disaggregate monthly (use actual weekly pattern if available)
Document any assumptions made
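Aggregating upward is usually the safe direction. A minimal pandas sketch (the weekly series here is hypothetical):

```python
import pandas as pd

# Hypothetical weekly spend with a proper DatetimeIndex
weekly = pd.Series([100, 120, 90, 110, 130, 95, 105, 115],
                   index=pd.date_range("2024-01-01", periods=8, freq="W-MON"),
                   name="tv_spend")

# Aggregate weekly to monthly; disaggregating monthly to weekly requires
# a real weekly pattern and should be documented as an assumption
monthly = weekly.resample("MS").sum()
print(monthly)
```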
Pitfall 3: Ignoring Missing Data
Mistake: Uploading data with blanks, assuming MixModeler will handle it
What Happens:
Rows with missing KPI dropped
Missing predictors treated as zero (incorrect)
Biased estimates
Reduced sample size
Example: Radio spend is blank for 8 of 52 weeks. If those blanks silently become zeros, the model sees phantom off-weeks and the radio coefficient is biased downward.
Solution:
Fill missing spend with actual zero (if truly no spend)
Interpolate if data error (average of neighbors)
Remove variable if >20% missing
Create "missing" indicator variable if needed
Never leave blanks
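The repair strategies above look like this in pandas (a sketch; the series and the 20% threshold are illustrative):

```python
import numpy as np
import pandas as pd

radio = pd.Series([50, np.nan, 60, np.nan, 55, 0, 52, 58], name="radio_spend")

filled_zero = radio.fillna(0)                             # if the gaps are truly zero spend
interpolated = radio.interpolate(limit_direction="both")  # if the gaps are data errors
missing_flag = radio.isna().astype(int)                   # optional "missing" indicator

if radio.isna().mean() > 0.20:                            # >20% missing: consider dropping
    print("Too much missing data to repair reliably")
```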
Pitfall 4: Using Revenue Instead of Units (or Vice Versa)
Mistake: Not considering whether KPI should be revenue or units
Problem with Revenue KPI:
Confounds price and volume effects
Price increases show up as "marketing success"
Can't separate marketing from pricing impact
Problem with Units KPI:
Ignores revenue value
Treats $10 and $100 items equally
Misses pricing strategy effects
Solution:
Use Revenue when: Prices stable, revenue is business goal
Use Units when: Prices vary significantly, want pure volume
Include Price as Control if using Units
Build Both Models if uncertain, compare insights
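If you go with a units KPI, adding price as a control is straightforward. A sketch using statsmodels (the data is simulated purely for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({"media_spend": rng.normal(200, 30, 52),
                   "avg_price": rng.normal(10, 1, 52)})
df["units"] = (1000 + 2 * df["media_spend"] - 40 * df["avg_price"]
               + rng.normal(0, 20, 52))

# Price as a control keeps media coefficients from absorbing pricing effects
fit = smf.ols("units ~ media_spend + avg_price", data=df).fit()
print(fit.params)
```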
Pitfall 5: Correlated Variables Without Investigation
Mistake: Including Facebook_Spend and Instagram_Spend (r=0.95) without addressing correlation
Why It's a Problem:
High multicollinearity
Unstable coefficients
Can't isolate individual effects
VIF >10
Example: With r = 0.95, the model cannot separate Facebook from Instagram; one coefficient inflates while the other can turn negative.
Solution:
Check correlation matrix before modeling
Combine highly correlated variables (r >0.8)
Keep only one if theoretically redundant
Document decision rationale
Accept correlation only if both theoretically essential
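Reviewing the correlation matrix takes a few lines. A sketch with simulated spends (the r > 0.8 threshold follows the guidance above):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
fb = rng.normal(100, 20, 52)
df = pd.DataFrame({"facebook_spend": fb,
                   "instagram_spend": 0.8 * fb + rng.normal(0, 5, 52),
                   "tv_spend": rng.normal(300, 60, 52)})

corr = df.corr()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))  # unique pairs only
flagged = upper.stack()
print(flagged[flagged.abs() > 0.8])  # pairs needing a combine/drop decision
```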
Model Building Pitfalls
Pitfall 6: P-Value Fishing
Mistake: Removing all variables with p >0.05, regardless of business importance
Example: TV (p = 0.08) gets dropped despite being 40% of the budget, while a spurious weather variable (p = 0.03) stays in.
Why It's a Problem:
Statistical significance ≠ business importance
P-values affected by sample size
Removes theoretically important variables
Keeps spurious correlations
Solution:
Use p <0.10 as guide, not rule
Keep variables if:
Theoretically important
Practically significant (large business impact)
Large budget or strategic channel
Remove if:
p >0.20 AND no business justification
Coefficient near zero
Doesn't add to model story
Pitfall 7: Over-Transformation
Mistake: Applying log, adstock, saturation, and lag to same variable
Example: lag(saturation(adstock(log(TV_Spend)))) stacks four transformations, leaving a coefficient no stakeholder can interpret.
Why It's a Problem:
Impossible to interpret
Lost business meaning
Overfitting
Can't explain to stakeholders
Solution:
Maximum 1-2 transformations per variable
Typical: Adstock OR Saturation (or Adstock + Saturation for advanced users)
Keep it interpretable
Document transformation rationale
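For reference, the two workhorse transformations are simple on their own; the point is to stop after one or two. A numpy sketch (the decay and half-saturation values are illustrative):

```python
import numpy as np

def geometric_adstock(x, decay=0.6):
    """Each week's effect carries over a fraction of the previous week's."""
    out = np.zeros_like(x, dtype=float)
    for t, spend in enumerate(x):
        out[t] = spend + (decay * out[t - 1] if t > 0 else 0.0)
    return out

def hill_saturation(x, half_sat=100.0, shape=1.0):
    """Diminishing returns: response flattens as effective spend grows."""
    return x**shape / (x**shape + half_sat**shape)

spend = np.array([0, 50, 200, 100, 0, 0, 150], dtype=float)
print(np.round(hill_saturation(geometric_adstock(spend)), 3))  # Adstock + Saturation
```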
Pitfall 8: Ignoring Multicollinearity
Mistake: Building model with VIF >20, using coefficients for decisions
Example: Search_Spend and Shopping_Spend always move together (VIF = 24); the Search coefficient flips sign between otherwise similar model runs.
Why It's a Problem:
Coefficients unstable
Signs can reverse
Standard errors inflated
Attribution unreliable
Solution:
Always check VIF (Model Diagnostics)
Target: All VIF <5
Acceptable: VIF <10
If VIF >10: Remove or combine variables
Never interpret coefficients with high VIF
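VIF is computed for you in Model Diagnostics, but if you want to reproduce the check yourself, a statsmodels sketch (simulated data; note the deliberately near-duplicate channel):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
base = rng.normal(100, 20, 52)
X = pd.DataFrame({"search_spend": base,
                  "shopping_spend": 0.9 * base + rng.normal(0, 3, 52),
                  "tv_spend": rng.normal(300, 60, 52)})

Xc = sm.add_constant(X)  # compute VIF with an intercept present
for i, col in enumerate(Xc.columns):
    if col != "const":
        print(col, round(variance_inflation_factor(Xc.values, i), 1))
```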
Pitfall 9: Overfitting
Mistake: Including 40 variables with only 52 weeks of data
Example: 40 variables on 52 weeks gives R² = 0.98 in-sample, yet the model fails to predict the next quarter.
Why It's a Problem:
Model fits noise, not signal
Poor prediction on new data
Unstable estimates
False confidence
Solution:
Rule: Keep variables < n/5 (52 weeks → max 10 variables)
Better: variables < n/10 (52 weeks → max 5 variables)
Focus on key variables
Combine similar variables
Use regularization (Bayesian with priors)
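A time-ordered holdout is the quickest way to see overfitting in action. A sketch (simulated data; the 40/12 week split is illustrative):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = pd.DataFrame({f"x{i}": rng.normal(size=52) for i in range(5)})
y = 100 + 3 * X["x0"] + rng.normal(scale=5, size=52)

# Fit on the first 40 weeks, evaluate on the last 12
X_tr = sm.add_constant(X[:40])
X_te = sm.add_constant(X[40:], has_constant="add")
fit = sm.OLS(y[:40], X_tr).fit()

resid = y[40:] - fit.predict(X_te)
holdout_r2 = 1 - (resid**2).sum() / ((y[40:] - y[40:].mean())**2).sum()
print(f"In-sample R²: {fit.rsquared:.2f}, holdout R²: {holdout_r2:.2f}")
```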
Pitfall 10: Skipping Diagnostics
Mistake: "R² is 0.85, good enough, let's use the model!"
Why It's a Problem:
High R² doesn't mean valid model
Could have severe multicollinearity
Could have autocorrelation
Results may be unreliable
Example: R² = 0.85, but Durbin-Watson = 0.9 (strong autocorrelation) and several VIFs above 15; the impressive fit is an illusion.
Solution:
Always run Model Diagnostics
Check ALL tests:
Multicollinearity (VIF)
Autocorrelation (Durbin-Watson)
Heteroscedasticity
Normality
Influential points
Address issues before using model
Lower R² with good diagnostics beats high R² with problems
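Those checks run automatically in Model Diagnostics; for intuition, here is the same battery in statsmodels (simulated data):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson, jarque_bera
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(4)
X = sm.add_constant(rng.normal(size=(52, 3)))
y = X @ np.array([100.0, 3.0, 2.0, 1.0]) + rng.normal(scale=5, size=52)
fit = sm.OLS(y, X).fit()

print("Durbin-Watson (want ~2):", round(durbin_watson(fit.resid), 2))
_, bp_p, _, _ = het_breuschpagan(fit.resid, X)   # heteroscedasticity
print("Breusch-Pagan p (want > 0.05):", round(bp_p, 3))
_, jb_p, _, _ = jarque_bera(fit.resid)           # residual normality
print("Jarque-Bera p (want > 0.05):", round(jb_p, 3))
```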
Interpretation Pitfalls
Pitfall 11: Confusing Correlation with Causation
Mistake: "Digital has highest coefficient, so digital causes sales"
Why It's a Problem:
Correlation ≠ causation
Endogeneity (reverse causation)
Omitted variable bias
Spurious correlations
Example: Digital spend rises every December and so do sales; seasonality, not digital, drives much of the spike.
Solution:
Say "associated with" not "causes"
Include control variables
Use Granger causality testing
Triangulate with A/B tests
Be humble about causal claims
Consider reverse causation
Pitfall 12: Over-Interpreting Small Coefficients
Mistake: "Email coefficient is 0.001, so email doesn't work"
Context Missing: 0.001 units per email × 2,000,000 emails per week = 2,000 incremental units per week.
Why It's a Problem:
Raw coefficients don't show total impact
Need to multiply by typical spend level
Small coefficient × large spend = big impact
Solution:
Calculate total contribution (coef × average spend)
Use decomposition analysis
Look at percentage contribution
Consider ROI, not just coefficient size
Standardize coefficients for comparison
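The arithmetic in practice, as a sketch (the coefficients and activity levels are hypothetical, echoing the email example above):

```python
import pandas as pd

coefs = pd.Series({"tv_spend": 0.8, "email_sends": 0.001, "search_spend": 1.2})
avg_weekly = pd.Series({"tv_spend": 5000, "email_sends": 2_000_000,
                        "search_spend": 3000})

contribution = coefs * avg_weekly           # coefficient × typical activity level
share = contribution / contribution.sum()
print(pd.DataFrame({"contribution": contribution, "share": share.round(2)}))
# email's "tiny" 0.001 coefficient still drives ~21% of modeled contribution
```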
Pitfall 13: Ignoring Uncertainty
Mistake: Presenting coefficient as exact truth: "TV ROI is 3.25"
Reality: The point estimate is 3.25, but the 95% interval might run from, say, 1.8 to 4.7; every value in that range is consistent with the data.
Why It's a Problem:
False precision
Overconfidence in decisions
Doesn't communicate uncertainty
Stakeholders make binary decisions on uncertain estimates
Solution:
Always report confidence/credible intervals
Use Bayesian for explicit uncertainty
Communicate ranges, not points
Make decisions robust to uncertainty
Acknowledge limitations
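Intervals come straight out of the fitted model object. A statsmodels sketch (simulated data, matching the TV ROI example):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
df = pd.DataFrame({"tv_spend": rng.normal(300, 60, 52)})
df["revenue"] = 1000 + 3.25 * df["tv_spend"] + rng.normal(0, 200, 52)

fit = smf.ols("revenue ~ tv_spend", data=df).fit()
low, high = fit.conf_int().loc["tv_spend"]
print(f"TV coefficient: {fit.params['tv_spend']:.2f} "
      f"(95% CI: {low:.2f} to {high:.2f})")  # report the range, not the point
```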
Pitfall 14: Forgetting Incrementality
Mistake: "TV contributed $500K revenue, we should spend $500K on TV"
Why It's Wrong:
Contribution ≠ incremental impact
Some sales would occur without TV (base sales)
Coefficient shows incremental effect only
Confusing total and marginal
Correct Interpretation: TV drove $500K of revenue incremental to base sales; matching spend to contribution ignores diminishing returns and says nothing about the optimal budget.
Solution:
Use decomposition for contribution analysis
Understand coefficients show incremental lift
Don't confuse "attribution" with "what would happen if we stopped"
Consider base sales separately
Bayesian-Specific Pitfalls
Pitfall 15: Using Default Priors Blindly
Mistake: Not thinking about priors, just using defaults
Problem:
Defaults are weakly informative
May not match your business
Waste of Bayesian framework's power
Missing chance to incorporate knowledge
Example: Past studies put TV ROI between 1 and 5, yet the model runs with a diffuse default prior centered at zero, ignoring everything you already know.
Solution:
Think about priors deliberately
Use informative priors when justified
Document prior rationale
Test sensitivity to priors
Start weak, strengthen with justification
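What an informative prior looks like in code, assuming a PyMC backend (MixModeler's own prior interface may differ; the values are illustrative):

```python
import pymc as pm

with pm.Model():
    # Informative: past studies suggest TV ROI around 3, rarely outside 1-5
    tv_roi = pm.Normal("tv_roi", mu=3.0, sigma=1.0)

    # A default-style weakly informative prior ignores that knowledge:
    # tv_roi = pm.Normal("tv_roi", mu=0.0, sigma=10.0)
```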
Pitfall 16: Ignoring Convergence Diagnostics
Mistake: Using Bayesian results without checking R-hat, ESS, divergences
Example: R-hat = 1.15 on the TV coefficient means the chains never agreed, so the reported posterior mean is noise.
Why It's a Problem:
MCMC may not have converged
Posterior estimates incorrect
Credible intervals wrong
Decisions based on noise
Solution:
Always check Bayesian Diagnostics
Require: R-hat <1.01, ESS >400, Divergences ~0
If failed: Increase draws, adjust settings, rerun
Never use non-converged Bayesian model
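If you want to run the checks yourself, ArviZ makes them one-liners. A sketch using one of its bundled example posteriors (your own InferenceData object would take its place):

```python
import arviz as az

idata = az.load_arviz_data("centered_eight")  # stand-in for your model's trace

summary = az.summary(idata, var_names=["mu"])
print(summary[["r_hat", "ess_bulk"]])

divergences = int(idata.sample_stats["diverging"].sum())
print("Divergences:", divergences)

converged = ((summary["r_hat"] < 1.01).all()
             and (summary["ess_bulk"] > 400).all()
             and divergences == 0)
print("Safe to use:", converged)
```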
Pitfall 17: Treating Bayesian Intervals Like Frequentist
Mistake: "95% credible interval, so 95% chance true value is in interval... wait, that's what it means!"
Actually: That IS what it means (unlike confidence intervals), but people sometimes misinterpret the other direction
Common Confusion:
Thinking Bayesian and frequentist intervals are identical
Not leveraging probability statements Bayesian allows
Using Bayesian but interpreting like frequentist
Solution:
Understand credible intervals correctly (direct probability)
Use Bayesian probability statements ("95% probability coefficient >2")
Communicate advantage to stakeholders
Don't use Bayesian if you won't use its benefits
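The payoff looks like this: direct probability statements computed from posterior draws (simulated here for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
posterior_draws = rng.normal(loc=2.4, scale=0.5, size=4000)  # hypothetical draws

print(f"P(coefficient > 0) = {(posterior_draws > 0).mean():.1%}")
print(f"P(coefficient > 2) = {(posterior_draws > 2).mean():.1%}")
# No frequentist interval supports statements like these directly
```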
Workflow Pitfalls
Pitfall 18: Not Documenting Decisions
Mistake: Making model changes without recording why
Result:
3 months later: "Why did we include that variable?"
Can't reproduce analysis
Can't defend to stakeholders
Lost institutional knowledge
Solution:
Keep modeling log (Excel, Word, notebook)
Record:
Each model specification
Why variables added/removed
Transformation rationale
Diagnostic results
Model selection reasoning
Version control model exports
Include notes in Excel exports
Pitfall 19: Building in Isolation
Mistake: Analyst builds model alone, presents to skeptical stakeholders
Why It Fails:
Stakeholders don't trust results
"Black box" perception
Pushback on methodology
Recommendations ignored
Solution:
Involve stakeholders early (objective setting)
Share preliminary results for feedback
Explain methodology in advance
Get buy-in before final presentation
Collaborate on interpretation
Make it "our model" not "analyst's model"
Pitfall 20: One-and-Done
Mistake: Building model once, using for years without updates
Why It's a Problem:
Markets change
New channels emerge
Strategies evolve
Old model becomes obsolete
Example: A model built before a new channel launched silently credits that channel's sales to base or to older channels.
Solution:
Update models quarterly or semi-annually
Rebuild annually with fresh data
Add new channels as they scale
Track model performance over time
Plan for continuous improvement
Quick Reference: Top 10 Pitfalls
Insufficient data (<52 weeks)
Too many variables (overfitting)
Ignoring multicollinearity (VIF >10)
Skipping diagnostics
P-value fishing (removing based solely on p >0.05)
Over-transformation (uninterpretable)
Confusing correlation and causation
Ignoring uncertainty (presenting point estimates as truth)
Not checking Bayesian convergence
No documentation (can't reproduce)
Pitfall Avoidance Checklist
Before finalizing model:
Data Quality:
[ ] 52+ weeks of data
[ ] All variables same granularity
[ ] No missing values or properly addressed
[ ] Correlation matrix reviewed
Model Specification:
[ ] Variables < n/5
[ ] All VIF <10
[ ] Transformations justified and interpretable
[ ] Business logic sound
Validation:
[ ] All diagnostics run and passing
[ ] Bayesian convergence checked (if applicable)
[ ] Coefficients have expected signs
[ ] Results sanity-checked with stakeholders
Interpretation:
[ ] Uncertainty communicated
[ ] Causal language avoided
[ ] Incremental vs total understood
[ ] Recommendations actionable
Process:
[ ] Decisions documented
[ ] Stakeholders involved
[ ] Update plan established
[ ] Model exported and archived
Next Steps: Review Performance Optimization to speed up your workflow, or return to MMM Workflow Guide for the complete process.