Best Practices for Variable Creation
Guidelines for Effective Variable Engineering
Creating the right variables is crucial for building accurate and actionable MMM models. This page provides proven strategies and best practices for variable engineering in MixModeler.
Core Principles
1. Start Simple, Add Complexity Gradually
Initial Model:
- Raw marketing variables (no transformations) 
- Basic seasonality (month dummies) 
- KPI as-is 
Build Incrementally:
- Add adstock to media channels 
- Apply saturation curves 
- Test interaction terms 
- Create composite variables 
Why: Easier to understand what each transformation contributes, simpler debugging
2. Every Transformation Should Have a Purpose
Bad Practice: "Let me try every transformation and see what sticks"
Good Practice: "TV ads persist for weeks, so I'll apply adstock with 60% decay rate based on industry benchmarks"
Rule: Each transformation should address a specific business hypothesis or known marketing behavior
3. Test Before Committing
Before creating a variable:
- Preview the transformation 
- Check the distribution (min, max, mean) 
- Visualize the effect (charts) 
- Understand what it represents 
After creating:
- Test in model (add to Model Builder) 
- Check t-statistic (is it significant?) 
- Verify coefficient sign (does it make sense?) 
- Compare R² with/without variable 
If no improvement: Don't use the transformation
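The pre-creation checks above (min, max, mean) can be scripted as a quick helper. This is a sketch, not a MixModeler API; the `preview` function name is hypothetical:

```python
def preview(series):
    """Summarize a candidate variable before committing it:
    check min, max, and mean, as suggested above."""
    return {
        "min": min(series),
        "max": max(series),
        "mean": sum(series) / len(series),
    }

stats = preview([10.0, 40.0, 25.0, 5.0])  # {'min': 5.0, 'max': 40.0, 'mean': 20.0}
```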
Naming Conventions
Use Clear, Descriptive Names
Good Names:
TV_Spend_ads60              (TV with 60% adstock)
Digital_Display_ATAN_a12    (Display with saturation curve)
Radio_Q4_Only               (Radio spend, Q4 periods only)
Social_Media_Mix_WGTD       (Weighted combination)
OOH_AVO_85                  (OOH above 85% threshold)
Bad Names:
TV_transformed
Variable1
temp_test
X_final_v2
Benefits:
- Easy to understand at a glance 
- Clear what transformation was applied 
- Easier to document and share 
Include Transformation Details
Format:
{Base Variable}_{Transformation}_{Parameters}
Examples:
TV_Spend_ads60_ATAN_a15_p12
Digital_lag2
Price_Q4_Only
What to Include:
- Base variable name 
- Transformation type (ads, ATAN, AVO, WGTD, etc.) 
- Key parameters (adstock rate, threshold, etc.) 
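The naming format above is easy to automate so names stay consistent across a project. A minimal sketch (the `variable_name` helper is hypothetical, not part of MixModeler):

```python
def variable_name(base, *tags):
    """Build a descriptive name: the base variable, then one tag per
    transformation (type plus key parameters), joined by underscores."""
    return "_".join([base, *tags])

name = variable_name("TV_Spend", "ads60", "ATAN_a15_p12")
# 'TV_Spend_ads60_ATAN_a15_p12'
```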
Transformation Best Practices
Adstock Transformations
✅ Do:
- Apply to all media channels (TV, Radio, Print, Display) 
- Test multiple rates (40%, 50%, 60%, 70%) 
- Use Variable Testing to find optimal rate 
- Document why specific rate chosen 
❌ Don't:
- Apply same rate to all channels (they decay differently) 
- Use adstock on non-media variables (price, weather) 
- Apply adstock AND lag (choose one) 
- Use rates > 90% (unrealistic persistence) 
Typical Rates:
- TV: 50-70% 
- Radio: 40-60% 
- Print: 60-80% 
- Digital Display: 30-50% 
- Search: 10-30% 
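Geometric adstock, the transformation these rates refer to, can be sketched in a few lines. This is the standard carry-over formulation; MixModeler applies it for you, so the code is illustrative only:

```python
def adstock(spend, rate):
    """Geometric adstock: each period keeps its own spend plus
    `rate` times the previous period's adstocked value."""
    out, carry = [], 0.0
    for x in spend:
        carry = x + rate * carry
        out.append(carry)
    return out

# A single burst of TV spend decays over the following weeks:
tv_ads60 = adstock([100, 0, 0, 0], 0.6)  # ~[100, 60, 36, 21.6]
```

Note how a 60% rate means each week retains 60% of the previous week's accumulated effect, which is why rates above 90% imply unrealistically long persistence.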
Saturation Curves
✅ Do:
- Apply to media channels with large spend variance 
- Use Curve Testing to find optimal parameters 
- Test both S-shape and concave curves 
- Apply AFTER adstock (adstock first, then saturation) 
❌ Don't:
- Apply to variables with limited range (little saturation to model) 
- Use overly aggressive parameters (creates flat line) 
- Combine with too many other transformations (over-complicating) 
When to Use:
- Media channels with a wide spend range (at least a 3× difference between min and max) 
- Channels where diminishing returns expected 
- When linear model shows unrealistic ROI at high spend 
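An arctan curve like the `ATAN_a15_p12` examples on this page could be sketched as follows. The exact functional form and the meaning of alpha and power are assumptions here; check how MixModeler defines its ATAN parameters:

```python
import math

def atan_saturation(spend, alpha, power=1.0):
    """Concave response: roughly linear at low spend, flattening toward
    a ceiling at high spend (diminishing returns).
    Assumed form atan((x / alpha) ** power); MixModeler's exact ATAN
    parameterization may differ."""
    return [math.atan((x / alpha) ** power) for x in spend]

# Apply AFTER adstock: saturate the adstocked series, not raw spend.
curve = atan_saturation([10, 20, 30], alpha=15)
```

Equal spend increments produce shrinking response increments, which is exactly the diminishing-returns behavior the curve is meant to capture.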
Lead/Lag Transformations
✅ Do:
- Use for non-media variables (price, promotions, external factors) 
- Test multiple lag periods (1, 2, 3 weeks) 
- Choose lag with highest t-statistic 
- Document the delay hypothesis 
❌ Don't:
- Use for media channels (use adstock instead) 
- Create excessive lags (>4 weeks rarely needed) 
- Use both lag and lead for same variable 
Common Applications:
- Price_lag1 (price changes take time to affect behavior) 
- DirectMail_lag2 (2-week delivery + response time) 
- Competitor_Activity_lag1 (delayed competitive response) 
Split by Date
✅ Do:
- Align splits with real business events (campaigns, rebrand, market entry) 
- Create complementary splits (Period A + Period B = Total) 
- Ensure sufficient data in each split (15+ observations) 
- Document the reason for split 
❌ Don't:
- Split arbitrarily without business rationale 
- Create too many splits (> 3-4 per variable) 
- Split into very short periods (< 10 observations) 
Good Use Cases:
- Before/After major change (product launch, rebrand) 
- Campaign vs. baseline periods 
- Seasonal effectiveness (Q4 vs. non-Q4) 
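The complementary-split rule above (Period A + Period B = Total) can be sketched with a simple mask. The function and flag names are illustrative, not MixModeler's API:

```python
def split_by_period(values, in_period_flags):
    """Split a variable into complementary series: values inside the
    period and values outside it (zeros elsewhere), so the two splits
    always sum back to the original."""
    inside = [v if f else 0.0 for v, f in zip(values, in_period_flags)]
    outside = [0.0 if f else v for v, f in zip(values, in_period_flags)]
    return inside, outside

radio = [5, 8, 6, 9]
is_q4 = [False, True, True, True]   # e.g. weeks falling in Oct-Dec
radio_q4, radio_non_q4 = split_by_period(radio, is_q4)
```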
Weighted Variables (WGTD)
✅ Do:
- Combine highly correlated channels (reduces VIF) 
- Start with OLS coefficients as weights 
- Adjust weights based on business knowledge 
- Document weight rationale 
❌ Don't:
- Combine unrelated channels 
- Use arbitrary weights without justification 
- Over-combine (lose actionable insights) 
Best Applications:
- Multiple digital channels (PPC, Meta, Display, LinkedIn) 
- Multiple TV campaigns running simultaneously 
- Regional media that should be consolidated 
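A weighted composite is a per-period weighted sum of the member channels. This sketch assumes plain lists and hypothetical channel names; in practice the weights might start from OLS coefficients and then be adjusted with business knowledge, per the guidance above:

```python
def weighted_mix(channels, weights):
    """Combine correlated channels into one composite series:
    a weighted sum across channels for each time period."""
    periods = len(next(iter(channels.values())))
    return [sum(weights[name] * series[t] for name, series in channels.items())
            for t in range(periods)]

digital = {"PPC": [10, 20], "Meta": [30, 10], "Display": [5, 5]}
mix = weighted_mix(digital, {"PPC": 0.5, "Meta": 0.3, "Display": 0.2})
```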
AVO (Above Value Operator)
✅ Do:
- Test multiple thresholds (70, 80, 90) 
- Check distribution (% of 1s vs. 0s) 
- Use for campaign flight detection 
- Combine with continuous spend variable 
❌ Don't:
- Use extreme thresholds (too few or too many 1s) 
- Confuse with percentile (AVO 90 ≠ 90th percentile) 
- Use as only variable for that channel 
Typical Thresholds:
- AVO 80-90: Identify heavy campaign weeks 
- AVO 60-70: Moderate campaign activity 
- AVO 40-50: General activity indicator 
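One plausible reading of AVO, consistent with the warning that AVO 90 is not the 90th percentile, is a flag for periods exceeding a percentage of the series maximum. The formulation below is an assumption; confirm MixModeler's exact AVO definition:

```python
def avo(series, threshold_pct):
    """Above Value Operator sketch: 1 where the value exceeds
    threshold_pct% of the series maximum, else 0. This is NOT a
    percentile, and the exact definition here is an assumption."""
    cutoff = max(series) * threshold_pct / 100.0
    return [1 if x > cutoff else 0 for x in series]

tv = [10, 95, 40, 88, 100]
flights = avo(tv, 85)                        # flags heavy campaign weeks
share_of_ones = sum(flights) / len(flights)  # distribution check from above
```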
Variable Testing Strategy
Systematic Testing Process
Step 1: Hypothesis. Define what you're testing and why
- "TV ads persist 4-6 weeks based on past studies" 
Step 2: Create Candidates. Build multiple versions
- TV_ads40, TV_ads50, TV_ads60, TV_ads70 
Step 3: Test in Model. Use the Variable Testing page
- Compare t-statistics 
- Check coefficients make sense 
- Review model R² 
Step 4: Select Winner. Choose the best-performing version
- Highest t-stat (most significant) 
- Makes business sense 
- Improves model fit 
Step 5: Document. Record the decision rationale
- Why this transformation? 
- What did we test? 
- What did we find? 
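The five steps above can be sketched end to end on synthetic data. In MixModeler the Variable Testing page performs this comparison for you; the candidate rates, noise level, and `t_stat` helper below are illustrative assumptions:

```python
import numpy as np

def adstock(x, rate):
    """Geometric adstock (the same transformation described earlier)."""
    out, carry = [], 0.0
    for v in x:
        carry = v + rate * carry
        out.append(carry)
    return np.array(out)

def t_stat(x, y):
    """t-statistic of the slope in a simple OLS of y on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
tv = rng.uniform(0, 100, size=52)                       # 52 weeks of spend
sales = 3 * adstock(tv, 0.6) + rng.normal(0, 30, 52)    # synthetic KPI

# Steps 2-4: build candidates, compare significance, pick the winner
candidates = {r: t_stat(adstock(tv, r), sales) for r in (0.4, 0.5, 0.6, 0.7)}
best_rate = max(candidates, key=candidates.get)
```

Because adstocked series at neighboring rates are highly correlated, the t-statistics will be close; that is why Step 5 (documenting why a rate was chosen) matters as much as the numbers.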
Common Pitfalls to Avoid
Pitfall 1: Transformation Overload
Problem: Applying too many transformations to one variable
Example:
TV → adstock → saturation → standardization → lag → AVO
Result: Impossible to interpret, overfitted
Fix: Maximum 2-3 transformations per variable (typically adstock + saturation)
Pitfall 2: Ignoring Business Logic
Problem: Purely statistical approach without business validation
Example: Model shows TV with negative coefficient because of confounding
Fix: Always validate results with business stakeholders
Pitfall 3: Not Testing Alternatives
Problem: Applying one transformation without testing alternatives
Example: Using 50% adstock without testing 40%, 60%, 70%
Fix: Always test multiple parameter values
Pitfall 4: Inconsistent Application
Problem: Applying transformations inconsistently
Example: TV with adstock, Radio without (when both are brand media)
Fix: Apply same logic to similar channel types
Pitfall 5: Creating Too Many Variables
Problem: Explosion of variables from transformations
Example: Starting with 20 variables, ending with 80 after transformations
Fix: Be selective, only create variables that improve model
Variable Management
Organization Strategy
Group by Type:
- Raw Variables: Original uploaded data 
- Time Transformations: Lags, leads, splits 
- Marketing Transformations: Adstock, saturation 
- Composite Variables: Weighted, multiplied 
- Indicators: AVO, dummies 
Naming Prefix: Consider consistent prefixes for easy filtering
raw_TV_Spend
trans_TV_ads60
comp_Digital_Mix_WGTD
ind_TV_AVO_90
Version Control
Track Changes:
- Keep notes on why variables were created 
- Date of creation 
- Parameters used 
- Performance in models 
Excel Export: Export model with transformations documented for reproducibility
Decision Framework
Should I Create This Variable?
Ask:
1. Does it address a real business hypothesis?
✅ Yes → Proceed | ❌ No → Reconsider
2. Will it improve model interpretability or fit?
✅ Yes → Proceed | ❌ No → Skip
3. Can I clearly explain what it represents?
✅ Yes → Proceed | ❌ No → Simplify first
4. Have I tested it properly?
✅ Yes → Proceed | ❌ No → Test first
5. Does it make business sense?
✅ Yes → Use it | ❌ No → Don't use it
Model Complexity vs. Interpretability
Finding the Balance
Simple Model:
- 10-15 variables 
- Minimal transformations 
- Easy to explain 
- May miss some effects 
Complex Model:
- 30+ variables 
- Many transformations 
- Hard to explain 
- May overfit 
Optimal Model:
- 15-25 variables 
- Purposeful transformations 
- Interpretable 
- Captures key effects 
Guideline: If you can't easily explain a variable to a stakeholder, it's probably too complex
Documentation Best Practices
What to Document
For Each Created Variable:
- Base variable(s) used 
- Transformation type and parameters 
- Business rationale 
- Date created 
- Performance (t-stat, significance) 
- Decision to keep or exclude 
Example Log:
Variable: TV_Spend_ads60_ATAN_a15_p12
Created: 2024-01-15
Base: TV_Spend
Transformations: 
  - Adstock 60% (tested 40%, 50%, 60%, 70% - 60% had highest t-stat)
  - ATAN saturation (alpha=15, power=1.2)
Rationale: TV shows strong persistence and diminishing returns
Performance: t-stat = 4.2, R² improvement = 0.03
Status: ACTIVE in Model_v2
Quality Checklist
Before finalizing variables, verify:
Statistical Quality:
- t-statistic is significant 
- Coefficient sign makes business sense 
- R² improves with the variable included 
Business Quality:
- Addresses a real business hypothesis 
- Can be clearly explained to stakeholders 
- Validated with business logic, not statistics alone 
Technical Quality:
- Descriptive name with transformation details 
- No more than 2-3 transformations applied 
- Rationale, parameters, and performance documented 
Summary
Key Takeaways:
🎯 Start simple, add complexity gradually - don't over-engineer initially
📝 Document everything - rationale, parameters, decisions
🧪 Test before committing - verify transformations improve model
✅ Every variable needs a purpose - no "just because" transformations
📊 Name clearly - descriptive names with transformation details
🔍 Validate with business logic - statistics + domain knowledge
⚖️ Balance complexity vs. interpretability - aim for 15-25 final variables
🎓 Less is often more - 20 well-chosen variables beat 50 random ones
Bottom Line: Great variable engineering is both an art and science. Use statistical methods to test, business logic to guide, and common sense to validate. When in doubt, keep it simple!