Best Practices for Variable Creation

Guidelines for Effective Variable Engineering

Creating the right variables is crucial for building accurate and actionable MMM models. This page provides proven strategies and best practices for variable engineering in MixModeler.


Core Principles

1. Start Simple, Add Complexity Gradually

Initial Model:

  • Raw marketing variables (no transformations)

  • Basic seasonality (month dummies)

  • KPI as-is

Build Incrementally:

  • Add adstock to media channels

  • Apply saturation curves

  • Test interaction terms

  • Create composite variables

Why: It's easier to understand what each transformation contributes, and debugging is simpler.


2. Every Transformation Should Have a Purpose

Bad Practice: "Let me try every transformation and see what sticks"

Good Practice: "TV ads persist for weeks, so I'll apply adstock with 60% decay rate based on industry benchmarks"

Rule: Each transformation should address a specific business hypothesis or known marketing behavior


3. Test Before Committing

Before creating a variable:

  • Preview the transformation

  • Check the distribution (min, max, mean)

  • Visualize the effect (charts)

  • Understand what it represents

After creating:

  • Test in model (add to Model Builder)

  • Check t-statistic (is it significant?)

  • Verify coefficient sign (does it make sense?)

  • Compare R² with/without variable

If no improvement: Don't use the transformation


Naming Conventions

Use Clear, Descriptive Names

Good Names:

TV_Spend_ads60              (TV with 60% adstock)
Digital_Display_ATAN_a12    (Display with saturation curve)
Radio_Q4_Only               (Radio spend, Q4 periods only)
Social_Media_Mix_WGTD       (Weighted combination)
OOH_AVO_85                  (OOH above 85% threshold)

Bad Names:

TV_transformed
Variable1
temp_test
X_final_v2

Benefits:

  • Easy to understand at a glance

  • Clear what transformation was applied

  • Easier to document and share


Include Transformation Details

Format:

{Base Variable}_{Transformation}_{Parameters}

Examples:
TV_Spend_ads60_ATAN_a15_p12
Digital_lag2
Price_Q4_Only

What to Include:

  • Base variable name

  • Transformation type (ads, ATAN, AVO, WGTD, etc.)

  • Key parameters (adstock rate, threshold, etc.)
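
If you track variables outside the app (for example in an exported sheet or a script), a tiny helper keeps names consistent. This is purely illustrative and not part of MixModeler:

def variable_name(base, *parts):
    # Joins components into names like TV_Spend_ads60_ATAN_a15_p12
    return "_".join((base, *parts))

print(variable_name("TV_Spend", "ads60", "ATAN", "a15", "p12"))
# -> TV_Spend_ads60_ATAN_a15_p12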


Transformation Best Practices

Adstock Transformations

✅ Do:

  • Apply to all media channels (TV, Radio, Print, Display)

  • Test multiple rates (40%, 50%, 60%, 70%)

  • Use Variable Testing to find optimal rate

  • Document why specific rate chosen

❌ Don't:

  • Apply same rate to all channels (they decay differently)

  • Use adstock on non-media variables (price, weather)

  • Apply adstock AND lag (choose one)

  • Use rates > 90% (unrealistic persistence)

Typical Rates:

  • TV: 50-70%

  • Radio: 40-60%

  • Print: 60-80%

  • Digital Display: 30-50%

  • Search: 10-30%
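
For reference, the decay rates above plug into the standard geometric adstock recursion. A minimal sketch in Python (the textbook formula, not necessarily MixModeler's exact implementation):

import numpy as np

def adstock(spend, decay):
    # Geometric carryover: effect[t] = spend[t] + decay * effect[t-1]
    out = np.zeros(len(spend))
    carried = 0.0
    for t, x in enumerate(spend):
        carried = x + decay * carried
        out[t] = carried
    return out

tv = np.array([100.0, 0, 0, 50, 0])
print(adstock(tv, 0.6))  # [100.  60.  36.  71.6 42.96]

A 60% rate means each week retains 60% of the previous week's accumulated effect, which is why rates above 90% imply implausibly long persistence.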


Saturation Curves

✅ Do:

  • Apply to media channels with large spend variance

  • Use Curve Testing to find optimal parameters

  • Test both S-shape and concave curves

  • Apply AFTER adstock (adstock first, then saturation)

❌ Don't:

  • Apply to variables with limited range (little saturation to model)

  • Use overly aggressive parameters (creates flat line)

  • Combine with too many other transformations (over-complicating)

When to Use:

  • Media channels with a wide spend range (at least a 3× difference between min and max)

  • Channels where diminishing returns expected

  • When linear model shows unrealistic ROI at high spend
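
One common concave shape uses the arctangent. The parameterization below (scale alpha, exponent power, matching names like ATAN_a15_p12) is an assumption for illustration; check how your MixModeler version defines its ATAN parameters:

import numpy as np

def atan_saturation(x, alpha, power=1.0):
    # Concave response: near-linear at low spend, flattens at high spend
    # (illustrative parameterization, not a confirmed MixModeler formula)
    return np.arctan((np.asarray(x, dtype=float) / alpha) ** power)

spend = np.array([0.0, 5, 15, 50, 200])
print(atan_saturation(spend, alpha=15, power=1.2))

Per the ordering rule above, feed the adstocked series into the curve, not the raw spend: reusing the adstock sketch from the previous section, atan_saturation(adstock(tv, 0.6), alpha=15).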


Lead/Lag Transformations

✅ Do:

  • Use for non-media variables (price, promotions, external factors)

  • Test multiple lag periods (1, 2, 3 weeks)

  • Choose lag with highest t-statistic

  • Document the delay hypothesis

❌ Don't:

  • Use for media channels (use adstock instead)

  • Create excessive lags (>4 weeks rarely needed)

  • Use both lag and lead for same variable

Common Applications:

  • Price_lag1 (price changes take time to affect behavior)

  • DirectMail_lag2 (2-week delivery + response time)

  • Competitor_Activity_lag1 (delayed competitive response)
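
In data terms, a lag is just a shift. A pandas sketch of Price_lag1 (illustrative, outside the app):

import pandas as pd

df = pd.DataFrame({"Price": [10.0, 10.0, 12.0, 12.0, 11.0]})

# Last week's price drives this week's behavior; the first row becomes NaN
df["Price_lag1"] = df["Price"].shift(1)

# A lead looks ahead instead: shift(-1). Pick one direction per variable.
print(df)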


Split by Date

✅ Do:

  • Align splits with real business events (campaigns, rebrand, market entry)

  • Create complementary splits (Period A + Period B = Total)

  • Ensure sufficient data in each split (15+ observations)

  • Document the reason for split

❌ Don't:

  • Split arbitrarily without business rationale

  • Create too many splits (> 3-4 per variable)

  • Split into very short periods (< 10 observations)

Good Use Cases:

  • Before/After major change (product launch, rebrand)

  • Campaign vs. baseline periods

  • Seasonal effectiveness (Q4 vs. non-Q4)
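
Mechanically, a date split zeroes the variable outside the window, so complementary pieces sum back to the original series. A pandas sketch with an assumed weekly index:

import pandas as pd

idx = pd.date_range("2024-09-02", periods=6, freq="W-MON")
df = pd.DataFrame({"Radio_Spend": [20.0, 30, 0, 40, 25, 10]}, index=idx)

in_q4 = df.index.quarter == 4
df["Radio_Q4_Only"] = df["Radio_Spend"].where(in_q4, 0.0)
df["Radio_NonQ4"] = df["Radio_Spend"].where(~in_q4, 0.0)

# Complementary check: Period A + Period B = Total
assert (df["Radio_Q4_Only"] + df["Radio_NonQ4"]).equals(df["Radio_Spend"])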


Weighted Variables (WGTD)

✅ Do:

  • Combine highly correlated channels (reduces VIF)

  • Start with OLS coefficients as weights

  • Adjust weights based on business knowledge

  • Document weight rationale

❌ Don't:

  • Combine unrelated channels

  • Use arbitrary weights without justification

  • Over-combine (lose actionable insights)

Best Applications:

  • Multiple digital channels (PPC, Meta, Display, LinkedIn)

  • Multiple TV campaigns running simultaneously

  • Regional media that should be consolidated
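
A weighted variable is a linear blend of its components. A sketch, with placeholder weights you would take from a first-pass OLS fit or business judgment:

import pandas as pd

digital = pd.DataFrame({
    "PPC":     [5.0, 7, 6, 8],
    "Meta":    [3.0, 4, 5, 4],
    "Display": [2.0, 2, 3, 1],
})

# Illustrative weights, rescaled to sum to 1; document where they came from
weights = pd.Series({"PPC": 0.5, "Meta": 0.3, "Display": 0.2})

digital["Digital_Mix_WGTD"] = digital[weights.index].mul(weights).sum(axis=1)
print(digital)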


AVO (Above Value Operator)

✅ Do:

  • Test multiple thresholds (70, 80, 90)

  • Check distribution (% of 1s vs. 0s)

  • Use for campaign flight detection

  • Combine with continuous spend variable

❌ Don't:

  • Use extreme thresholds (too few or too many 1s)

  • Confuse the threshold with a percentile (AVO 90 ≠ 90th percentile)

  • Use as only variable for that channel

Typical Thresholds:

  • AVO 80-90: Identify heavy campaign weeks

  • AVO 60-70: Moderate campaign activity

  • AVO 40-50: General activity indicator
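
The percentile warning matters because the threshold here reads as a share of the series maximum, not a percentile; that reading is inferred from the examples above, so confirm it against your MixModeler version. A sketch under that assumption:

import numpy as np

def avo(x, pct_of_max):
    # 1 where x exceeds pct_of_max% of the series maximum, else 0
    # (assumed definition: share of max, NOT a percentile)
    x = np.asarray(x, dtype=float)
    return (x > (pct_of_max / 100.0) * x.max()).astype(int)

tv = np.array([10.0, 80, 95, 100, 40, 88])
flags = avo(tv, 85)
print(flags, flags.mean())  # [0 0 1 1 0 1] 0.5

The mean of the indicator is the distribution check recommended above: values near 0 or 1 mean the threshold is too extreme.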


Variable Testing Strategy

Systematic Testing Process

Step 1: Hypothesis. Define what you're testing and why

  • "TV ads persist 4-6 weeks based on past studies"

Step 2: Create Candidates. Build multiple versions

  • TV_ads40, TV_ads50, TV_ads60, TV_ads70

Step 3: Test in Model. Use the Variable Testing page

  • Compare t-statistics

  • Check coefficients make sense

  • Review model R²

Step 4: Select Winner. Choose the best-performing version

  • Highest t-stat (most significant)

  • Makes business sense

  • Improves model fit

Step 5: Document. Record the decision rationale

  • Why this transformation?

  • What did we test?

  • What did we find?
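
The same sweep can be reproduced outside the app with statsmodels. A sketch on synthetic data (the adstock helper is the same illustrative formula as in the adstock section):

import numpy as np
import pandas as pd
import statsmodels.api as sm

def adstock(spend, decay):
    out, carried = [], 0.0
    for x in spend:
        carried = x + decay * carried
        out.append(carried)
    return pd.Series(out)

rng = np.random.default_rng(0)
tv = pd.Series(rng.uniform(0, 100, 52))
sales = 200 + 0.8 * adstock(tv, 0.6) + rng.normal(0, 10, 52)  # synthetic KPI

# Steps 2-4: build candidates, fit each, compare t-stats and fit
for rate in (0.4, 0.5, 0.6, 0.7):
    X = sm.add_constant(adstock(tv, rate))
    fit = sm.OLS(sales, X).fit()
    print(f"TV_ads{int(rate * 100)}: t = {fit.tvalues.iloc[1]:.2f}, "
          f"R2 = {fit.rsquared:.3f}")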


Common Pitfalls to Avoid

Pitfall 1: Transformation Overload

Problem: Applying too many transformations to one variable

Example:

TV → adstock → saturation → standardization → lag → AVO

Result: Impossible to interpret, overfitted

Fix: Maximum 2-3 transformations per variable (typically adstock + saturation)


Pitfall 2: Ignoring Business Logic

Problem: Purely statistical approach without business validation

Example: The model shows a negative TV coefficient because of confounding (e.g., TV spend concentrated in historically weak sales periods)

Fix: Always validate results with business stakeholders


Pitfall 3: Not Testing Alternatives

Problem: Applying one transformation without testing alternatives

Example: Using 50% adstock without testing 40%, 60%, 70%

Fix: Always test multiple parameter values


Pitfall 4: Inconsistent Application

Problem: Applying transformations inconsistently

Example: TV with adstock, Radio without (when both are brand media)

Fix: Apply same logic to similar channel types


Pitfall 5: Creating Too Many Variables

Problem: Explosion of variables from transformations

Example: Starting with 20 variables, ending with 80 after transformations

Fix: Be selective, only create variables that improve model


Variable Management

Organization Strategy

Group by Type:

  • Raw Variables: Original uploaded data

  • Time Transformations: Lags, leads, splits

  • Marketing Transformations: Adstock, saturation

  • Composite Variables: Weighted, multiplied

  • Indicators: AVO, dummies

Naming Prefix: Consider consistent prefixes for easy filtering

raw_TV_Spend
trans_TV_ads60
comp_Digital_Mix_WGTD
ind_TV_AVO_90

Version Control

Track Changes:

  • Keep notes on why variables were created

  • Date of creation

  • Parameters used

  • Performance in models

Excel Export: Export model with transformations documented for reproducibility


Decision Framework

Should I Create This Variable?

Ask:

1. Does it address a real business hypothesis? ✅ Yes → Proceed. ❌ No → Reconsider.

2. Will it improve model interpretability or fit? ✅ Yes → Proceed. ❌ No → Skip.

3. Can I clearly explain what it represents? ✅ Yes → Proceed. ❌ No → Simplify first.

4. Have I tested it properly? ✅ Yes → Proceed. ❌ No → Test first.

5. Does it make business sense? ✅ Yes → Use it. ❌ No → Don't use it.


Model Complexity vs. Interpretability

Finding the Balance

Simple Model:

  • 10-15 variables

  • Minimal transformations

  • Easy to explain

  • May miss some effects

Complex Model:

  • 30+ variables

  • Many transformations

  • Hard to explain

  • May overfit

Optimal Model:

  • 15-25 variables

  • Purposeful transformations

  • Interpretable

  • Captures key effects

Guideline: If you can't easily explain a variable to a stakeholder, it's probably too complex


Documentation Best Practices

What to Document

For Each Created Variable:

  • Base variable(s) used

  • Transformation type and parameters

  • Business rationale

  • Date created

  • Performance (t-stat, significance)

  • Decision to keep or exclude

Example Log:

Variable: TV_Spend_ads60_ATAN_a15_p12
Created: 2024-01-15
Base: TV_Spend
Transformations: 
  - Adstock 60% (tested 40%, 50%, 60%, 70% - 60% had highest t-stat)
  - ATAN saturation (alpha=15, power=1.2)
Rationale: TV shows strong persistence and diminishing returns
Performance: t-stat = 4.2, R² improvement = 0.03
Status: ACTIVE in Model_v2

Quality Checklist

Before finalizing variables, verify:

Statistical Quality:

  • t-statistic is significant (variable adds explanatory power)

  • Coefficient sign matches business expectations

  • R² improves with the variable included

Business Quality:

  • Addresses a documented hypothesis

  • Explainable to a stakeholder in one sentence

Technical Quality:

  • Name follows the naming convention

  • Transformation type and parameters are documented

  • No more than 2-3 transformations applied


Summary

Key Takeaways:

🎯 Start simple, add complexity gradually - don't over-engineer initially

📝 Document everything - rationale, parameters, decisions

🧪 Test before committing - verify transformations improve model

💡 Every variable needs a purpose - no "just because" transformations

📊 Name clearly - descriptive names with transformation details

🔍 Validate with business logic - statistics + domain knowledge

⚖️ Balance complexity vs. interpretability - aim for 15-25 final variables

🎓 Less is often more - 20 well-chosen variables beat 50 random ones

Bottom Line: Great variable engineering is both an art and science. Use statistical methods to test, business logic to guide, and common sense to validate. When in doubt, keep it simple!
