Actual vs Predicted
What Actual vs Predicted Analysis Shows
Actual vs Predicted analysis evaluates how well your model fits the data by comparing the observed values of your KPI with the values predicted by the model. This provides a comprehensive view of model performance and prediction accuracy.
Purpose: Evaluates overall model fit and prediction accuracy by comparing actual KPI values with model predictions.
Why Model Fit Matters
Good model fit indicates:
Reliable Predictions: The model accurately captures the relationship between marketing and outcomes
Complete Specification: Important drivers have been included
Correct Functional Form: Relationships are properly modeled (linear, saturation, adstock)
Trustworthy Attribution: Decomposition results will be credible
Poor fit suggests missing variables, wrong transformations, or fundamental model misspecification.
Key Performance Metrics
MixModeler calculates four primary metrics to assess fit:
R-Squared (R²)
Definition: Proportion of variance in the KPI explained by the model
Range: 0 to 1 (or 0% to 100%)
Interpretation:
| R² Range | Rating | Interpretation |
| --- | --- | --- |
| > 0.80 | Excellent | Model explains >80% of variation |
| 0.70 - 0.80 | Good | Acceptable for most business applications |
| 0.50 - 0.70 | Moderate | Room for improvement; use with caution |
| < 0.50 | Poor | Significant work needed |
Formula: R² = 1 - (SS_residual / SS_total)
Note: Higher R² is better, but very high R² (>0.95) may indicate overfitting
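Conceptually, R² compares the model's residual error against the total variance of the KPI. A minimal sketch of the formula above, computed from paired actual and predicted arrays (the values are hypothetical, not MixModeler output):

```python
import numpy as np

def r_squared(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Proportion of KPI variance explained: R² = 1 - SS_residual / SS_total."""
    ss_residual = np.sum((actual - predicted) ** 2)
    ss_total = np.sum((actual - actual.mean()) ** 2)
    return float(1 - ss_residual / ss_total)

# Hypothetical weekly KPI values and model predictions
actual = np.array([52_000, 48_500, 50_200, 55_100, 47_800])
predicted = np.array([51_400, 49_200, 50_900, 54_300, 48_600])
print(f"R²: {r_squared(actual, predicted):.3f}")
```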
Adjusted R-Squared
Definition: R² adjusted for the number of predictors in the model
Purpose: Penalizes models for including unnecessary variables
Advantage: Better for comparing models with different numbers of variables
Relationship: Always ≤ R²; the gap widens when many weak predictors are included
Use: Prefer adjusted R² when comparing alternative model specifications
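The standard adjustment is Adjusted R² = 1 - (1 - R²) × (n - 1) / (n - k - 1), where n is the number of observations and k the number of predictors. A short sketch showing how the penalty grows as predictors are added (the figures are illustrative):

```python
def adjusted_r_squared(r2: float, n_obs: int, n_predictors: int) -> float:
    """Penalize R² for model size: 1 - (1 - R²) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Two hypothetical specifications with the same raw R² but different sizes
print(adjusted_r_squared(0.82, n_obs=104, n_predictors=8))   # ~0.805
print(adjusted_r_squared(0.82, n_obs=104, n_predictors=20))  # ~0.777
```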
Root Mean Squared Error (RMSE)
Definition: Square root of the average squared prediction error
Formula: RMSE = √(Σ(actual - predicted)² / n)
Units: Same units as your KPI (e.g., dollars, units sold)
Interpretation:
Lower values indicate better fit
Measures typical prediction error
Compare across models (lower is better)
Context-dependent (RMSE of 1000 is good if KPI averages 100,000)
Relative RMSE: RMSE / mean(KPI) × 100 gives percentage error
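A minimal sketch of RMSE and relative RMSE from paired arrays (values are hypothetical):

```python
import numpy as np

def rmse(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Root mean squared error: sqrt(mean((actual - predicted)²))."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

actual = np.array([52_000, 48_500, 50_200, 55_100, 47_800])
predicted = np.array([51_400, 49_200, 50_900, 54_300, 48_600])

error = rmse(actual, predicted)
relative = error / actual.mean() * 100  # percentage of the KPI average
print(f"RMSE: {error:,.0f} ({relative:.1f}% of the average KPI)")
```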
Mean Absolute Error (MAE)
Definition: Average absolute prediction error
Formula: MAE = Σ|actual - predicted| / n
Units: Same units as your KPI
Interpretation:
Lower values indicate better fit
More robust to outliers than RMSE
Easier to interpret (average error magnitude)
Comparison with RMSE:
RMSE penalizes large errors more heavily
MAE treats all errors equally
If RMSE >> MAE, the model has some large errors
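A short sketch computing MAE alongside RMSE; the ratio between them is a quick check for a few dominating errors (values are hypothetical):

```python
import numpy as np

def mae(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Mean absolute error: mean(|actual - predicted|)."""
    return float(np.mean(np.abs(actual - predicted)))

actual = np.array([52_000, 48_500, 50_200, 55_100, 47_800])
predicted = np.array([51_400, 49_200, 50_900, 54_300, 48_600])

mae_val = mae(actual, predicted)
rmse_val = float(np.sqrt(np.mean((actual - predicted) ** 2)))
# A ratio well above 1 flags a few large errors dominating the fit
print(f"RMSE/MAE ratio: {rmse_val / mae_val:.2f}")
```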
Visual Diagnostics
MixModeler provides two key visualizations:
Scatter Plot (Actual vs Predicted)
X-axis: Predicted values
Y-axis: Actual values
Reference line: 45-degree diagonal (perfect predictions)
Good fit:
Points cluster tightly around diagonal line
No systematic deviation from line
Even spread across the range
Poor fit:
Points scattered far from diagonal
Systematic pattern (curved, clusters)
Wider spread at certain ranges
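A minimal matplotlib sketch of this diagnostic, using hypothetical actual/predicted pairs rather than MixModeler's built-in chart:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical actual/predicted pairs exported from a fitted model
actual = np.array([52_000, 48_500, 50_200, 55_100, 47_800, 53_600])
predicted = np.array([51_400, 49_200, 50_900, 54_300, 48_600, 52_700])

fig, ax = plt.subplots()
ax.scatter(predicted, actual, alpha=0.7)

# 45-degree reference line: points on it are perfect predictions
lims = [min(predicted.min(), actual.min()), max(predicted.max(), actual.max())]
ax.plot(lims, lims, linestyle="--", color="gray", label="Perfect prediction")

ax.set_xlabel("Predicted")
ax.set_ylabel("Actual")
ax.legend()
plt.show()
```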
Time Series Plot
X-axis: Time (observation index or date)
Y-axis: KPI values
Two lines: Actual (solid) and Predicted (dashed)
Good fit:
Lines track closely throughout time period
Model captures peaks and troughs
No systematic over/under-prediction
Poor fit:
Large gaps between lines
Predicted line misses major movements
Consistent over- or under-prediction
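A matching sketch for the time series view, again with hypothetical data:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical weekly series; in practice use the model's fitted values
weeks = np.arange(1, 7)
actual = np.array([52_000, 48_500, 50_200, 55_100, 47_800, 53_600])
predicted = np.array([51_400, 49_200, 50_900, 54_300, 48_600, 52_700])

fig, ax = plt.subplots()
ax.plot(weeks, actual, label="Actual")                        # solid line
ax.plot(weeks, predicted, linestyle="--", label="Predicted")  # dashed line
ax.set_xlabel("Week")
ax.set_ylabel("KPI")
ax.legend()
plt.show()
```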
Interpreting Test Results
Strong Model Fit (✓)
Characteristics:
R² > 0.70
Low RMSE relative to KPI scale
Scatter plot points close to diagonal
Time series lines track well
Implications:
Model captures the key relationships
Predictions are reliable
Attribution is trustworthy
Can proceed with optimization
Action: Model is ready for business use
Moderate Model Fit (⚠)
Characteristics:
R² between 0.50 and 0.70
Moderate RMSE
Some scatter around diagonal
Time series generally tracks but with gaps
Implications:
Model captures main effects but misses some variation
Predictions are reasonable but not precise
Attribution gives directional insights
Consider improvements before optimization
Actions to improve:
Add missing variables
Test different transformations
Refine adstock parameters (see the adstock sketch after this list)
Include interaction terms
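For the adstock item above, a minimal sketch of a generic geometric adstock transform. The decay values are illustrative candidates to test against fit, and this is a standard formulation, not necessarily the exact one MixModeler uses internally:

```python
import numpy as np

def geometric_adstock(spend: np.ndarray, decay: float) -> np.ndarray:
    """Carry a fraction `decay` of each period's effect into the next:
    adstock[t] = spend[t] + decay * adstock[t-1]."""
    adstocked = np.zeros_like(spend, dtype=float)
    carryover = 0.0
    for t, x in enumerate(spend):
        carryover = x + decay * carryover
        adstocked[t] = carryover
    return adstocked

spend = np.array([100.0, 0.0, 0.0, 50.0])
for decay in (0.3, 0.5, 0.7):  # candidate decay rates to compare
    print(decay, geometric_adstock(spend, decay))
```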
Poor Model Fit (❌)
Characteristics:
R² < 0.50
High RMSE
Scatter plot widely dispersed
Time series lines diverge
Implications:
Model misspecified or missing critical variables
Predictions unreliable
Attribution results questionable
Not suitable for business decisions
Required actions:
Add important omitted variables
Reconsider model structure
Check data quality
Review business understanding
Common Patterns and Issues
Systematic Under-Prediction
Pattern: Predicted line consistently below actual
Possible causes:
Missing positive driver variables
Saturation curves too aggressive
Baseline estimate too low
Solution: Add variables, adjust curves, increase intercept
Systematic Over-Prediction
Pattern: Predicted line consistently above actual
Possible causes:
Including spurious variables
Double-counting effects
Baseline estimate too high
Solution: Remove weak variables, check for overlap, adjust baseline
Good Fit but Misses Peaks
Pattern: Model tracks trends but misses high/low extremes
Possible causes:
Missing promotional or event variables
Insufficient saturation flexibility
Outlier periods
Solution: Add event dummies, adjust curves, investigate extremes
Seasonal Patterns in Errors
Pattern: Residuals show cyclical patterns
Possible causes:
Missing seasonality variables
Autocorrelation issues
Quarterly or monthly effects not captured
Solution: Add seasonal dummies (see the sketch below), check the autocorrelation diagnostic
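A short pandas sketch of building month-of-year dummies; the frame and column names are hypothetical:

```python
import pandas as pd

# Hypothetical weekly KPI frame; the date column name is an assumption
df = pd.DataFrame({"date": pd.date_range("2023-01-01", periods=104, freq="W")})

# Month-of-year dummies to absorb recurring seasonal swings;
# drop_first avoids perfect collinearity with the intercept
df["month"] = df["date"].dt.month
dummies = pd.get_dummies(df["month"], prefix="month", drop_first=True)
df = pd.concat([df, dummies], axis=1)

print(df.filter(like="month_").columns.tolist())
```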
Practical Guidelines
Acceptable R² Benchmarks by Industry:
Retail/E-commerce: 0.65-0.85 (high variability, many factors)
CPG/Brand: 0.70-0.90 (stable markets, clear drivers)
B2B/Services: 0.60-0.80 (longer sales cycles, more noise)
These are guidelines; context matters more than arbitrary thresholds
When Lower R² is Acceptable:
Weekly data (more noise than monthly)
Many external factors beyond marketing control
New products or markets with limited history
Focus is on directional insights, not precise prediction
When Higher R² is Expected:
Monthly or quarterly aggregation (smooths noise)
Stable mature markets
Strong marketing influence on KPI
Long time series with clear patterns
Example Interpretation
Scenario 1 - Excellent Fit:
R²: 0.82
Adjusted R²: 0.79
RMSE: 2,500 (5% of the 50,000 KPI average)
MAE: 1,800
Scatter plot: Tight clustering around diagonal
Interpretation: Excellent model fit. The model explains 82% of KPI variation with low prediction error. Time series shows model captures both trends and fluctuations. Ready for business use and optimization.
Scenario 2 - Good Fit:
R²: 0.73
Adjusted R²: 0.70
RMSE: 4,200 (8.4% of the 50,000 KPI average)
MAE: 3,100
Interpretation: Good model fit suitable for most business applications. The model captures major drivers but some variation remains unexplained. Consider adding seasonal variables or testing different saturation curves to improve further.
Scenario 3 - Needs Improvement:
R²: 0.48
Adjusted R²: 0.42
RMSE: 9,500 (19% of the 50,000 KPI average)
MAE: 7,200
Scatter plot: Wide dispersion
Interpretation: Poor model fit. Less than half of variation explained. Review model specification, add important variables, and check data quality before using for business decisions.
Relationship to Business Decisions
For Budget Allocation:
Need R² > 0.70 for confident optimization
Lower R² means more uncertainty in ROI estimates
For Forecasting:
RMSE indicates expected forecast error
Use confidence intervals based on RMSE (see the sketch after this list)
For Attribution:
Good fit ensures decomposition sums to actual KPI
Poor fit means attribution residuals are large
For ROI Calculation:
Coefficient accuracy depends on model fit
Low R² suggests ROI estimates are imprecise
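For the forecasting item above, a minimal sketch of an approximate interval built from RMSE, assuming roughly normal, constant-variance errors:

```python
def rmse_interval(predicted: float, rmse: float, z: float = 1.96):
    """Approximate 95% interval: predicted ± z * RMSE."""
    return predicted - z * rmse, predicted + z * rmse

low, high = rmse_interval(predicted=50_000, rmse=2_500)
print(f"Forecast: 50,000 (95% interval: {low:,.0f} to {high:,.0f})")
```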
Related Diagnostics
After reviewing Actual vs Predicted:
Check Residual Normality to see if errors are well-behaved
Review Autocorrelation if time series shows patterns
Examine Influential Points to see which periods fit poorly
Check R² in Model Builder for overall model statistics