Interpreting Test Results
Understanding the Diagnostic Dashboard
When you run model diagnostics in MixModeler, results are displayed in an intuitive card-based interface. Each test shows a summary card with key metrics and a pass/fail indicator, allowing you to quickly assess model quality.
Reading Test Summary Cards
Each diagnostic test card displays:
Test Name and Description: Clear label explaining what the test measures
Status Indicator:
✓ Green checkmark: Test passed
⚠ Red warning: Test failed or issue detected
Key Statistic: Primary test statistic or metric prominently displayed
P-value (when applicable): Statistical significance level with star notation
* p < 0.1
** p < 0.05
*** p < 0.01
Quick Interpretation: One-line summary of what the result means
View Details Button: Click to see in-depth analysis, plots, and tables
Interpreting P-Values
P-values give the probability of observing results at least as extreme as yours if the null hypothesis were true:
P-value ≥ 0.05: No significant issue detected (✓ test passed)
Example: Normality test p = 0.23 → Residuals appear normal
P-value < 0.05: Significant issue detected (⚠ test failed)
Example: Heteroscedasticity p = 0.02 → Variance is not constant
P-value < 0.01: Highly significant issue (⚠⚠ serious concern)
Example: Autocorrelation p < 0.001 → Strong temporal dependence
Important: Lower p-values indicate stronger evidence of a problem. However, context and effect size matter more than p-values alone.
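The pass/fail and star logic above is simple to express in code. Here is a minimal sketch in Python (the function names are illustrative, not MixModeler's API):

```python
def significance_stars(p_value: float) -> str:
    """Return the star notation used on test summary cards."""
    if p_value < 0.01:
        return "***"   # p < 0.01
    if p_value < 0.05:
        return "**"    # p < 0.05
    if p_value < 0.1:
        return "*"     # p < 0.1
    return ""          # not significant

def test_status(p_value: float) -> str:
    """Pass/fail for assumption tests, where a LOW p-value flags a problem."""
    return "passed" if p_value >= 0.05 else "failed"

print(test_status(0.23), significance_stars(0.23))  # passed, no stars
print(test_status(0.02), significance_stars(0.02))  # failed **
```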
Test-Specific Interpretation Guide
Residual Normality
Passed (p ≥ 0.05):
Residuals are normally distributed
Confidence intervals and p-values are reliable
No action needed
Failed (p < 0.05):
Check skewness and kurtosis values
Review Q-Q plot for deviation pattern
Slight violations (p = 0.01-0.05) often acceptable
Severe violations (p < 0.001) require attention
Severity levels:
p = 0.01-0.05: Mild, often acceptable
p = 0.001-0.01: Moderate, investigate
p < 0.001: Severe, take action
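To reproduce this check outside MixModeler, here is a minimal sketch using the Jarque-Bera test from Python's statsmodels library, which also reports the skewness and kurtosis values mentioned above (the residuals are simulated stand-ins; MixModeler's internal test may differ):

```python
import numpy as np
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(42)
residuals = rng.normal(0, 1, 200)  # stand-in for real model residuals

jb_stat, p_value, skew, kurtosis = jarque_bera(residuals)
print(f"JB={jb_stat:.2f}, p={p_value:.3f}, skew={skew:.2f}, kurtosis={kurtosis:.2f}")
if p_value >= 0.05:
    print("Passed: residuals appear normally distributed")
elif p_value >= 0.001:
    print("Mild to moderate violation: review the Q-Q plot and skew/kurtosis")
else:
    print("Severe violation: take action")
```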
Autocorrelation (Durbin-Watson)
Passed (DW = 1.5-2.5):
No significant autocorrelation
Residuals are independent
Standard errors are reliable
Failed (DW < 1.5 or > 2.5):
DW < 1.5: Positive autocorrelation (common)
DW > 2.5: Negative autocorrelation (rare)
Add lagged variables or time trends
Severity levels:
DW = 1.3-1.5 or 2.5-2.7: Mild
DW = 1.0-1.3 or 2.7-3.0: Moderate
DW < 1.0 or > 3.0: Severe
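A minimal sketch of the Durbin-Watson check, assuming Python with statsmodels and residuals kept in time order (the statistic is meaningless on shuffled data):

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
residuals = rng.normal(size=104)  # stand-in for two years of weekly residuals

dw = durbin_watson(residuals)
if 1.5 <= dw <= 2.5:
    print(f"DW={dw:.2f}: passed, no significant autocorrelation")
elif dw < 1.5:
    print(f"DW={dw:.2f}: positive autocorrelation, consider lags or time trends")
else:
    print(f"DW={dw:.2f}: negative autocorrelation (rare)")
```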
Heteroscedasticity
Passed (p ≥ 0.05):
Constant variance across fitted values
Standard errors are correct
Hypothesis tests are valid
Failed (p < 0.05):
Check residual vs fitted plot for patterns
Consider log transformation
Use robust standard errors
Severity levels:
p = 0.01-0.05: Mild
p = 0.001-0.01: Moderate
p < 0.001: Severe
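For reference, here is a minimal sketch of one common heteroscedasticity test, Breusch-Pagan, using statsmodels; MixModeler's choice of test may differ, but the p-value logic is the same (the design matrix and response are simulated stand-ins):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(100, 2)))  # stand-in design matrix
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=100)

results = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(results.resid, results.model.exog)
print(f"Breusch-Pagan p = {lm_pvalue:.3f}")
if lm_pvalue >= 0.05:
    print("Passed: variance looks constant")
else:
    print("Failed: consider a log transform or robust (HC3) standard errors")
```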
Multicollinearity (VIF)
Passed (all VIF < 5):
Variables are sufficiently independent
Coefficients are stable
Each variable's effect is identifiable
Moderate (VIF 5-10):
Some correlation between predictors
Monitor affected variables
May be acceptable depending on purpose
Failed (VIF > 10):
Severe multicollinearity
Remove correlated variables
Combine into composite indices
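A minimal sketch of the VIF calculation with statsmodels, using simulated TV and Display variables engineered to be correlated (real usage would pass your own predictor columns plus a constant):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
tv = rng.normal(size=100)
display = 0.7 * tv + 0.3 * rng.normal(size=100)  # engineered to correlate with TV
X = sm.add_constant(np.column_stack([tv, display]))

for i, name in enumerate(["TV", "Display"], start=1):  # index 0 is the constant
    vif = variance_inflation_factor(X, i)
    level = "ok" if vif < 5 else ("moderate" if vif <= 10 else "severe")
    print(f"{name}: VIF = {vif:.1f} ({level})")
```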
Influential Points
Low Count (< 5% of observations):
Normal amount of variation
Model is robust
Investigate individual cases
Moderate Count (5-10%):
Several unusual periods
Check for data quality issues
Add event dummy variables
High Count (> 10%):
Serious data or model issues
Review data thoroughly
Reconsider model specification
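One common way to count influential points is Cook's distance with the rule-of-thumb cutoff 4/n; MixModeler may flag points differently. A minimal sketch with simulated data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=100)

results = sm.OLS(y, X).fit()
cooks_d, _ = results.get_influence().cooks_distance
flagged = cooks_d > 4 / len(y)  # common rule-of-thumb cutoff
share = flagged.mean()
print(f"{flagged.sum()} influential points ({share:.0%} of observations)")
if share < 0.05:
    print("Low count: investigate individual cases")
elif share <= 0.10:
    print("Moderate count: check data quality, consider event dummies")
else:
    print("High count: review data and model specification")
```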
Actual vs Predicted (R²)
Excellent (R² > 0.80):
Strong model fit
Highly reliable predictions
Ready for optimization
Good (R² = 0.70-0.80):
Acceptable fit for business use
Consider minor improvements
Suitable for most decisions
Moderate (R² = 0.50-0.70):
Captures main effects
Room for improvement
Use cautiously for optimization
Poor (R² < 0.50):
Weak model fit
Add variables or restructure
Not ready for business use
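For reference, the R² on this card is the share of variance in actual sales explained by the model's predictions. A minimal sketch of the calculation (the data here are simulated stand-ins):

```python
import numpy as np

def r_squared(actual: np.ndarray, predicted: np.ndarray) -> float:
    ss_res = np.sum((actual - predicted) ** 2)       # unexplained variation
    ss_tot = np.sum((actual - actual.mean()) ** 2)   # total variation
    return 1 - ss_res / ss_tot

rng = np.random.default_rng(4)
actual = rng.normal(100, 10, 52)
predicted = actual + rng.normal(0, 5, 52)            # stand-in predictions
print(f"R² = {r_squared(actual, predicted):.2f}")
```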
Comprehensive Model Assessment
Evaluate your model holistically across all diagnostics:
Excellent Model (Ready for Business Use)
Characteristics:
✓ All or most tests pass
R² > 0.75
Coefficients have expected signs
Low to moderate VIF
Few influential points
Action: Proceed with confidence to decomposition and optimization
Good Model (Suitable with Caveats)
Characteristics:
✓ Most critical tests pass
Minor violations on 1-2 tests
R² > 0.65
Coefficients mostly sensible
Action: Use for business decisions but acknowledge limitations in documentation
Needs Improvement
Characteristics:
⚠ Multiple test failures
R² = 0.50-0.65
Some unexpected coefficients
Moderate multicollinearity
Action: Make improvements before using for important decisions
Requires Significant Work
Characteristics:
⚠ Many serious failures
R² < 0.50
High VIF or many influential points
Unstable coefficients
Action: Rebuild model with better specification and data
Prioritizing Issues
Not all test failures are equally important. Prioritize based on:
Critical Issues (Fix Immediately):
R² < 0.50 (poor fit)
VIF > 10 (severe multicollinearity)
Many influential points (>10%)
Coefficients with wrong signs
Important Issues (Address Soon):
Moderate autocorrelation (DW < 1.3 or > 2.7)
Moderate multicollinearity (VIF 5-10)
R² = 0.50-0.60
Minor Issues (Monitor):
Slight non-normality (p = 0.02-0.05)
Mild heteroscedasticity
Few isolated outliers
DW = 1.3-1.5 or 2.5-2.7
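These priorities can be encoded as a simple triage helper. The sketch below mirrors the thresholds above; it omits the coefficient sign checks and the p-value-based tests for brevity, and the names are illustrative:

```python
def prioritize(r2: float, max_vif: float, dw: float, influential_share: float) -> str:
    """Triage a model using the thresholds above."""
    if r2 < 0.50 or max_vif > 10 or influential_share > 0.10:
        return "critical: fix immediately"
    if r2 <= 0.60 or 5 <= max_vif <= 10 or dw < 1.3 or dw > 2.7:
        return "important: address soon"
    if dw < 1.5 or dw > 2.5:
        return "minor: monitor"
    return "no flagged issues"

# Values from the worked example later in this guide:
print(prioritize(r2=0.76, max_vif=7.2, dw=1.42, influential_share=0.03))
# -> important: address soon (the moderate VIF drives the flag)
```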
Contextual Interpretation
Consider business context when interpreting results:
Data Frequency:
Weekly data: Lower R² acceptable (more noise)
Monthly data: Higher R² expected (less noise)
Quarterly data: Very high R² achievable
Market Characteristics:
Stable markets: Expect better fit
Dynamic markets: Accept more variation
New products: Lower R² normal
Business Objectives:
Strategic planning: Can tolerate moderate violations
Tactical optimization: Need strong statistical validity
Forecasting: Require excellent fit
General insights: Minor violations acceptable
Using Detail Views
Click "View Details" on any test card to access:
Statistical Tables: Complete test results with all metrics
Visualizations:
Q-Q plots (normality)
ACF plots (autocorrelation)
Scatter plots (heteroscedasticity, actual vs predicted)
Leverage plots (influential points)
Correlation matrices (multicollinearity)
Interpretation Guidance: Detailed explanations of what results mean
Action Recommendations: Specific steps to address issues
Communicating Results
When sharing diagnostic results with stakeholders:
For Technical Audiences:
Share complete diagnostic report with all p-values
Explain statistical assumptions and violations
Discuss technical remedies
Show detailed plots and tables
For Business Stakeholders:
Focus on R² and overall model quality
Use traffic light system (green/yellow/red)
Explain implications for decisions
Avoid statistical jargon
For Executives:
One-sentence summary: "Model is reliable/needs work"
Highlight confidence level in recommendations
Note key limitations or caveats
Focus on business impact
Example: Complete Model Interpretation
Test Results Summary:
Residual Normality: ✓ Passed (p = 0.18)
Autocorrelation: ⚠ DW = 1.42 (mild positive autocorrelation)
Heteroscedasticity: ✓ Passed (p = 0.31)
Multicollinearity: ⚠ TV VIF = 6.8, Display VIF = 7.2
Influential Points: ✓ 3 outliers (holiday weeks)
Actual vs Predicted: ✓ R² = 0.76
Overall Assessment:
Quality: Good model suitable for business use
Strengths:
Explains 76% of sales variation
No major statistical violations
Residuals well-behaved (normal, constant variance)
Few problematic observations
Limitations:
Mild autocorrelation suggests some temporal patterns not fully captured
Moderate multicollinearity between TV and Display (they run together)
Three holiday weeks drive some results
Recommendations:
Use model for strategic allocation decisions
Note that TV and Display effects may be partially confounded
Consider adding Q4 holiday dummy variable
Test adding lagged sales if forecasting is needed
Report results with 95% confidence intervals (see the sketch after this example)
Business Implications:
Channel ROI estimates are directionally reliable
Total marketing contribution is robust
Individual TV vs Display split has higher uncertainty due to multicollinearity
Budget optimization recommendations are sound
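For the confidence-interval recommendation above, here is a minimal sketch of pulling 95% intervals from a fitted statsmodels OLS model; the TV and Display data are simulated stand-ins:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
X = pd.DataFrame({"TV": rng.normal(size=52), "Display": rng.normal(size=52)})
y = 100 + 2.0 * X["TV"] + 1.5 * X["Display"] + rng.normal(size=52)

results = sm.OLS(y, sm.add_constant(X)).fit()
print(results.conf_int(alpha=0.05))  # 95% interval for each coefficient
```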
Common Interpretation Mistakes
Mistake 1: Focusing only on p-values
Effect size matters more than statistical significance
Business relevance trumps p < 0.05
Mistake 2: Expecting perfection
Real-world data always has some violations
No model perfectly satisfies all assumptions
Mistake 3: Ignoring practical significance
R² of 0.75 vs 0.78 is rarely meaningful in practice
Small VIF differences (4.5 vs 5.2) don't matter much
Mistake 4: Over-reacting to single test failures
One failed test doesn't invalidate the model
Consider the overall pattern across all diagnostics
Mistake 5: Under-reacting to critical issues
R² < 0.50 or VIF > 10 requires immediate action
Don't use models with severe problems
Diagnostic Flow Chart
Follow this decision tree for interpreting results:
Step 1: Check R²
R² ≥ 0.70? → Continue to Step 2
R² < 0.70? → Add variables or improve specification
Step 2: Check VIF
All VIF ≤ 10? → Continue to Step 3
Any VIF > 10? → Remove correlated variables
Step 3: Check Autocorrelation
DW between 1.5 and 2.5? → Continue to Step 4
DW outside range? → Add lagged variables
Step 4: Check Other Tests
Normality, heteroscedasticity acceptable? → Model ready
Multiple failures? → Review and improve
Step 5: Business Validation
Coefficients make business sense? → Proceed
Unexpected signs or magnitudes? → Investigate
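The decision tree translates directly into code. A minimal sketch, using the headline diagnostics as inputs (names are illustrative):

```python
def next_step(r2: float, max_vif: float, dw: float,
              other_tests_ok: bool, coefficients_sensible: bool) -> str:
    """Walk the five-step decision tree above and return the next action."""
    if r2 < 0.70:
        return "Step 1: add variables or improve specification"
    if max_vif > 10:
        return "Step 2: remove correlated variables"
    if not 1.5 <= dw <= 2.5:
        return "Step 3: add lagged variables"
    if not other_tests_ok:
        return "Step 4: review and improve"
    if not coefficients_sensible:
        return "Step 5: investigate unexpected signs or magnitudes"
    return "Model ready: proceed"

# Using the worked example's headline numbers:
print(next_step(0.76, 7.2, 1.42, True, True))  # Step 3: add lagged variables
```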
Iterative Improvement Process
Model diagnostics guide an iterative refinement process:
Iteration 1: Initial Model
Build baseline with main variables
Run full diagnostics
Identify top 2-3 issues
Iteration 2: Address Critical Issues
Fix severe problems (VIF > 10, R² < 0.50)
Re-run diagnostics
Check if issues resolved
Iteration 3: Fine-Tuning
Address moderate issues
Test alternative specifications
Optimize variable transformations
Iteration 4: Validation
Verify all diagnostics acceptable
Check business validity
Prepare for production use
Note: Don't expect to fix everything in one iteration. Improvement is a gradual process.
Documentation Requirements
When you finalize a model, document:
Diagnostic Summary:
Which tests passed/failed
Key metrics (R², VIF, DW)
Actions taken to address issues
Remaining Limitations:
Known violations that weren't fixed
Why they're acceptable
Impact on interpretation
Sensitivity Analysis (see the sketch after this list):
Results with/without influential points
Robustness to alternative specifications
Confidence intervals on key estimates
Business Validation:
Do coefficients make sense?
Do results align with market knowledge?
Have stakeholders reviewed?
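For the sensitivity analysis item above, a minimal sketch: refit the model without the flagged influential points and compare coefficients. A robust model shows only small shifts (the data and the 4/n cutoff here are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([100.0, 2.0, 1.5]) + rng.normal(size=100)

full = sm.OLS(y, X).fit()
cooks_d, _ = full.get_influence().cooks_distance
keep = cooks_d <= 4 / len(y)                 # drop flagged influential points

trimmed = sm.OLS(y[keep], X[keep]).fit()
print("full:   ", np.round(full.params, 2))
print("trimmed:", np.round(trimmed.params, 2))
```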
Using PDF Reports
MixModeler generates downloadable PDF reports for each diagnostic test:
Report Contents:
Complete statistical test results
All relevant plots and visualizations
Interpretation guidance
Threshold values and benchmarks
When to Download:
Documenting model validation
Sharing with stakeholders
Compliance or audit requirements
Creating presentation materials
How to Use:
Click "View Details" on any test
Click "Download PDF Report" button
Save with descriptive filename
Include in project documentation
Next Steps After Diagnostics
Once you've interpreted diagnostic results:
Model Passed Most Tests:
Proceed to Decomposition Analysis
Calculate contribution by channel
Generate ROI metrics
Create business recommendations
Model Needs Minor Improvements:
Make adjustments in Model Builder
Re-run model with changes
Re-run diagnostics to verify
Proceed when satisfied
Model Requires Significant Work:
Return to Variable Engineering
Add missing transformations
Check Data Quality
Consider different model structure
Diagnostic Best Practices
Run diagnostics every time:
After fitting any new model
After modifying variables
After changing specifications
Before presenting results
Save diagnostic reports:
Keep PDF reports for documentation
Track how diagnostics change across iterations
Maintain audit trail
Involve stakeholders:
Share results in business terms
Explain what diagnostics mean for decisions
Build confidence in the model
Be transparent:
Acknowledge limitations
Don't hide test failures
Explain why violations are/aren't acceptable
Balance statistics and business judgment:
Perfect statistics don't guarantee business validity
Business knowledge validates statistical results
Use both together for best decisions