Interpreting Test Results
Understanding the Diagnostic Dashboard
When you run model diagnostics in MixModeler, results are displayed in an intuitive card-based interface. Each test shows a summary card with key metrics and a pass/fail indicator, allowing you to quickly assess model quality.
Reading Test Summary Cards
Each diagnostic test card displays:
Test Name and Description: Clear label explaining what the test measures
Status Indicator:
- ✓ Green checkmark: Test passed 
- ⚠ Red warning: Test failed or issue detected 
Key Statistic: Primary test statistic or metric prominently displayed
P-value (when applicable): Statistical significance level with star notation
- * p < 0.1 
- ** p < 0.05 
- *** p < 0.01 
Quick Interpretation: One-line summary of what the result means
View Details Button: Click to see in-depth analysis, plots, and tables
Interpreting P-Values
P-values indicate the probability of observing results at least as extreme as yours if the null hypothesis were true:
P-value ≥ 0.05: No significant issue detected (✓ test passed)
- Example: Normality test p = 0.23 → Residuals appear normal 
P-value < 0.05: Significant issue detected (⚠ test failed)
- Example: Heteroscedasticity p = 0.02 → Variance is not constant 
P-value < 0.01: Highly significant issue (⚠⚠ serious concern)
- Example: Autocorrelation p < 0.001 → Strong temporal dependence 
Important: Lower p-values indicate stronger evidence of a problem. However, context and effect size matter more than p-values alone.
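The star notation is just a mapping from p-value to cutoff. A minimal sketch in Python (a hypothetical helper for illustration, not MixModeler's internal code):

```python
def significance_stars(p_value: float) -> str:
    """Map a p-value to the star notation shown on test cards."""
    if p_value < 0.01:
        return "***"  # highly significant
    if p_value < 0.05:
        return "**"
    if p_value < 0.1:
        return "*"
    return ""         # not significant at the 10% level

print(significance_stars(0.02))  # -> "**"
```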
Test-Specific Interpretation Guide
Residual Normality
Passed (p ≥ 0.05):
- Residuals are normally distributed 
- Confidence intervals and p-values are reliable 
- No action needed 
Failed (p < 0.05):
- Check skewness and kurtosis values 
- Review Q-Q plot for deviation pattern 
- Slight violations (p = 0.01-0.05) often acceptable 
- Severe violations (p < 0.001) require attention 
Severity levels:
- p = 0.01-0.05: Mild, often acceptable 
- p = 0.001-0.01: Moderate, investigate 
- p < 0.001: Severe, take action 
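MixModeler runs this test for you, but you can reproduce a normality check on exported residuals yourself. A sketch using SciPy, where the simulated residuals and the choice of the Shapiro-Wilk test are illustrative assumptions (the app's exact test statistic may differ):

```python
import numpy as np
from scipy import stats

# Simulated residuals standing in for a fitted model's residuals
rng = np.random.default_rng(42)
residuals = rng.normal(size=104)  # e.g., two years of weekly data

stat, p_value = stats.shapiro(residuals)  # Shapiro-Wilk test of normality
skew = stats.skew(residuals)
kurt = stats.kurtosis(residuals)          # excess kurtosis; ~0 if normal

print(f"p = {p_value:.3f}, skew = {skew:.2f}, excess kurtosis = {kurt:.2f}")
print("Residuals appear normal" if p_value >= 0.05 else "Non-normal residuals")
```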
Autocorrelation (Durbin-Watson)
Passed (DW = 1.5-2.5):
- No significant autocorrelation 
- Residuals are independent 
- Standard errors are reliable 
Failed (DW < 1.5 or > 2.5):
- DW < 1.5: Positive autocorrelation (common) 
- DW > 2.5: Negative autocorrelation (rare) 
- Add lagged variables or time trends 
Severity levels:
- DW = 1.3-1.5 or 2.5-2.7: Mild 
- DW = 1.0-1.3 or 2.7-3.0: Moderate 
- DW < 1.0 or > 3.0: Severe 
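The Durbin-Watson statistic is easy to verify outside the app. A sketch with statsmodels, using simulated residuals as a stand-in for your model's:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

# Simulated residuals; in practice, use your model's residuals
rng = np.random.default_rng(0)
residuals = rng.normal(size=104)

dw = durbin_watson(residuals)  # ~2.0 indicates no autocorrelation
if 1.5 <= dw <= 2.5:
    print(f"DW = {dw:.2f}: no significant autocorrelation")
elif dw < 1.5:
    print(f"DW = {dw:.2f}: positive autocorrelation")
else:
    print(f"DW = {dw:.2f}: negative autocorrelation")
```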
Heteroscedasticity
Passed (p ≥ 0.05):
- Constant variance across fitted values 
- Standard errors are correct 
- Hypothesis tests are valid 
Failed (p < 0.05):
- Check residual vs fitted plot for patterns 
- Consider log transformation 
- Use robust standard errors 
Severity levels:
- p = 0.01-0.05: Mild 
- p = 0.001-0.01: Moderate 
- p < 0.001: Severe 
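To reproduce a heteroscedasticity check independently, a Breusch-Pagan test in statsmodels looks roughly like this sketch (toy data; MixModeler's specific test variant may differ). The final line illustrates the "robust standard errors" remedy from the list above:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Toy data: an outcome driven by two predictors plus noise
rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(104, 2)))
y = X @ np.array([10.0, 2.0, -1.0]) + rng.normal(size=104)

model = sm.OLS(y, X).fit()
lm_stat, lm_p, f_stat, f_p = het_breuschpagan(model.resid, X)
print(f"Breusch-Pagan p = {lm_p:.3f}")

if lm_p < 0.05:
    # Remedy: heteroscedasticity-robust (HC3) standard errors
    model = model.get_robustcov_results(cov_type="HC3")
```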
Multicollinearity (VIF)
Passed (all VIF < 5):
- Variables are sufficiently independent 
- Coefficients are stable 
- Each variable's effect is identifiable 
Moderate (VIF 5-10):
- Some correlation between predictors 
- Monitor affected variables 
- May be acceptable depending on purpose 
Failed (VIF > 10):
- Severe multicollinearity 
- Remove correlated variables 
- Combine into composite indices 
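VIF values are straightforward to compute with statsmodels if you want to verify them on your own data. In this sketch the `display` series is deliberately constructed to correlate with `tv`, so both show elevated VIFs (variable names and data are illustrative, not from the app):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
tv = rng.normal(size=104)
display = 0.8 * tv + 0.3 * rng.normal(size=104)  # built to correlate with tv
search = rng.normal(size=104)

X = sm.add_constant(pd.DataFrame({"tv": tv, "display": display, "search": search}))
for i, name in enumerate(X.columns[1:], start=1):  # skip the constant
    vif = variance_inflation_factor(X.values, i)
    flag = "severe" if vif > 10 else "moderate" if vif > 5 else "ok"
    print(f"{name}: VIF = {vif:.1f} ({flag})")
```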
Influential Points
Low Count (< 5% of observations):
- Normal amount of variation 
- Model is robust 
- Review flagged cases individually for context 
Moderate Count (5-10%):
- Several unusual periods 
- Check for data quality issues 
- Add event dummy variables 
High Count (> 10%):
- Serious data or model issues 
- Review data thoroughly 
- Reconsider model specification 
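Influential-point counts are typically based on Cook's distance. A sketch with statsmodels, using the common 4/n rule of thumb as a threshold (an assumption for illustration; MixModeler's exact threshold may differ):

```python
import numpy as np
import statsmodels.api as sm

# Toy model; in practice, use your fitted model's results object
rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(104, 2)))
y = X @ np.array([10.0, 2.0, -1.0]) + rng.normal(size=104)

model = sm.OLS(y, X).fit()
cooks_d, _ = model.get_influence().cooks_distance  # one value per observation

threshold = 4 / len(y)  # common rule of thumb, not the only choice
flagged = np.flatnonzero(cooks_d > threshold)
print(f"{len(flagged)} influential points ({len(flagged) / len(y):.0%} of observations)")
```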
Actual vs Predicted (R²)
Excellent (R² > 0.80):
- Strong model fit 
- Highly reliable predictions 
- Ready for optimization 
Good (R² = 0.70-0.80):
- Acceptable fit for business use 
- Consider minor improvements 
- Suitable for most decisions 
Moderate (R² = 0.50-0.70):
- Captures main effects 
- Room for improvement 
- Use cautiously for optimization 
Poor (R² < 0.50):
- Weak model fit 
- Add variables or restructure 
- Not ready for business use 
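R² compares the model's squared prediction errors against the total variation in the actuals. A minimal worked example with toy numbers:

```python
import numpy as np

def r_squared(actual: np.ndarray, predicted: np.ndarray) -> float:
    """R-squared = 1 - SS_residual / SS_total."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1 - ss_res / ss_tot

actual = np.array([100.0, 120.0, 90.0, 110.0, 105.0])
predicted = np.array([98.0, 118.0, 95.0, 108.0, 104.0])
print(f"R² = {r_squared(actual, predicted):.2f}")  # -> 0.92
```

Here R² ≈ 0.92: the predictions account for about 92% of the variation in the actuals, which would fall in the "Excellent" band above.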
Comprehensive Model Assessment
Evaluate your model holistically across all diagnostics:
Excellent Model (Ready for Business Use)
Characteristics:
- ✓ All or most tests pass 
- R² > 0.75 
- Coefficients have expected signs 
- Low to moderate VIF 
- Few influential points 
Action: Proceed with confidence to decomposition and optimization
Good Model (Suitable with Caveats)
Characteristics:
- ✓ Most critical tests pass 
- Minor violations on 1-2 tests 
- R² > 0.65 
- Coefficients mostly sensible 
Action: Use for business decisions but acknowledge limitations in documentation
Needs Improvement
Characteristics:
- ⚠ Multiple test failures 
- R² = 0.50-0.65 
- Some unexpected coefficients 
- Moderate multicollinearity 
Action: Make improvements before using for important decisions
Requires Significant Work
Characteristics:
- ⚠ Many serious failures 
- R² < 0.50 
- High VIF or many influential points 
- Unstable coefficients 
Action: Rebuild model with better specification and data
Prioritizing Issues
Not all test failures are equally important. Prioritize based on:
Critical Issues (Fix Immediately):
- R² < 0.50 (poor fit) 
- VIF > 10 (severe multicollinearity) 
- Many influential points (>10%) 
- Coefficients with wrong signs 
Important Issues (Address Soon):
- Moderate autocorrelation (DW < 1.3 or > 2.7) 
- Moderate multicollinearity (VIF 5-10) 
- R² = 0.50-0.60 
Minor Issues (Monitor):
- Slight non-normality (p = 0.02-0.05) 
- Mild heteroscedasticity 
- Few isolated outliers 
- DW = 1.3-1.5 or 2.5-2.7 
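These priorities can be summarized as a simple triage rule. The sketch below is a hypothetical simplification of the buckets above (it ignores coefficient signs and normality, for instance), not an official scoring scheme:

```python
def triage(r2: float, max_vif: float, dw: float, influential_share: float) -> str:
    """Hypothetical triage of diagnostics into the priority buckets above."""
    if r2 < 0.50 or max_vif > 10 or influential_share > 0.10:
        return "critical"
    if r2 < 0.60 or max_vif > 5 or dw < 1.3 or dw > 2.7:
        return "important"
    return "minor or none"

# Values from the worked example later on this page
print(triage(r2=0.76, max_vif=7.2, dw=1.42, influential_share=0.03))  # "important"
```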
Contextual Interpretation
Consider business context when interpreting results:
Data Frequency:
- Weekly data: Lower R² acceptable (more noise) 
- Monthly data: Higher R² expected (less noise) 
- Quarterly data: Very high R² achievable 
Market Characteristics:
- Stable markets: Expect better fit 
- Dynamic markets: Accept more variation 
- New products: Lower R² normal 
Business Objectives:
- Strategic planning: Can tolerate moderate violations 
- Tactical optimization: Need strong statistical validity 
- Forecasting: Require excellent fit 
- General insights: Minor violations acceptable 
Using Detail Views
Click "View Details" on any test card to access:
Statistical Tables: Complete test results with all metrics
Visualizations:
- Q-Q plots (normality) 
- ACF plots (autocorrelation) 
- Scatter plots (heteroscedasticity, actual vs predicted) 
- Leverage plots (influential points) 
- Correlation matrices (multicollinearity) 
Interpretation Guidance: Detailed explanations of what results mean
Action Recommendations: Specific steps to address issues
Communicating Results
When sharing diagnostic results with stakeholders:
For Technical Audiences:
- Share complete diagnostic report with all p-values 
- Explain statistical assumptions and violations 
- Discuss technical remedies 
- Show detailed plots and tables 
For Business Stakeholders:
- Focus on R² and overall model quality 
- Use traffic light system (green/yellow/red) 
- Explain implications for decisions 
- Avoid statistical jargon 
For Executives:
- One-sentence summary: "Model is reliable/needs work" 
- Highlight confidence level in recommendations 
- Note key limitations or caveats 
- Focus on business impact 
Example: Complete Model Interpretation
Test Results Summary:
- Residual Normality: ✓ Passed (p = 0.18) 
- Autocorrelation: ⚠ DW = 1.42 (mild positive autocorrelation) 
- Heteroscedasticity: ✓ Passed (p = 0.31) 
- Multicollinearity: ⚠ TV VIF = 6.8, Display VIF = 7.2 
- Influential Points: ✓ 3 outliers (holiday weeks) 
- Actual vs Predicted: ✓ R² = 0.76 
Overall Assessment:
Quality: Good model suitable for business use
Strengths:
- Explains 76% of sales variation 
- No major statistical violations 
- Residuals well-behaved (normal, constant variance) 
- Few problematic observations 
Limitations:
- Mild autocorrelation suggests some temporal patterns not fully captured 
- Moderate multicollinearity between TV and Display (they run together) 
- Three holiday weeks drive some results 
Recommendations:
- Use model for strategic allocation decisions 
- Note that TV and Display effects may be partially confounded 
- Consider adding Q4 holiday dummy variable 
- Test adding lagged sales if forecasting is needed 
- Report results with 95% confidence intervals 
Business Implications:
- Channel ROI estimates are directionally reliable 
- Total marketing contribution is robust 
- Individual TV vs Display split has higher uncertainty due to multicollinearity 
- Budget optimization recommendations are sound 
Common Interpretation Mistakes
Mistake 1: Focusing only on p-values
- Effect size matters more than statistical significance 
- Business relevance trumps p < 0.05 
Mistake 2: Expecting perfection
- Real-world data always has some violations 
- No model perfectly satisfies all assumptions 
Mistake 3: Ignoring practical significance
- R² of 0.75 vs 0.78 is rarely meaningful in practice 
- Small VIF differences (4.5 vs 5.2) don't matter much 
Mistake 4: Over-reacting to single test failures
- One failed test doesn't invalidate the model 
- Consider the overall pattern across all diagnostics 
Mistake 5: Under-reacting to critical issues
- R² < 0.50 or VIF > 15 requires immediate action 
- Don't use models with severe problems 
Diagnostic Flow Chart
Follow this decision tree for interpreting results:
Step 1: Check R²
- R² ≥ 0.70? → Continue to Step 2 
- R² < 0.70? → Add variables or improve specification 
Step 2: Check VIF
- All VIF ≤ 10? → Continue to Step 3 
- Any VIF > 10? → Remove correlated variables 
Step 3: Check Autocorrelation
- DW between 1.5-2.5? → Continue to Step 4 
- DW outside range? → Add lagged variables 
Step 4: Check Other Tests
- Normality, heteroscedasticity acceptable? → Model ready 
- Multiple failures? → Review and improve 
Step 5: Business Validation
- Coefficients make business sense? → Proceed 
- Unexpected signs or magnitudes? → Investigate 
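The same decision tree expressed as a hypothetical Python function, for clarity; the step order and thresholds come straight from the list above:

```python
def diagnostic_flow(r2: float, max_vif: float, dw: float,
                    other_tests_ok: bool, coefficients_sensible: bool) -> str:
    """Walk the five-step decision tree, stopping at the first failure."""
    if r2 < 0.70:
        return "Step 1: add variables or improve specification"
    if max_vif > 10:
        return "Step 2: remove correlated variables"
    if not 1.5 <= dw <= 2.5:
        return "Step 3: add lagged variables or time trends"
    if not other_tests_ok:
        return "Step 4: review and improve"
    if not coefficients_sensible:
        return "Step 5: investigate unexpected signs or magnitudes"
    return "Model ready for business use"

print(diagnostic_flow(0.76, 4.0, 1.9, True, True))  # "Model ready for business use"
```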
Iterative Improvement Process
Model diagnostics guide an iterative refinement process:
Iteration 1: Initial Model
- Build baseline with main variables 
- Run full diagnostics 
- Identify top 2-3 issues 
Iteration 2: Address Critical Issues
- Fix severe problems (VIF > 10, R² < 0.50) 
- Re-run diagnostics 
- Check if issues resolved 
Iteration 3: Fine-Tuning
- Address moderate issues 
- Test alternative specifications 
- Optimize variable transformations 
Iteration 4: Validation
- Verify all diagnostics acceptable 
- Check business validity 
- Prepare for production use 
Note: Don't expect to fix everything in one iteration. Improvement is a gradual process.
Documentation Requirements
When you finalize a model, document:
Diagnostic Summary:
- Which tests passed/failed 
- Key metrics (R², VIF, DW) 
- Actions taken to address issues 
Remaining Limitations:
- Known violations that weren't fixed 
- Why they're acceptable 
- Impact on interpretation 
Sensitivity Analysis:
- Results with/without influential points 
- Robustness to alternative specifications 
- Confidence intervals on key estimates 
Business Validation:
- Do coefficients make sense? 
- Do results align with market knowledge? 
- Have stakeholders reviewed? 
Using PDF Reports
MixModeler generates downloadable PDF reports for each diagnostic test:
Report Contents:
- Complete statistical test results 
- All relevant plots and visualizations 
- Interpretation guidance 
- Threshold values and benchmarks 
When to Download:
- Documenting model validation 
- Sharing with stakeholders 
- Compliance or audit requirements 
- Creating presentation materials 
How to Use:
- Click "View Details" on any test 
- Click "Download PDF Report" button 
- Save with descriptive filename 
- Include in project documentation 
Next Steps After Diagnostics
Once you've interpreted diagnostic results:
Model Passed Most Tests:
- Proceed to Decomposition Analysis 
- Calculate contribution by channel 
- Generate ROI metrics 
- Create business recommendations 
Model Needs Minor Improvements:
- Make adjustments in Model Builder 
- Re-run model with changes 
- Re-run diagnostics to verify 
- Proceed when satisfied 
Model Requires Significant Work:
- Return to Variable Engineering 
- Add missing transformations 
- Check Data Quality 
- Consider different model structure 
Diagnostic Best Practices
Run diagnostics every time:
- After fitting any new model 
- After modifying variables 
- After changing specifications 
- Before presenting results 
Save diagnostic reports:
- Keep PDF reports for documentation 
- Track how diagnostics change across iterations 
- Maintain audit trail 
Involve stakeholders:
- Share results in business terms 
- Explain what diagnostics mean for decisions 
- Build confidence in the model 
Be transparent:
- Acknowledge limitations 
- Don't hide test failures 
- Explain why violations are/aren't acceptable 
Balance statistics and business judgment:
- Perfect statistics don't guarantee business validity 
- Business knowledge validates statistical results 
- Use both together for best decisions 