Interpreting Test Results

Understanding the Diagnostic Dashboard

When you run model diagnostics in MixModeler, results are displayed in an intuitive card-based interface. Each test shows a summary card with key metrics and a pass/fail indicator, allowing you to quickly assess model quality.

Reading Test Summary Cards

Each diagnostic test card displays:

Test Name and Description: Clear label explaining what the test measures

Status Indicator:

  • ✓ Green checkmark: Test passed

  • ⚠ Red warning: Test failed or issue detected

Key Statistic: Primary test statistic or metric prominently displayed

P-value (when applicable): Statistical significance level with star notation

  • * p < 0.1

  • ** p < 0.05

  • *** p < 0.01

Quick Interpretation: One-line summary of what the result means

View Details Button: Click to see in-depth analysis, plots, and tables

Interpreting P-Values

P-values indicate the probability of observing results at least as extreme as yours if the null hypothesis were true:

P-value ≥ 0.05: No significant issue detected (✓ test passed)

  • Example: Normality test p = 0.23 → Residuals appear normal

P-value < 0.05: Significant issue detected (⚠ test failed)

  • Example: Heteroscedasticity p = 0.02 → Variance is not constant

P-value < 0.01: Highly significant issue (⚠⚠ serious concern)

  • Example: Autocorrelation p < 0.001 → Strong temporal dependence

Important: Lower p-values indicate stronger evidence of a problem. However, context and effect size matter more than p-values alone.
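
If you script against exported results, the thresholds above translate into a few lines of Python. The helper below is illustrative only (the function name is hypothetical, not part of MixModeler); the key point it encodes is that for these assumption tests the null hypothesis is "no problem", so a low p-value flags a violation:

```python
def interpret_diagnostic(p_value: float) -> str:
    """Map a diagnostic p-value to the dashboard's pass/fail language.

    For assumption tests the null hypothesis is "no problem", so a
    LOW p-value is evidence of a violation, not of a good model.
    """
    if p_value >= 0.05:
        return "✓ passed - no significant issue detected"
    if p_value >= 0.01:
        return "⚠ failed - significant issue detected"
    return "⚠⚠ failed - highly significant issue (serious concern)"

for p in (0.23, 0.02, 0.0005):
    print(f"p = {p}: {interpret_diagnostic(p)}")
```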

Test-Specific Interpretation Guide

Residual Normality

Passed (p ≥ 0.05):

  • Residuals are normally distributed

  • Confidence intervals and p-values are reliable

  • No action needed

Failed (p < 0.05):

  • Check skewness and kurtosis values

  • Review Q-Q plot for deviation pattern

  • Slight violations (p = 0.01-0.05) often acceptable

  • Severe violations (p < 0.001) require attention

Severity levels:

  • p = 0.01-0.05: Mild, often acceptable

  • p = 0.001-0.01: Moderate, investigate

  • p < 0.001: Severe, take action
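
To cross-check this test outside MixModeler, a Shapiro-Wilk test from SciPy gives the same kind of read. A minimal sketch, assuming your residuals are available as a NumPy array (simulated here); MixModeler's own normality test may use a different statistic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
residuals = rng.normal(0, 1, 104)     # stand-in for your model's residuals

stat, p = stats.shapiro(residuals)    # Shapiro-Wilk normality test
skew = stats.skew(residuals)
kurt = stats.kurtosis(residuals)      # excess kurtosis; 0 for a perfect normal

print(f"p = {p:.3f}, skewness = {skew:.2f}, excess kurtosis = {kurt:.2f}")
if p >= 0.05:
    print("✓ residuals appear normal")
elif p >= 0.001:
    print("⚠ mild-to-moderate departure: review the Q-Q plot")
else:
    print("⚠⚠ severe departure: consider transforming the dependent variable")
```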

Autocorrelation (Durbin-Watson)

Passed (DW = 1.5-2.5):

  • No significant autocorrelation

  • Residuals are independent

  • Standard errors are reliable

Failed (DW < 1.5 or > 2.5):

  • DW < 1.5: Positive autocorrelation (common)

  • DW > 2.5: Negative autocorrelation (rare)

  • Add lagged variables or time trends

Severity levels:

  • DW = 1.3-1.5 or 2.5-2.7: Mild

  • DW = 1.0-1.3 or 2.7-3.0: Moderate

  • DW < 1.0 or > 3.0: Severe
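
The Durbin-Watson statistic is available in statsmodels if you want to verify the dashboard's value on exported residuals. A minimal sketch on simulated weekly data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
n = 104                               # e.g. two years of weekly data
X = sm.add_constant(rng.normal(size=(n, 2)))
y = X @ np.array([10.0, 1.5, -0.8]) + rng.normal(size=n)

results = sm.OLS(y, X).fit()
dw = durbin_watson(results.resid)
print(f"Durbin-Watson = {dw:.2f}")

if 1.5 <= dw <= 2.5:
    print("✓ no significant autocorrelation")
elif dw < 1.5:
    print("⚠ positive autocorrelation: add lagged variables or a time trend")
else:
    print("⚠ negative autocorrelation (rare): check for over-differencing")
```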

Heteroscedasticity

Passed (p ≥ 0.05):

  • Constant variance across fitted values

  • Standard errors are correct

  • Hypothesis tests are valid

Failed (p < 0.05):

  • Check residual vs fitted plot for patterns

  • Consider log transformation

  • Use robust standard errors

Severity levels:

  • p = 0.01-0.05: Mild

  • p = 0.001-0.01: Moderate

  • p < 0.001: Severe
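
A Breusch-Pagan test is one common way to run this check yourself; this section doesn't specify which variant MixModeler uses, so treat the sketch below as an independent cross-check rather than a reproduction. It also demonstrates the robust-standard-error remedy mentioned above:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(1)
n = 104
x = rng.uniform(0, 10, n)
X = sm.add_constant(x)
y = 2 + 0.5 * x + rng.normal(scale=0.5 + 0.3 * x)   # error variance grows with x

results = sm.OLS(y, X).fit()
lm_stat, lm_p, f_stat, f_p = het_breuschpagan(results.resid, X)
print(f"Breusch-Pagan p = {lm_p:.4f}")

if lm_p >= 0.05:
    print("✓ constant variance")
else:
    # Robust (HC3) standard errors keep inference valid without refitting
    robust = results.get_robustcov_results(cov_type="HC3")
    print("⚠ heteroscedasticity detected; robust SEs:", np.round(robust.bse, 3))
```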

Multicollinearity (VIF)

Passed (all VIF < 5):

  • Variables are sufficiently independent

  • Coefficients are stable

  • Each variable's effect is identifiable

Moderate (VIF 5-10):

  • Some correlation between predictors

  • Monitor affected variables

  • May be acceptable depending on purpose

Failed (VIF > 10):

  • Severe multicollinearity

  • Remove correlated variables

  • Combine into composite indices
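
VIFs are straightforward to compute with statsmodels if you want to see which predictors drive a flag. A minimal sketch with a deliberately correlated TV/Display pair (variable names are illustrative):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 104
tv = rng.normal(size=n)
display = 0.9 * tv + 0.5 * rng.normal(size=n)   # deliberately correlated with TV
search = rng.normal(size=n)

X = sm.add_constant(pd.DataFrame({"tv": tv, "display": display, "search": search}))
for i, name in enumerate(X.columns):
    if name == "const":
        continue                                # VIF is reported for predictors only
    vif = variance_inflation_factor(X.values, i)
    flag = "✓" if vif < 5 else ("monitor" if vif <= 10 else "⚠ remove or combine")
    print(f"{name:8s} VIF = {vif:5.2f}  {flag}")
```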

Influential Points

Low Count (< 5% of observations):

  • Normal amount of variation

  • Model is robust

  • Investigate individual cases

Moderate Count (5-10%):

  • Several unusual periods

  • Check for data quality issues

  • Add event dummy variables

High Count (> 10%):

  • Serious data or model issues

  • Review data thoroughly

  • Reconsider model specification
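
Influential points are commonly flagged with Cook's distance. The sketch below uses the conventional 4/n cutoff; this section doesn't state MixModeler's exact threshold, so treat that rule of thumb as an assumption:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 104
X = sm.add_constant(rng.normal(size=(n, 2)))
y = X @ np.array([5.0, 1.0, 0.5]) + rng.normal(size=n)
y[:3] += 8                                   # plant three "holiday week" spikes

results = sm.OLS(y, X).fit()
cooks_d, _ = results.get_influence().cooks_distance

threshold = 4 / n                            # common rule of thumb, not a law
flagged = np.flatnonzero(cooks_d > threshold)
share = len(flagged) / n
print(f"{len(flagged)} influential points ({share:.0%} of observations)")

if share < 0.05:
    print("✓ normal amount: review the flagged cases individually")
elif share <= 0.10:
    print("⚠ check data quality; consider event dummy variables")
else:
    print("⚠⚠ reconsider the model specification")
```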

Actual vs Predicted (R²)

Excellent (R² > 0.80):

  • Strong model fit

  • Highly reliable predictions

  • Ready for optimization

Good (R² = 0.70-0.80):

  • Acceptable fit for business use

  • Consider minor improvements

  • Suitable for most decisions

Moderate (R² = 0.50-0.70):

  • Captures main effects

  • Room for improvement

  • Use cautiously for optimization

Poor (R² < 0.50):

  • Weak model fit

  • Add variables or restructure

  • Not ready for business use
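
These bands are easy to encode if you post-process results programmatically. A hypothetical helper mirroring the thresholds above (with a fitted statsmodels OLS model, the input would be results.rsquared):

```python
def fit_quality(r2: float) -> str:
    """Map R² to the quality bands above (hypothetical helper)."""
    if r2 > 0.80:
        return "excellent: ready for optimization"
    if r2 >= 0.70:
        return "good: acceptable for business use"
    if r2 >= 0.50:
        return "moderate: use cautiously for optimization"
    return "poor: add variables or restructure"

print(fit_quality(0.76))   # -> good: acceptable for business use
```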

Comprehensive Model Assessment

Evaluate your model holistically across all diagnostics:

Excellent Model (Ready for Business Use)

Characteristics:

  • ✓ All or most tests pass

  • R² > 0.75

  • Coefficients have expected signs

  • Low to moderate VIF

  • Few influential points

Action: Proceed with confidence to decomposition and optimization

Good Model (Suitable with Caveats)

Characteristics:

  • ✓ Most critical tests pass

  • Minor violations on 1-2 tests

  • R² > 0.65

  • Coefficients mostly sensible

Action: Use for business decisions but acknowledge limitations in documentation

Needs Improvement

Characteristics:

  • ⚠ Multiple test failures

  • R² = 0.50-0.65

  • Some unexpected coefficients

  • Moderate multicollinearity

Action: Make improvements before using for important decisions

Requires Significant Work

Characteristics:

  • ⚠ Many serious failures

  • R² < 0.50

  • High VIF or many influential points

  • Unstable coefficients

Action: Rebuild model with better specification and data

Prioritizing Issues

Not all test failures are equally important. Prioritize based on:

Critical Issues (Fix Immediately):

  1. R² < 0.50 (poor fit)

  2. VIF > 10 (severe multicollinearity)

  3. Many influential points (>10%)

  4. Coefficients with wrong signs

Important Issues (Address Soon):

  1. Moderate autocorrelation (DW < 1.3 or > 2.7)

  2. Moderate multicollinearity (VIF 5-10)

  3. R² = 0.50-0.60

Minor Issues (Monitor):

  1. Slight non-normality (p = 0.02-0.05)

  2. Mild heteroscedasticity

  3. Few isolated outliers

  4. DW = 1.3-1.5 or 2.5-2.7
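
If you track many models, these tiers can be encoded as a simple triage function. Everything below is illustrative: the function and its thresholds simply restate the lists above (wrong-sign coefficients are omitted because they require business expectations as input):

```python
def triage(r2, max_vif, dw, influential_share, normality_p, het_p):
    """Bucket diagnostic findings into the priority tiers above (illustrative)."""
    critical, important, minor = [], [], []
    if r2 < 0.50:
        critical.append("R² < 0.50 (poor fit)")
    elif r2 <= 0.60:
        important.append("R² between 0.50 and 0.60")
    if max_vif > 10:
        critical.append("VIF > 10 (severe multicollinearity)")
    elif max_vif >= 5:
        important.append("VIF 5-10 (moderate multicollinearity)")
    if influential_share > 0.10:
        critical.append("> 10% influential points")
    if dw < 1.3 or dw > 2.7:
        important.append("moderate-to-severe autocorrelation")
    elif dw < 1.5 or dw > 2.5:
        minor.append("mild autocorrelation")
    if 0.02 <= normality_p < 0.05:
        minor.append("slight non-normality")
    if 0.01 <= het_p < 0.05:
        minor.append("mild heteroscedasticity")
    return {"critical": critical, "important": important, "minor": minor}

print(triage(r2=0.76, max_vif=7.2, dw=1.42,
             influential_share=0.03, normality_p=0.18, het_p=0.31))
```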

Contextual Interpretation

Consider business context when interpreting results:

Data Frequency:

  • Weekly data: Lower R² acceptable (more noise)

  • Monthly data: Higher R² expected (less noise)

  • Quarterly data: Very high R² achievable

Market Characteristics:

  • Stable markets: Expect better fit

  • Dynamic markets: Accept more variation

  • New products: Lower R² normal

Business Objectives:

  • Strategic planning: Can tolerate moderate violations

  • Tactical optimization: Need strong statistical validity

  • Forecasting: Require excellent fit

  • General insights: Minor violations acceptable

Using Detail Views

Click "View Details" on any test card to access:

Statistical Tables: Complete test results with all metrics

Visualizations:

  • Q-Q plots (normality)

  • ACF plots (autocorrelation)

  • Scatter plots (heteroscedasticity, actual vs predicted)

  • Leverage plots (influential points)

  • Correlation matrices (multicollinearity)

Interpretation Guidance: Detailed explanations of what results mean

Action Recommendations: Specific steps to address issues

Communicating Results

When sharing diagnostic results with stakeholders:

For Technical Audiences:

  • Share complete diagnostic report with all p-values

  • Explain statistical assumptions and violations

  • Discuss technical remedies

  • Show detailed plots and tables

For Business Stakeholders:

  • Focus on R² and overall model quality

  • Use traffic light system (green/yellow/red)

  • Explain implications for decisions

  • Avoid statistical jargon

For Executives:

  • One-sentence summary: "Model is reliable/needs work"

  • Highlight confidence level in recommendations

  • Note key limitations or caveats

  • Focus on business impact

Example: Complete Model Interpretation

Test Results Summary:

  • Residual Normality: ✓ Passed (p = 0.18)

  • Autocorrelation: ⚠ DW = 1.42 (mild positive autocorrelation)

  • Heteroscedasticity: ✓ Passed (p = 0.31)

  • Multicollinearity: ⚠ TV VIF = 6.8, Display VIF = 7.2

  • Influential Points: ✓ 3 outliers (holiday weeks)

  • Actual vs Predicted: ✓ R² = 0.76

Overall Assessment:

Quality: Good model suitable for business use

Strengths:

  • Explains 76% of sales variation

  • No major statistical violations

  • Residuals well-behaved (normal, constant variance)

  • Few problematic observations

Limitations:

  • Mild autocorrelation suggests some temporal patterns not fully captured

  • Moderate multicollinearity between TV and Display (their campaigns tend to run concurrently)

  • Three holiday weeks drive some results

Recommendations:

  • Use model for strategic allocation decisions

  • Note that TV and Display effects may be partially confounded

  • Consider adding Q4 holiday dummy variable

  • Test adding lagged sales if forecasting is needed

  • Report results with 95% confidence intervals

Business Implications:

  • Channel ROI estimates are directionally reliable

  • Total marketing contribution is robust

  • Individual TV vs Display split has higher uncertainty due to multicollinearity

  • Budget optimization recommendations are sound

Common Interpretation Mistakes

Mistake 1: Focusing only on p-values

  • Effect size matters more than statistical significance

  • Business relevance trumps p < 0.05

Mistake 2: Expecting perfection

  • Real-world data always has some violations

  • No model perfectly satisfies all assumptions

Mistake 3: Ignoring practical significance

  • R² of 0.75 vs 0.78 is rarely meaningful in practice

  • Small VIF differences (4.5 vs 5.2) don't matter much

Mistake 4: Over-reacting to single test failures

  • One failed test doesn't invalidate the model

  • Consider the overall pattern across all diagnostics

Mistake 5: Under-reacting to critical issues

  • R² < 0.50 or VIF > 10 requires immediate action

  • Don't use models with severe problems

Diagnostic Flow Chart

Follow this decision tree for interpreting results:

Step 1: Check R²

  • R² > 0.70? → Continue to Step 2

  • R² < 0.70? → Add variables or improve specification

Step 2: Check VIF

  • All VIF < 10? → Continue to Step 3

  • Any VIF > 10? → Remove correlated variables

Step 3: Check Autocorrelation

  • DW between 1.5-2.5? → Continue to Step 4

  • DW outside range? → Add lagged variables

Step 4: Check Other Tests

  • Normality, heteroscedasticity acceptable? → Model ready

  • Multiple failures? → Review and improve

Step 5: Business Validation

  • Coefficients make business sense? → Proceed

  • Unexpected signs or magnitudes? → Investigate
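
The decision tree translates directly into code. A hypothetical sketch; fed the numbers from the worked example earlier, it stops at Step 3, matching the mild-autocorrelation flag there:

```python
def next_step(r2, max_vif, dw, other_tests_ok, coefficients_sensible):
    """Walk the five-step decision tree above (illustrative sketch)."""
    if r2 < 0.70:
        return "Step 1: add variables or improve the specification"
    if max_vif > 10:
        return "Step 2: remove correlated variables"
    if not 1.5 <= dw <= 2.5:
        return "Step 3: add lagged variables"
    if not other_tests_ok:
        return "Step 4: review failing tests and improve"
    if not coefficients_sensible:
        return "Step 5: investigate unexpected signs or magnitudes"
    return "Model ready: proceed to decomposition and optimization"

# Numbers from the worked example (R² = 0.76, max VIF = 7.2, DW = 1.42):
print(next_step(r2=0.76, max_vif=7.2, dw=1.42,
                other_tests_ok=True, coefficients_sensible=True))
# -> Step 3: add lagged variables
```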

Iterative Improvement Process

Model diagnostics guide an iterative refinement process:

Iteration 1: Initial Model

  • Build baseline with main variables

  • Run full diagnostics

  • Identify top 2-3 issues

Iteration 2: Address Critical Issues

  • Fix severe problems (VIF > 10, R² < 0.50)

  • Re-run diagnostics

  • Check if issues resolved

Iteration 3: Fine-Tuning

  • Address moderate issues

  • Test alternative specifications

  • Optimize variable transformations

Iteration 4: Validation

  • Verify all diagnostics acceptable

  • Check business validity

  • Prepare for production use

Note: Don't expect to fix everything in one iteration. Improvement is a gradual process.

Documentation Requirements

When you finalize a model, document:

Diagnostic Summary:

  • Which tests passed/failed

  • Key metrics (R², VIF, DW)

  • Actions taken to address issues

Remaining Limitations:

  • Known violations that weren't fixed

  • Why they're acceptable

  • Impact on interpretation

Sensitivity Analysis:

  • Results with/without influential points

  • Robustness to alternative specifications

  • Confidence intervals on key estimates

Business Validation:

  • Do coefficients make sense?

  • Do results align with market knowledge?

  • Have stakeholders reviewed?

Using PDF Reports

MixModeler generates downloadable PDF reports for each diagnostic test:

Report Contents:

  • Complete statistical test results

  • All relevant plots and visualizations

  • Interpretation guidance

  • Threshold values and benchmarks

When to Download:

  • Documenting model validation

  • Sharing with stakeholders

  • Compliance or audit requirements

  • Creating presentation materials

How to Use:

  • Click "View Details" on any test

  • Click "Download PDF Report" button

  • Save with descriptive filename

  • Include in project documentation

Next Steps After Diagnostics

Once you've interpreted diagnostic results:

Model Passed Most Tests:

  • Proceed to Decomposition Analysis

  • Calculate contribution by channel

  • Generate ROI metrics

  • Create business recommendations

Model Needs Minor Improvements:

  • Make adjustments in Model Builder

  • Re-run the model with changes

  • Re-run diagnostics to verify

  • Proceed when satisfied

Model Requires Significant Work:

  • Return to Variable Engineering

  • Add missing transformations

  • Check Data Quality

  • Consider a different model structure

Diagnostic Best Practices

Run diagnostics every time:

  • After fitting any new model

  • After modifying variables

  • After changing specifications

  • Before presenting results

Save diagnostic reports:

  • Keep PDF reports for documentation

  • Track how diagnostics change across iterations

  • Maintain audit trail

Involve stakeholders:

  • Share results in business terms

  • Explain what diagnostics mean for decisions

  • Build confidence in the model

Be transparent:

  • Acknowledge limitations

  • Don't hide test failures

  • Explain why violations are/aren't acceptable

Balance statistics and business judgment:

  • Perfect statistics don't guarantee business validity

  • Business knowledge validates statistical results

  • Use both together for best decisions
