Interpreting Test Results
Understanding the Diagnostic Dashboard
When you run model diagnostics in MixModeler, results are displayed in an intuitive card-based interface. Each test shows a summary card with key metrics and a pass/fail indicator, allowing you to quickly assess model quality.
Reading Test Summary Cards
Each diagnostic test card displays:
Test Name and Description: Clear label explaining what the test measures
Status Indicator:
✓ Green checkmark: Test passed
⚠ Red warning: Test failed or issue detected
Key Statistic: Primary test statistic or metric prominently displayed
P-value (when applicable): Statistical significance level with star notation
* p < 0.1
** p < 0.05
*** p < 0.01
Quick Interpretation: One-line summary of what the result means
View Details Button: Click to see in-depth analysis, plots, and tables
Interpreting P-Values
P-values give the probability of observing results at least as extreme as yours if the null hypothesis were true:
P-value ≥ 0.05: No significant issue detected (✓ test passed)
Example: Normality test p = 0.23 → Residuals appear normal
P-value < 0.05: Significant issue detected (⚠ test failed)
Example: Heteroscedasticity p = 0.02 → Variance is not constant
P-value < 0.01: Highly significant issue (⚠⚠ serious concern)
Example: Autocorrelation p < 0.001 → Strong temporal dependence
Important: Lower p-values indicate stronger evidence of a problem. However, context and effect size matter more than p-values alone.
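The pass/fail and star logic above is simple to express in code. Here is a minimal sketch in Python (the function names are illustrative, not MixModeler's API):

```python
def significance_stars(p_value: float) -> str:
    """Return the star notation used on test summary cards."""
    if p_value < 0.01:
        return "***"   # p < 0.01
    if p_value < 0.05:
        return "**"    # p < 0.05
    if p_value < 0.1:
        return "*"     # p < 0.1
    return ""          # not significant

def test_status(p_value: float) -> str:
    """Pass/fail for assumption tests, where a LOW p-value flags a problem."""
    return "passed" if p_value >= 0.05 else "failed"

print(test_status(0.23), significance_stars(0.23))  # passed, no stars
print(test_status(0.02), significance_stars(0.02))  # failed **
```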
Test-Specific Interpretation Guide
Residual Normality
Passed (p ≥ 0.05):
Residuals are normally distributed
Confidence intervals and p-values are reliable
No action needed
Failed (p < 0.05):
Check skewness and kurtosis values
Review Q-Q plot for deviation pattern
Slight violations (p = 0.01-0.05) often acceptable
Severe violations (p < 0.001) require attention
Severity levels:
p = 0.01-0.05: Mild, often acceptable
p = 0.001-0.01: Moderate, investigate
p < 0.001: Severe, take action
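To reproduce this check outside MixModeler, here is a minimal sketch using the Jarque-Bera test from Python's statsmodels library, which also reports the skewness and kurtosis values mentioned above (the residuals are simulated stand-ins; MixModeler's internal test may differ):

```python
import numpy as np
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(42)
residuals = rng.normal(0, 1, 200)  # stand-in for real model residuals

jb_stat, p_value, skew, kurtosis = jarque_bera(residuals)
print(f"JB={jb_stat:.2f}, p={p_value:.3f}, skew={skew:.2f}, kurtosis={kurtosis:.2f}")
if p_value >= 0.05:
    print("Passed: residuals appear normally distributed")
elif p_value >= 0.001:
    print("Mild to moderate violation: review the Q-Q plot and skew/kurtosis")
else:
    print("Severe violation: take action")
```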
Autocorrelation (Durbin-Watson)
Passed (DW = 1.5-2.5):
No significant autocorrelation
Residuals are independent
Standard errors are reliable
Failed (DW < 1.5 or > 2.5):
DW < 1.5: Positive autocorrelation (common)
DW > 2.5: Negative autocorrelation (rare)
Add lagged variables or time trends
Severity levels:
DW = 1.3-1.5 or 2.5-2.7: Mild
DW = 1.0-1.3 or 2.7-3.0: Moderate
DW < 1.0 or > 3.0: Severe
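A minimal sketch of the Durbin-Watson check, assuming Python with statsmodels and residuals kept in time order (the statistic is meaningless on shuffled data):

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
residuals = rng.normal(size=104)  # stand-in for two years of weekly residuals

dw = durbin_watson(residuals)
if 1.5 <= dw <= 2.5:
    print(f"DW={dw:.2f}: passed, no significant autocorrelation")
elif dw < 1.5:
    print(f"DW={dw:.2f}: positive autocorrelation, consider lags or time trends")
else:
    print(f"DW={dw:.2f}: negative autocorrelation (rare)")
```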
Heteroscedasticity
Passed (p ≥ 0.05):
Constant variance across fitted values
Standard errors are correct
Hypothesis tests are valid
Failed (p < 0.05):
Check residual vs fitted plot for patterns
Consider log transformation
Use robust standard errors
Severity levels:
p = 0.01-0.05: Mild
p = 0.001-0.01: Moderate
p < 0.001: Severe
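For reference, here is a minimal sketch of one common heteroscedasticity test, Breusch-Pagan, using statsmodels; MixModeler's choice of test may differ, but the p-value logic is the same (the design matrix and response are simulated stand-ins):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(100, 2)))  # stand-in design matrix
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=100)

results = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(results.resid, results.model.exog)
print(f"Breusch-Pagan p = {lm_pvalue:.3f}")
if lm_pvalue >= 0.05:
    print("Passed: variance looks constant")
else:
    print("Failed: consider a log transform or robust (HC3) standard errors")
```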
Multicollinearity (VIF)
Passed (all VIF < 5):
Variables are sufficiently independent
Coefficients are stable
Each variable's effect is identifiable
Moderate (VIF 5-10):
Some correlation between predictors
Monitor affected variables
May be acceptable depending on purpose
Failed (VIF > 10):
Severe multicollinearity
Remove correlated variables
Combine into composite indices
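A minimal sketch of the VIF calculation with statsmodels, using simulated TV and Display variables engineered to be correlated (real usage would pass your own predictor columns plus a constant):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
tv = rng.normal(size=100)
display = 0.7 * tv + 0.3 * rng.normal(size=100)  # engineered to correlate with TV
X = sm.add_constant(np.column_stack([tv, display]))

for i, name in enumerate(["TV", "Display"], start=1):  # index 0 is the constant
    vif = variance_inflation_factor(X, i)
    level = "ok" if vif < 5 else ("moderate" if vif <= 10 else "severe")
    print(f"{name}: VIF = {vif:.1f} ({level})")
```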
Influential Points
Low Count (< 5% of observations):
Normal amount of variation
Model is robust
Investigate individual cases
Moderate Count (5-10%):
Several unusual periods
Check for data quality issues
Add event dummy variables
High Count (> 10%):
Serious data or model issues
Review data thoroughly
Reconsider model specification
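One common way to count influential points is Cook's distance with the rule-of-thumb cutoff 4/n; MixModeler may flag points differently. A minimal sketch with simulated data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=100)

results = sm.OLS(y, X).fit()
cooks_d, _ = results.get_influence().cooks_distance
flagged = cooks_d > 4 / len(y)  # common rule-of-thumb cutoff
share = flagged.mean()
print(f"{flagged.sum()} influential points ({share:.0%} of observations)")
if share < 0.05:
    print("Low count: investigate individual cases")
elif share <= 0.10:
    print("Moderate count: check data quality, consider event dummies")
else:
    print("High count: review data and model specification")
```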
Actual vs Predicted (R²)
Excellent (R² > 0.80):
Strong model fit
Highly reliable predictions
Ready for optimization
Good (R² = 0.70-0.80):
Acceptable fit for business use
Consider minor improvements
Suitable for most decisions
Moderate (R² = 0.50-0.70):
Captures main effects
Room for improvement
Use cautiously for optimization
Poor (R² < 0.50):
Weak model fit
Add variables or restructure
Not ready for business use
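For reference, the R² on this card is the share of variance in actual sales explained by the model's predictions. A minimal sketch of the calculation (the data here are simulated stand-ins):

```python
import numpy as np

def r_squared(actual: np.ndarray, predicted: np.ndarray) -> float:
    ss_res = np.sum((actual - predicted) ** 2)       # unexplained variation
    ss_tot = np.sum((actual - actual.mean()) ** 2)   # total variation
    return 1 - ss_res / ss_tot

rng = np.random.default_rng(4)
actual = rng.normal(100, 10, 52)
predicted = actual + rng.normal(0, 5, 52)            # stand-in predictions
print(f"R² = {r_squared(actual, predicted):.2f}")
```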
Comprehensive Model Assessment
Evaluate your model holistically across all diagnostics:
Excellent Model (Ready for Business Use)
Characteristics:
✓ All or most tests pass
R² > 0.75
Coefficients have expected signs
Low to moderate VIF
Few influential points
Action: Proceed with confidence to decomposition and optimization
Good Model (Suitable with Caveats)
Characteristics:
✓ Most critical tests pass
Minor violations on 1-2 tests
R² > 0.65
Coefficients mostly sensible
Action: Use for business decisions but acknowledge limitations in documentation
Needs Improvement
Characteristics:
⚠ Multiple test failures
R² = 0.50-0.65
Some unexpected coefficients
Moderate multicollinearity
Action: Make improvements before using for important decisions
Requires Significant Work
Characteristics:
⚠ Many serious failures
R² < 0.50
High VIF or many influential points
Unstable coefficients
Action: Rebuild model with better specification and data
Prioritizing Issues
Not all test failures are equally important. Prioritize based on:
Critical Issues (Fix Immediately):
R² < 0.50 (poor fit)
VIF > 10 (severe multicollinearity)
Many influential points (>10%)
Coefficients with wrong signs
Important Issues (Address Soon):
Moderate autocorrelation (DW < 1.3 or > 2.7)
Moderate multicollinearity (VIF 5-10)
R² = 0.50-0.60
Minor Issues (Monitor):
Slight non-normality (p = 0.02-0.05)
Mild heteroscedasticity
Few isolated outliers
DW = 1.3-1.5 or 2.5-2.7
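These priorities can be encoded as a simple triage helper. The sketch below mirrors the thresholds above; it omits the coefficient sign checks and the p-value-based tests for brevity, and the names are illustrative:

```python
def prioritize(r2: float, max_vif: float, dw: float, influential_share: float) -> str:
    """Triage a model using the thresholds above."""
    if r2 < 0.50 or max_vif > 10 or influential_share > 0.10:
        return "critical: fix immediately"
    if r2 <= 0.60 or 5 <= max_vif <= 10 or dw < 1.3 or dw > 2.7:
        return "important: address soon"
    if dw < 1.5 or dw > 2.5:
        return "minor: monitor"
    return "no flagged issues"

# Values from the worked example later in this guide:
print(prioritize(r2=0.76, max_vif=7.2, dw=1.42, influential_share=0.03))
# -> important: address soon (the moderate VIF drives the flag)
```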
Contextual Interpretation
Consider business context when interpreting results:
Data Frequency:
Weekly data: Lower R² acceptable (more noise)
Monthly data: Higher R² expected (less noise)
Quarterly data: Very high R² achievable
Market Characteristics:
Stable markets: Expect better fit
Dynamic markets: Accept more variation
New products: Lower R² normal
Business Objectives:
Strategic planning: Can tolerate moderate violations
Tactical optimization: Need strong statistical validity
Forecasting: Require excellent fit
General insights: Minor violations acceptable
Using Detail Views
Click "View Details" on any test card to access:
Statistical Tables: Complete test results with all metrics
Visualizations:
Q-Q plots (normality)
ACF plots (autocorrelation)
Scatter plots (heteroscedasticity, actual vs predicted)
Leverage plots (influential points)
Correlation matrices (multicollinearity)
Interpretation Guidance: Detailed explanations of what results mean
Action Recommendations: Specific steps to address issues
Communicating Results
When sharing diagnostic results with stakeholders:
For Technical Audiences:
Share complete diagnostic report with all p-values
Explain statistical assumptions and violations
Discuss technical remedies
Show detailed plots and tables
For Business Stakeholders:
Focus on R² and overall model quality
Use traffic light system (green/yellow/red)
Explain implications for decisions
Avoid statistical jargon
For Executives:
One-sentence summary: "Model is reliable/needs work"
Highlight confidence level in recommendations
Note key limitations or caveats
Focus on business impact
Example: Complete Model Interpretation
Test Results Summary:
Residual Normality: ✓ Passed (p = 0.18)
Autocorrelation: ⚠ DW = 1.42 (mild positive autocorrelation)
Heteroscedasticity: ✓ Passed (p = 0.31)
Multicollinearity: ⚠ TV VIF = 6.8, Display VIF = 7.2
Influential Points: ✓ 3 outliers (holiday weeks)
Actual vs Predicted: ✓ R² = 0.76
Overall Assessment:
Quality: Good model suitable for business use
Strengths:
Explains 76% of sales variation
No major statistical violations
Residuals well-behaved (normal, constant variance)
Few problematic observations
Limitations:
Mild autocorrelation suggests some temporal patterns not fully captured
Moderate multicollinearity between TV and Display (they run together)
Three holiday weeks drive some results
Recommendations:
Use model for strategic allocation decisions
Note that TV and Display effects may be partially confounded
Consider adding Q4 holiday dummy variable
Test adding lagged sales if forecasting is needed
Report results with 95% confidence intervals (see the sketch after this example)
Business Implications:
Channel ROI estimates are directionally reliable
Total marketing contribution is robust
Individual TV vs Display split has higher uncertainty due to multicollinearity
Budget optimization recommendations are sound
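For the confidence-interval recommendation above, here is a minimal sketch of pulling 95% intervals from a fitted statsmodels OLS model; the TV and Display data are simulated stand-ins:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
X = pd.DataFrame({"TV": rng.normal(size=52), "Display": rng.normal(size=52)})
y = 100 + 2.0 * X["TV"] + 1.5 * X["Display"] + rng.normal(size=52)

results = sm.OLS(y, sm.add_constant(X)).fit()
print(results.conf_int(alpha=0.05))  # 95% interval for each coefficient
```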
Common Interpretation Mistakes
Mistake 1: Focusing only on p-values
Effect size matters more than statistical significance
Business relevance trumps p < 0.05
Mistake 2: Expecting perfection
Real-world data always has some violations
No model perfectly satisfies all assumptions
Mistake 3: Ignoring practical significance
R² of 0.75 vs 0.78 is rarely meaningful in practice
Small VIF differences (4.5 vs 5.2) don't matter much
Mistake 4: Over-reacting to single test failures
One failed test doesn't invalidate the model
Consider the overall pattern across all diagnostics
Mistake 5: Under-reacting to critical issues
R² < 0.50 or VIF > 10 requires immediate action
Don't use models with severe problems
Diagnostic Flow Chart
Follow this decision tree for interpreting results:
Step 1: Check R²
R² ≥ 0.70? → Continue to Step 2
R² < 0.70? → Add variables or improve specification
Step 2: Check VIF
All VIF ≤ 10? → Continue to Step 3
Any VIF > 10? → Remove correlated variables
Step 3: Check Autocorrelation
DW between 1.5 and 2.5? → Continue to Step 4
DW outside range? → Add lagged variables
Step 4: Check Other Tests
Normality, heteroscedasticity acceptable? → Model ready
Multiple failures? → Review and improve
Step 5: Business Validation
Coefficients make business sense? → Proceed
Unexpected signs or magnitudes? → Investigate
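The decision tree translates directly into code. A minimal sketch, using the headline diagnostics as inputs (names are illustrative):

```python
def next_step(r2: float, max_vif: float, dw: float,
              other_tests_ok: bool, coefficients_sensible: bool) -> str:
    """Walk the five-step decision tree above and return the next action."""
    if r2 < 0.70:
        return "Step 1: add variables or improve specification"
    if max_vif > 10:
        return "Step 2: remove correlated variables"
    if not 1.5 <= dw <= 2.5:
        return "Step 3: add lagged variables"
    if not other_tests_ok:
        return "Step 4: review and improve"
    if not coefficients_sensible:
        return "Step 5: investigate unexpected signs or magnitudes"
    return "Model ready: proceed"

# Using the worked example's headline numbers:
print(next_step(0.76, 7.2, 1.42, True, True))  # Step 3: add lagged variables
```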
Iterative Improvement Process
Model diagnostics guide an iterative refinement process:
Iteration 1: Initial Model
Build baseline with main variables
Run full diagnostics
Identify top 2-3 issues
Iteration 2: Address Critical Issues
Fix severe problems (VIF > 10, R² < 0.50)
Re-run diagnostics
Check if issues resolved
Iteration 3: Fine-Tuning
Address moderate issues
Test alternative specifications
Optimize variable transformations
Iteration 4: Validation
Verify all diagnostics acceptable
Check business validity
Prepare for production use
Note: Don't expect to fix everything in one iteration. Improvement is a gradual process.
Documentation Requirements
When you finalize a model, document:
Diagnostic Summary:
Which tests passed/failed
Key metrics (R², VIF, DW)
Actions taken to address issues
Remaining Limitations:
Known violations that weren't fixed
Why they're acceptable
Impact on interpretation
Sensitivity Analysis (see the sketch after this list):
Results with/without influential points
Robustness to alternative specifications
Confidence intervals on key estimates
Business Validation:
Do coefficients make sense?
Do results align with market knowledge?
Have stakeholders reviewed?
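For the sensitivity analysis item above, a minimal sketch: refit the model without the flagged influential points and compare coefficients. A robust model shows only small shifts (the data and the 4/n cutoff here are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([100.0, 2.0, 1.5]) + rng.normal(size=100)

full = sm.OLS(y, X).fit()
cooks_d, _ = full.get_influence().cooks_distance
keep = cooks_d <= 4 / len(y)                 # drop flagged influential points

trimmed = sm.OLS(y[keep], X[keep]).fit()
print("full:   ", np.round(full.params, 2))
print("trimmed:", np.round(trimmed.params, 2))
```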
Using PDF Reports
MixModeler generates downloadable PDF reports for each diagnostic test:
Report Contents:
Complete statistical test results
All relevant plots and visualizations
Interpretation guidance
Threshold values and benchmarks
When to Download:
Documenting model validation
Sharing with stakeholders
Compliance or audit requirements
Creating presentation materials
How to Use:
Click "View Details" on any test
Click "Download PDF Report" button
Save with descriptive filename
Include in project documentation
Next Steps After Diagnostics
Once you've interpreted diagnostic results:
Model Passed Most Tests:
Proceed to Decomposition Analysis
Calculate contribution by channel
Generate ROI metrics
Create business recommendations
Model Needs Minor Improvements:
Make adjustments in Model Builder
Re-run model with changes
Re-run diagnostics to verify
Proceed when satisfied
Model Requires Significant Work:
Return to Variable Engineering
Add missing transformations
Check Data Quality
Consider different model structure
Diagnostic Best Practices
Run diagnostics every time:
After fitting any new model
After modifying variables
After changing specifications
Before presenting results
Save diagnostic reports:
Keep PDF reports for documentation
Track how diagnostics change across iterations
Maintain audit trail
Involve stakeholders:
Share results in business terms
Explain what diagnostics mean for decisions
Build confidence in the model
Be transparent:
Acknowledge limitations
Don't hide test failures
Explain why violations are/aren't acceptable
Balance statistics and business judgment:
Perfect statistics don't guarantee business validity
Business knowledge validates statistical results
Use both together for best decisions