Data Preview & Validation
Inspecting Your Data Before Modeling
After uploading your data to MixModeler, you'll see a comprehensive preview and validation interface. This page explains how to use these features to ensure your data is ready for modeling.
Data Preview Interface
Automatic Data Summary
What You'll See Immediately:
Dataset Overview:
Number of observations (rows)
Number of variables (columns)
Date range (first to last observation)
File name and upload timestamp
Example:
Dataset: MMM_Data_2024.xlsx
Observations: 104 weeks
Variables: 18
Date Range: 2022-01-01 to 2024-01-01
Data Table Preview
Interactive Table Display:
First 20 rows of your data
All columns visible (scroll horizontally)
Column headers clearly labeled
Sortable by clicking column headers
Features:
Search: Find specific values or time periods
Filter: Show only certain date ranges
Export: Download preview as CSV (optional)
Variable Summary Statistics
For Each Variable, You'll See:
What to Check:
Type: Should be "Numeric" for all modeling variables
Missing: Should be 0 or very low
Min/Max: Check for impossible values (negative spend, extreme outliers)
Mean: Does it match your business knowledge?
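If you want to cross-check these statistics outside MixModeler before uploading, a minimal sketch like the one below produces a comparable per-variable summary. The function name `summarize` and the sample values are illustrative assumptions; this is not how MixModeler computes its own summary.

```python
from statistics import mean

def summarize(values):
    """Summarize one column. None and "" are treated as missing (an assumption)."""
    present = [v for v in values if v is not None and v != ""]
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    summary = {
        "type": "Numeric" if numeric else "Text",
        "missing": len(values) - len(present),
    }
    # Min, max, and mean only make sense for numeric columns with data.
    if numeric and present:
        summary.update(min=min(present), max=max(present), mean=mean(present))
    return summary

# Hypothetical weekly TV spend with one missing week.
tv_spend = [1200.0, 1500.0, None, 900.0]
print(summarize(tv_spend))
```

Compare the printed minimum, maximum, and mean against what you expect from your media plans; a mean far from your typical weekly spend usually points to a units or scaling problem.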
Automatic Validations
Critical Validations (Must Pass)
✅ First Column Named "Observation"
Check: Is the first column exactly named Observation?
Pass:
Fail:
Fix: Rename first column to exactly Observation in Excel
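The same check can be run locally on a CSV export before uploading. The sketch below assumes the comparison is case-sensitive and does not tolerate surrounding spaces, which is one reading of "exactly named"; the function name is hypothetical.

```python
import csv
import io

def first_column_is_observation(csv_text):
    """Return True if the first header cell is exactly 'Observation'."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return bool(header) and header[0] == "Observation"

print(first_column_is_observation("Observation,Sales\n2022-01-01,100\n"))  # True
print(first_column_is_observation("Date,Sales\n2022-01-01,100\n"))         # False
```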
✅ All Data Numeric
Check: Are all values in data columns numeric?
Pass:
Fail:
Fix: Remove symbols, convert to pure numbers
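To locate the offending cells before re-uploading, a sketch along these lines can help. It assumes each row is a dictionary of raw cell values and treats anything Python's `float()` cannot parse as non-numeric; the function and column names are illustrative.

```python
def non_numeric_cells(rows, columns):
    """Return (row_index, column, value) for every cell float() cannot parse."""
    problems = []
    for i, row in enumerate(rows):
        for col in columns:
            try:
                float(row[col])
            except (TypeError, ValueError):
                problems.append((i, col, row[col]))
    return problems

# Hypothetical rows containing a currency symbol and a thousands separator.
rows = [
    {"TV": "1200", "Radio": "$300"},
    {"TV": "1,500", "Radio": "250.5"},
]
print(non_numeric_cells(rows, ["TV", "Radio"]))
# [(0, 'Radio', '$300'), (1, 'TV', '1,500')]
```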
✅ No Duplicate Time Periods
Check: Are all observation values unique?
Pass:
Fail:
Fix: Remove or consolidate duplicate rows
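A quick way to find the duplicated periods yourself is to count the values in the Observation column. This is a minimal sketch with a hypothetical function name, assuming the observation values have been read as strings.

```python
from collections import Counter

def duplicate_observations(observations):
    """Return the observation values that appear more than once, sorted."""
    counts = Counter(observations)
    return sorted(value for value, n in counts.items() if n > 1)

weeks = ["2022-01-01", "2022-01-08", "2022-01-08", "2022-01-15"]
print(duplicate_observations(weeks))  # ['2022-01-08']
```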
Warning Validations (Should Review)
⚠️ Insufficient Data
Check: Do you have at least 52 observations?
Warning Threshold:
< 26 observations: Critical (model unlikely to work)
26-51 observations: Warning (limited statistical power)
52+ observations: Good
Recommendation: Gather more historical data if possible
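The thresholds above translate directly into a small classification function. The sketch below mirrors the three bands listed; the function name is an assumption for illustration.

```python
def observation_count_status(n_observations):
    """Classify dataset length using the thresholds listed above."""
    if n_observations < 26:
        return "Critical"  # model unlikely to work
    if n_observations < 52:
        return "Warning"   # limited statistical power
    return "Good"

print(observation_count_status(104))  # Good
```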
⚠️ High Missing Data
Check: What percentage of values are missing per variable?
Thresholds:
0-5% missing: Acceptable
5-20% missing: Warning
20%+ missing: Critical issue
Fix: Fill with zeros (only where a missing value genuinely means no activity or spend) or remove the variable
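The missing-data bands can be sketched as follows. Because the listed ranges share their boundary values, this example assumes that exactly 5% counts as Acceptable and exactly 20% counts as Critical; MixModeler may treat the boundaries differently. `None` and empty strings are assumed to represent missing values.

```python
def missing_data_status(values):
    """Return (percent_missing, status) for one variable."""
    if not values:
        raise ValueError("values must not be empty")
    missing = sum(1 for v in values if v is None or v == "")
    pct = 100.0 * missing / len(values)
    if pct >= 20:
        status = "Critical"
    elif pct > 5:
        status = "Warning"
    else:
        status = "Acceptable"
    return pct, status

# Hypothetical variable with 1 missing value out of 10 observations.
print(missing_data_status([1, 2, 3, None, 5, 6, 7, 8, 9, 10]))  # (10.0, 'Warning')
```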
⚠️ Zero Variance
Check: Does any variable have the same value for all observations?
Example Problem:
Impact: Variable cannot be modeled (no variation to explain KPI changes)
Fix: Exclude variable or gather data across periods with variation
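A constant column is easy to detect by counting distinct values. The sketch below assumes the data is held as a mapping from column name to a list of values; the names and numbers are made up for illustration.

```python
def zero_variance_columns(table):
    """Return the names of columns whose values are all identical."""
    return [name for name, values in table.items() if len(set(values)) <= 1]

# Hypothetical data: price never changes, so it cannot explain KPI movement.
table = {
    "TV": [100, 200, 150],
    "Price": [9.99, 9.99, 9.99],
}
print(zero_variance_columns(table))  # ['Price']
```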
⚠️ Extreme Outliers
Check: Are there values more than 3 standard deviations from the mean?
Visual Indicator: Flagged rows highlighted in preview table
Action Required:
Verify if outlier is real (Black Friday spike) or data error
If real: Keep it, possibly add dummy variable for that period
If error: Correct the value
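The 3-standard-deviation rule can be reproduced with the standard library. This sketch assumes the sample standard deviation is used (the source does not say which variant MixModeler applies), and the function name is hypothetical.

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=3.0):
    """Return (index, value) pairs more than `threshold` std devs from the mean.

    Requires at least two values, because stdev() is undefined for fewer.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # constant column: nothing can be an outlier
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) > threshold * sigma]

# Hypothetical weekly sales: 20 ordinary weeks followed by one large spike.
sales = [100, 102, 98, 101, 99] * 4 + [500]
print(flag_outliers(sales))  # [(20, 500)]
```

Note that with 10 or fewer observations, no single value can sit more than 3 sample standard deviations from the mean, so this rule is only informative on longer series.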
Data Visualization Tools
Time Series Charts
Automatic Charts Generated:
KPI Over Time: Line chart showing your dependent variable across all periods
What to Look For:
Clear trend (upward, downward, stable)
Seasonality patterns
Sudden jumps or drops (validate these are real)
Marketing Variables Over Time: Individual line charts for each marketing channel
What to Look For:
Spending patterns make sense
Campaign flights visible
No unexplained gaps
Distribution Charts
Histograms for Each Variable: Shows frequency distribution of values
What to Look For:
Roughly bell-shaped (normal-ish)
Not heavily skewed (unless expected)
No suspicious gaps or modes
Correlation Matrix
Heatmap Showing: Correlation between all variables
What to Look For:
KPI positively correlated with marketing channels (expected)
Marketing channels not too highly correlated with each other (r < 0.9)
Identify potential multicollinearity issues
Color Coding:
🟢 Green: Positive correlation
🔴 Red: Negative correlation
Intensity: Strength of correlation
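To list the variable pairs that would stand out in the heatmap, you can compute Pearson correlations directly. The sketch below flags pairs at or above the 0.9 level mentioned above; applying the limit to the absolute value of r, so that strong negative correlations are flagged too, is an assumption, as are the function names and sample data.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists with non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def highly_correlated_pairs(table, limit=0.9):
    """Return (col_a, col_b, r) for every pair with |r| >= limit."""
    pairs = []
    for a, b in combinations(table, 2):
        r = pearson(table[a], table[b])
        if abs(r) >= limit:
            pairs.append((a, b, round(r, 3)))
    return pairs

# Hypothetical channels: TV and Online move almost in lockstep.
table = {
    "TV": [1, 2, 3, 4, 5],
    "Online": [2, 4, 6, 8, 10.5],
    "Radio": [5, 1, 4, 2, 3],
}
for a, b, r in highly_correlated_pairs(table):
    print(a, b, r)
```

Pairs reported here are candidates for multicollinearity; remove zero-variance columns first, since their correlation is undefined.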
Interactive Validation Checklist
Pre-Modeling Checklist
MixModeler provides an interactive checklist:
Structure:
Data Quality:
Content:
Status Indicators:
✅ Green checkmark: Passed
⚠️ Yellow warning: Review recommended
❌ Red X: Must fix before modeling
Manual Review Tools
Data Filtering
Filter by Date Range: View specific time periods
Example:
Use Case: Verify holiday season data accuracy
Variable Selection
Toggle Columns: Show/hide specific variables in preview
Use Case: Focus on variables of interest, reduce clutter
Export Preview
Download Current View: Export filtered/sorted preview as CSV
Use Case: Share data snapshot with stakeholders for validation
Common Issues Found During Preview
Issue 1: Text in Numeric Columns
Symptom: Variable shows as "Text" type instead of "Numeric"
Example:
Fix:
Download original file
Remove symbols and text
Re-upload
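If many cells need cleaning, the "remove symbols" step can be scripted rather than done by hand. This is a minimal sketch assuming US-style number formatting (comma as thousands separator, period as decimal point); the set of stripped symbols and the function name are illustrative choices.

```python
import re

def to_number(text):
    """Strip common currency symbols, commas, percent signs, and spaces, then parse.

    Raises ValueError if the remaining text is still not a number (e.g. "N/A").
    """
    cleaned = re.sub(r"[$€£,%\s]", "", str(text))
    return float(cleaned)

print(to_number("$1,200.50"))  # 1200.5
```

Values that still raise `ValueError` after cleaning, such as "N/A", need a manual decision rather than automatic conversion.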
Issue 2: Date Gaps
Symptom: Missing weeks or months in sequence
Example:
Fix: Add missing row with appropriate data (or zeros)
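For weekly data, gaps can be found by stepping through consecutive observation dates. The sketch below assumes the observations are sorted ISO-format dates (YYYY-MM-DD) at a 7-day spacing; the function name is hypothetical, and monthly data would need a different step.

```python
from datetime import date, timedelta

def missing_weeks(observations):
    """Return the ISO dates missing from a sorted weekly sequence."""
    dates = [date.fromisoformat(s) for s in observations]
    gaps = []
    for prev, cur in zip(dates, dates[1:]):
        expected = prev + timedelta(weeks=1)
        # Walk forward one week at a time until we reach the next observed date.
        while expected < cur:
            gaps.append(expected.isoformat())
            expected += timedelta(weeks=1)
    return gaps

print(missing_weeks(["2022-01-03", "2022-01-10", "2022-01-24"]))  # ['2022-01-17']
```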
Issue 3: Incorrect Data Types
Symptom: Numbers stored as text (left-aligned in preview)
Fix: In Excel: Data → Text to Columns → Finish
Issue 4: Negative KPI Values
Symptom: Sales or revenue shows negative values
Possible Causes:
Returns/refunds included
Data entry error
Wrong column
Fix: Investigate and correct source data
Validation Report
Generate Validation Summary
After preview, you can generate:
Validation Report PDF:
All checks performed
Pass/Fail status
Warnings flagged
Summary statistics
Recommendations
Use Case: Document data quality for stakeholders or compliance
Next Steps After Validation
All Checks Pass ✅
Proceed to:
Variable Workshop (create transformations)
Model Builder (start building models)
Warnings Present ⚠️
Options:
Proceed with caution - acknowledge limitations
Fix issues - download, correct, re-upload
Consult documentation - understand impact of warnings
Critical Issues ❌
Must Fix Before Modeling:
Download original file
Address critical issues (naming, data types, duplicates)
Re-upload corrected file
Validate again
Best Practices
Always Review Before Modeling
Don't Skip Validation:
Spend 5 minutes reviewing preview
Check summary statistics match expectations
Verify time series charts make sense
Investigate any warnings or outliers
Why: Catching data issues early saves hours of troubleshooting later
Document Assumptions
Record Decisions:
Why outliers were kept/removed
How missing values were filled
Any data adjustments made
Where: Keep notes in separate document or Excel sheet
Iterative Validation
Process:
Upload → Review → Find issues
Fix → Re-upload → Review again
Repeat until all critical checks pass
Typical Iterations: 1-3 uploads to get data perfect
Summary
Key Takeaways:
📊 Preview shows complete data summary - observations, variables, statistics
✅ Automatic validations catch critical issues - naming, data types, duplicates
⚠️ Warnings guide improvement - insufficient data, missing values, outliers
📈 Visualizations reveal patterns - time series, distributions, correlations
🔍 Interactive tools enable deep inspection - filtering, sorting, searching
📋 Checklist ensures readiness - clear pass/fail indicators
🛠️ Fix issues before modeling - saves time and improves results
Bottom Line: Data preview and validation is your safety net. Spend time here to ensure your models are built on solid foundations!