Data Preview & Validation
Inspecting Your Data Before Modeling
After uploading your data to MixModeler, you'll see a comprehensive preview and validation interface. This page explains how to use these features to ensure your data is ready for modeling.
Data Preview Interface
Automatic Data Summary
What You'll See Immediately:
Dataset Overview:
- Number of observations (rows) 
- Number of variables (columns) 
- Date range (first to last observation) 
- File name and upload timestamp 
Example:
Dataset: MMM_Data_2024.xlsx
Observations: 104 weeks
Variables: 18
Date Range: 2022-01-01 to 2024-01-01
Data Table Preview
Interactive Table Display:
- First 20 rows of your data 
- All columns visible (scroll horizontally) 
- Column headers clearly labeled 
- Sortable by clicking column headers 
Features:
- Search: Find specific values or time periods 
- Filter: Show only certain date ranges 
- Export: Download preview as CSV (optional) 
Variable Summary Statistics
For Each Variable, You'll See:
What to Check:
- Type: Should be "Numeric" for all modeling variables 
- Missing: Should be 0 or very low 
- Min/Max: Check for impossible values (negative spend, extreme outliers) 
- Mean: Does it match your business knowledge? 
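The per-variable checks above can be reproduced outside the tool before uploading. The following is a minimal sketch using only the Python standard library; it is not MixModeler's implementation, and the names `is_blank` and `summarize_column`, as well as the "Empty" type label, are hypothetical.

```python
from statistics import mean

def is_blank(value):
    """Treat None and whitespace-only cells as missing (an assumption)."""
    return value is None or str(value).strip() == ""

def summarize_column(values):
    """Return type, missing count, min, max and mean for one column of raw cells."""
    present = [v for v in values if not is_blank(v)]
    summary = {"type": "Numeric", "missing": len(values) - len(present),
               "min": None, "max": None, "mean": None}
    if not present:
        summary["type"] = "Empty"
        return summary
    try:
        numbers = [float(v) for v in present]
    except ValueError:
        # At least one cell is not a plain number, e.g. "$10,000" or "N/A".
        summary["type"] = "Text"
        return summary
    summary.update(min=min(numbers), max=max(numbers), mean=mean(numbers))
    return summary

print(summarize_column(["10000", "12000", "", "8000"]))
```

A column that comes back as "Text" or with a non-zero missing count is worth fixing in the source file before upload.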
Automatic Validations
Critical Validations (Must Pass)
✅ First Column Named "Observation"
Check: Is the first column exactly named Observation?
Pass:
Column A: Observation
Fail:
Column A: Date
Column A: observation  (wrong case)
Column A: Observations (plural)
Fix: Rename the first column to exactly Observation in Excel
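As an illustration of the exact-match rule, here is a small sketch of a pre-upload check. The function name `check_first_column` and its messages are hypothetical; only the required name "Observation" comes from this page.

```python
EXPECTED_FIRST_COLUMN = "Observation"

def check_first_column(header):
    """Return (passed, message) for a list of column names."""
    if not header:
        return False, "File has no columns"
    first = header[0]
    if first == EXPECTED_FIRST_COLUMN:
        return True, "OK"
    if first.strip().lower() == EXPECTED_FIRST_COLUMN.lower():
        # Same word, but wrong case or stray whitespace.
        return False, f"Rename '{first}' to '{EXPECTED_FIRST_COLUMN}' (check case and spaces)"
    return False, f"First column is '{first}', expected '{EXPECTED_FIRST_COLUMN}'"

for header in (["Observation", "TV_Spend"], ["observation"], ["Observations"], ["Date"]):
    print(header[0], check_first_column(header))
```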
✅ All Data Numeric
Check: Are all values in data columns numeric?
Pass:
TV_Spend: 10000, 12000, 8000
Fail:
TV_Spend: $10,000, $12,000, $8,000  (currency symbols)
TV_Spend: 10k, 12k, 8k  (text)
Fix: Remove symbols, convert to pure numbers
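If the source file is large, the cleanup can be scripted rather than done by hand. The sketch below assumes that commas are thousands separators, that "$" is the only currency symbol, and that a trailing "k" means thousands; `to_number` is a hypothetical helper, not a MixModeler function.

```python
def to_number(raw):
    """Convert strings such as '$10,000' or '10k' to a float, or None if not parseable."""
    text = str(raw).strip().replace(",", "").replace("$", "")
    multiplier = 1.0
    if text.lower().endswith("k"):
        # Assumption: a trailing 'k' is shorthand for thousands.
        multiplier = 1000.0
        text = text[:-1]
    try:
        return float(text) * multiplier
    except ValueError:
        return None  # e.g. "N/A"; decide separately how to handle these cells

raw_values = ["$10,000", "12k", "8000", "N/A"]
print([to_number(v) for v in raw_values])
```

Values that come back as `None` still need a manual decision, since the script cannot know what an entry like "N/A" should become.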
✅ No Duplicate Time Periods
Check: Are all observation values unique?
Pass:
2023-01-01
2023-01-08
2023-01-15  (all unique)
Fail:
2023-01-01
2023-01-01  (duplicate!)
2023-01-08
Fix: Remove or consolidate duplicate rows
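A quick way to locate duplicates before uploading is to count each observation value. This is a minimal sketch with a hypothetical function name; it compares values as exact strings, so "2023-01-01" and "2023-1-1" would not be treated as the same period.

```python
from collections import Counter

def find_duplicate_observations(observations):
    """Return the observation values that appear more than once, in first-seen order."""
    counts = Counter(observations)
    return [value for value, n in counts.items() if n > 1]

print(find_duplicate_observations(["2023-01-01", "2023-01-01", "2023-01-08"]))
```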
Warning Validations (Should Review)
⚠️ Insufficient Data
Check: Do you have at least 52 observations?
Warning Threshold:
- < 26 observations: Critical (model unlikely to work) 
- 26-51 observations: Warning (limited statistical power) 
- 52+ observations: Good 
Recommendation: Gather more historical data if possible
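The thresholds above map directly onto a three-way classification. The sketch below only restates them in code; the function name and the returned labels are hypothetical.

```python
def classify_observation_count(n):
    """Map a row count to the severity bands listed above."""
    if n < 26:
        return "critical"   # model unlikely to work
    if n < 52:
        return "warning"    # limited statistical power
    return "good"

print(classify_observation_count(104))
```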
⚠️ High Missing Data
Check: What percentage of values are missing per variable?
Thresholds:
- 0-5% missing: Acceptable 
- 5-20% missing: Warning 
- 20%+ missing: Critical issue 
Fix: Fill with zeros where a missing value genuinely means no activity (for example, no spend that week); otherwise remove the variable
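To see where a variable falls before uploading, the missing share can be computed per column. This sketch treats `None` and blank cells as missing, which is an assumption. The listed bands do not say which one owns exactly 5% or 20%, so the code assumes 5% is still acceptable and 20% is already critical. Function names are hypothetical.

```python
def missing_share(values):
    """Percentage of cells that are None or blank."""
    if not values:
        return 0.0
    missing = sum(1 for v in values if v is None or str(v).strip() == "")
    return 100.0 * missing / len(values)

def classify_missing(pct):
    """Map a missing percentage to the bands above (boundary handling is assumed)."""
    if pct <= 5:
        return "acceptable"
    if pct < 20:
        return "warning"
    return "critical"

column = ["100", "", "120", None]
pct = missing_share(column)
print(pct, classify_missing(pct))
```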
⚠️ Zero Variance
Check: Does any variable have the same value for all observations?
Example Problem:
Price: 49.99, 49.99, 49.99, ... (never changes)
Impact: Variable cannot be modeled (no variation to explain KPI changes)
Fix: Exclude variable or gather data across periods with variation
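Constant columns are easy to spot programmatically: a variable with zero variance has at most one distinct value. The sketch below assumes the data is held as a dictionary of column name to list of numbers; `constant_columns` is a hypothetical name.

```python
def constant_columns(columns):
    """Return the names of columns whose values never change."""
    return [name for name, values in columns.items() if len(set(values)) <= 1]

data = {
    "Price": [49.99, 49.99, 49.99],
    "TV_Spend": [10000, 12000, 8000],
}
print(constant_columns(data))
```

Note that an empty column is also reported here, since it has no variation either.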
⚠️ Extreme Outliers
Check: Are there values more than 3 standard deviations from the mean?
Visual Indicator: Flagged rows highlighted in preview table
Action Required:
- Verify if outlier is real (Black Friday spike) or data error 
- If real: Keep it, possibly add dummy variable for that period 
- If error: Correct the value 
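The 3-standard-deviation rule can be sketched as follows. Whether MixModeler uses the sample or the population standard deviation is not stated here; this example assumes the sample version (`statistics.stdev`), and the function name is hypothetical.

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=3.0):
    """Return the indices of values more than `threshold` standard deviations from the mean."""
    if len(values) < 2:
        return []  # stdev needs at least two points
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # constant series: nothing can be an outlier
    return [i for i, v in enumerate(values) if abs(v - mu) > threshold * sigma]

# 19 ordinary weeks and one spike (for example, a Black Friday week).
weekly_sales = [100] * 19 + [1000]
print(flag_outliers(weekly_sales))
```

In very short series a single spike inflates the standard deviation so much that it may not cross the threshold, which is one reason flagged rows still deserve a visual check in the time series charts.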
Data Visualization Tools
Time Series Charts
Automatic Charts Generated:
KPI Over Time: Line chart showing your dependent variable across all periods
What to Look For:
- Clear trend (upward, downward, stable) 
- Seasonality patterns 
- Sudden jumps or drops (validate these are real) 
Marketing Variables Over Time: Individual line charts for each marketing channel
What to Look For:
- Spending patterns make sense 
- Campaign flights visible 
- No unexplained gaps 
Distribution Charts
Histograms for Each Variable: Shows frequency distribution of values
What to Look For:
- Roughly bell-shaped (normal-ish) 
- Not heavily skewed (unless expected) 
- No suspicious gaps or modes 
Correlation Matrix
Heatmap Showing: Correlation between all variables
What to Look For:
- KPI positively correlated with marketing channels (expected) 
- Marketing channels not too highly correlated with each other (|r| < 0.9) 
- Identify potential multicollinearity issues 
Color Coding:
- 🟢 Green: Positive correlation 
- 🔴 Red: Negative correlation 
- Intensity: Strength of correlation 
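The multicollinearity check behind the heatmap can be approximated with pairwise Pearson correlations. This is a minimal standard-library sketch, not the tool's implementation; the 0.9 limit comes from the guidance above, while the function names and the choice to compare the absolute value of r are assumptions.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists, or None if undefined."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # a constant column has no defined correlation
    return cov / sqrt(var_x * var_y)

def highly_correlated_pairs(columns, limit=0.9):
    """Return (name_a, name_b, r) for every pair of columns with |r| >= limit."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if r is not None and abs(r) >= limit:
            pairs.append((name_a, name_b, r))
    return pairs

channels = {"TV": [1, 2, 3, 4], "Radio": [2, 4, 6, 8], "Search": [4, 1, 3, 2]}
print(highly_correlated_pairs(channels))
```

Pairs returned by this check are candidates for combining into one variable or for dropping one of the two before modeling.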
Interactive Validation Checklist
Pre-Modeling Checklist
MixModeler provides an interactive checklist:
Structure:
Data Quality:
Content:
Status Indicators:
- ✅ Green checkmark: Passed 
- ⚠️ Yellow warning: Review recommended 
- ❌ Red X: Must fix before modeling 
Manual Review Tools
Data Filtering
Filter by Date Range: View specific time periods
Example:
Show: 2023-Q4 only
Use Case: Verify holiday season data accuracy
Variable Selection
Toggle Columns: Show/hide specific variables in preview
Use Case: Focus on variables of interest, reduce clutter
Export Preview
Download Current View: Export filtered/sorted preview as CSV
Use Case: Share data snapshot with stakeholders for validation
Common Issues Found During Preview
Issue 1: Text in Numeric Columns
Symptom: Variable shows as "Text" type instead of "Numeric"
Example:
TV_Spend: "10,000", "$12,000", "N/A"
Fix:
- Download original file 
- Remove symbols and text 
- Re-upload 
Issue 2: Date Gaps
Symptom: Missing weeks or months in sequence
Example:
2023-01-01
2023-01-08
2023-01-22  ← Missing 2023-01-15
Fix: Add missing row with appropriate data (or zeros)
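Gaps in an evenly spaced series can be found by walking the sorted dates and checking each step. The sketch below assumes ISO-formatted dates (YYYY-MM-DD) and a fixed step of 7 days; it does not handle monthly data, where period lengths vary. The function name is hypothetical.

```python
from datetime import date, timedelta

def find_missing_periods(observations, step_days=7):
    """Return the ISO dates that are absent from an otherwise evenly spaced series."""
    dates = sorted(date.fromisoformat(s) for s in observations)
    step = timedelta(days=step_days)
    missing = []
    for previous, current in zip(dates, dates[1:]):
        expected = previous + step
        # Walk forward one step at a time until we reach the next observed date.
        while expected < current:
            missing.append(expected.isoformat())
            expected += step
    return missing

print(find_missing_periods(["2023-01-01", "2023-01-08", "2023-01-22"]))
```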
Issue 3: Incorrect Data Types
Symptom: Numbers stored as text (left-aligned in preview)
Fix: In Excel: Data → Text to Columns → Finish
Issue 4: Negative KPI Values
Symptom: Sales or revenue shows negative values
Possible Causes:
- Returns/refunds included 
- Data entry error 
- Wrong column 
Fix: Investigate and correct source data
Validation Report
Generate Validation Summary
After preview, you can generate:
Validation Report PDF:
- All checks performed 
- Pass/Fail status 
- Warnings flagged 
- Summary statistics 
- Recommendations 
Use Case: Document data quality for stakeholders or compliance
Next Steps After Validation
All Checks Pass ✅
Proceed to:
- Variable Workshop (create transformations) 
- Model Builder (start building models) 
Warnings Present ⚠️
Options:
- Proceed with caution - acknowledge limitations 
- Fix issues - download, correct, re-upload 
- Consult documentation - understand impact of warnings 
Critical Issues ❌
Must Fix Before Modeling:
- Download original file 
- Address critical issues (naming, data types, duplicates) 
- Re-upload corrected file 
- Validate again 
Best Practices
Always Review Before Modeling
Don't Skip Validation:
- Spend 5 minutes reviewing preview 
- Check summary statistics match expectations 
- Verify time series charts make sense 
- Investigate any warnings or outliers 
Why: Catching data issues early saves hours of troubleshooting later
Document Assumptions
Record Decisions:
- Why outliers were kept/removed 
- How missing values were filled 
- Any data adjustments made 
Where: Keep notes in separate document or Excel sheet
Iterative Validation
Process:
- Upload → Review → Find issues 
- Fix → Re-upload → Review again 
- Repeat until all critical checks pass 
Typical Iterations: 1-3 uploads to get data perfect
Summary
Key Takeaways:
📊 Preview shows complete data summary - observations, variables, statistics
✅ Automatic validations catch critical issues - naming, data types, duplicates
⚠️ Warnings guide improvement - insufficient data, missing values, outliers
📈 Visualizations reveal patterns - time series, distributions, correlations
🔍 Interactive tools enable deep inspection - filtering, sorting, searching
📋 Checklist ensures readiness - clear pass/fail indicators
🛠️ Fix issues before modeling - saves time and improves results
Bottom Line: Data preview and validation is your safety net. Spend time here to ensure your models are built on solid foundations!