Data Preview & Validation

Inspecting Your Data Before Modeling

After uploading your data to MixModeler, you'll see a comprehensive preview and validation interface. This page explains how to use these features to ensure your data is ready for modeling.


Data Preview Interface

Automatic Data Summary

What You'll See Immediately:

Dataset Overview:

  • Number of observations (rows)

  • Number of variables (columns)

  • Date range (first to last observation)

  • File name and upload timestamp

Example:

Dataset: MMM_Data_2024.xlsx
Observations: 104 weeks
Variables: 18
Date Range: 2022-01-01 to 2023-12-23
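To make the overview figures concrete, here is a minimal sketch that computes them from rows already parsed into Python. The function name `dataset_overview` and the sample rows are illustrative assumptions, not MixModeler's internals.

```python
from datetime import date

def dataset_overview(header, rows):
    """Summarize row count, column count, and date range.

    Assumes the first column holds ISO dates (YYYY-MM-DD).
    """
    dates = [date.fromisoformat(row[0]) for row in rows]
    return {
        "observations": len(rows),
        "variables": len(header),
        "date_range": (min(dates).isoformat(), max(dates).isoformat()),
    }

# Hypothetical three-week sample
header = ["Observation", "Sales", "TV_Spend"]
rows = [
    ["2023-01-01", "500", "10000"],
    ["2023-01-08", "520", "12000"],
    ["2023-01-15", "480", "8000"],
]
overview = dataset_overview(header, rows)
print(overview)
```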

Data Table Preview

Interactive Table Display:

  • First 20 rows of your data

  • All columns visible (scroll horizontally)

  • Column headers clearly labeled

  • Sortable by clicking column headers

Features:

  • Search: Find specific values or time periods

  • Filter: Show only certain date ranges

  • Export: Download preview as CSV (optional)


Variable Summary Statistics

For Each Variable, You'll See: type, missing-value count, minimum, maximum, and mean.

What to Check:

  • Type: Should be "Numeric" for all modeling variables

  • Missing: Should be 0 or very low

  • Min/Max: Check for impossible values (negative spend, extreme outliers)

  • Mean: Does it match your business knowledge?
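If you want to reproduce these statistics outside MixModeler before uploading, a rough sketch follows. The helper `summarize` is hypothetical, and it assumes empty cells mean "missing"; the tool's own rules may differ.

```python
from statistics import mean

def summarize(values):
    """Return type, missing count, min, max, and mean for one column.

    `values` is a list of raw cell strings; empty strings count as missing
    (an assumption for this sketch).
    """
    present = [v for v in values if v.strip() != ""]
    missing = len(values) - len(present)
    if not present:
        return {"type": "Empty", "missing": missing}
    try:
        numbers = [float(v) for v in present]
    except ValueError:
        # At least one cell is not a plain number
        return {"type": "Text", "missing": missing}
    return {
        "type": "Numeric",
        "missing": missing,
        "min": min(numbers),
        "max": max(numbers),
        "mean": mean(numbers),
    }

tv = summarize(["10000", "12000", "", "8000"])
bad = summarize(["10k", "12k"])
print(tv)
print(bad)
```

A "Text" result here corresponds to the Type check above: the column needs cleaning before it can be modeled.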


Automatic Validations

Critical Validations (Must Pass)

✅ First Column Named "Observation"

Check: Is the first column exactly named Observation?

Pass:

Column A: Observation

Fail:

Column A: Date
Column A: observation  (wrong case)
Column A: Observations (plural)

Fix: Rename first column to exactly Observation in Excel
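You can run the same check locally on a CSV export before uploading. This is a sketch under the assumption that the match is exact and case-sensitive, as the failing examples above suggest; `first_column_ok` is a hypothetical helper.

```python
import csv
import io

def first_column_ok(csv_text):
    """True only when the first header cell is exactly 'Observation'."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return bool(header) and header[0] == "Observation"

print(first_column_ok("Observation,Sales\n2023-01-01,500\n"))   # True
print(first_column_ok("observation,Sales\n2023-01-01,500\n"))   # False
print(first_column_ok("Observations,Sales\n2023-01-01,500\n"))  # False
```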


✅ All Data Numeric

Check: Are all values in data columns numeric?

Pass:

TV_Spend: 10000, 12000, 8000

Fail:

TV_Spend: $10,000, $12,000, $8,000  (currency symbols)
TV_Spend: 10k, 12k, 8k  (text)

Fix: Remove symbols, convert to pure numbers
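One way to do this cleanup in bulk is sketched below. It assumes commas are thousands separators and the only currency symbol is `$`; the helper `to_number` is illustrative. Values such as "10k" are deliberately not guessed at and come back as `None` so you can fix them by hand.

```python
import re

def to_number(cell):
    """Convert a cell like '$10,000' to 10000.0; return None if not numeric.

    Assumes ',' is a thousands separator (not a decimal comma).
    """
    cleaned = re.sub(r"[$,\s]", "", cell)
    try:
        return float(cleaned)
    except ValueError:
        return None

raw = ["$10,000", "$12,000", "8000", "10k", "N/A"]
converted = [to_number(cell) for cell in raw]
print(converted)  # [10000.0, 12000.0, 8000.0, None, None]
```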


✅ No Duplicate Time Periods

Check: Are all observation values unique?

Pass:

2023-01-01
2023-01-08
2023-01-15  (all unique)

Fail:

2023-01-01
2023-01-01  (duplicate!)
2023-01-08

Fix: Remove or consolidate duplicate rows
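To locate duplicates in a long file, a small sketch like the following can help. The function name `duplicate_periods` is hypothetical; it treats observation values as plain strings.

```python
from collections import Counter

def duplicate_periods(observations):
    """Return each observation value that appears more than once."""
    counts = Counter(observations)
    return [value for value, n in counts.items() if n > 1]

dupes = duplicate_periods(["2023-01-01", "2023-01-01", "2023-01-08"])
print(dupes)  # ['2023-01-01']
```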


Warning Validations (Should Review)

⚠️ Insufficient Data

Check: Do you have at least 52 observations?

Warning Threshold:

  • < 26 observations: Critical (model unlikely to work)

  • 26-51 observations: Warning (limited statistical power)

  • 52+ observations: Good

Recommendation: Gather more historical data if possible
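The thresholds above translate directly into a small classifier. This is only an illustration of the stated cutoffs; `data_sufficiency` is a hypothetical name.

```python
def data_sufficiency(n_observations):
    """Map an observation count to the thresholds listed above."""
    if n_observations < 26:
        return "critical"
    if n_observations < 52:
        return "warning"
    return "good"

print(data_sufficiency(104))  # good
print(data_sufficiency(40))   # warning
print(data_sufficiency(20))   # critical
```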


⚠️ High Missing Data

Check: What percentage of values are missing per variable?

Thresholds:

  • Under 5% missing: Acceptable

  • 5% to under 20% missing: Warning

  • 20% or more missing: Critical issue

Fix: Fill with zeros only where a missing value genuinely means zero (for example, no spend that week); otherwise remove the variable
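A sketch of the missing-data check follows. How values exactly at 5% and 20% are classified is an assumption here (both are rounded up to the more severe level), and `missing_status` is a hypothetical helper.

```python
def missing_status(values):
    """Classify a column by its share of missing (None or empty) cells.

    Boundary handling is an assumption: exactly 5% counts as a warning,
    exactly 20% as critical.
    """
    if not values:
        raise ValueError("column has no rows")
    missing = sum(1 for v in values if v is None or str(v).strip() == "")
    pct = 100.0 * missing / len(values)
    if pct >= 20:
        level = "critical"
    elif pct >= 5:
        level = "warning"
    else:
        level = "acceptable"
    return pct, level

print(missing_status(["1"] * 9 + [""]))    # (10.0, 'warning')
print(missing_status(["1"] * 4 + [None]))  # (20.0, 'critical')
```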


⚠️ Zero Variance

Check: Does any variable have the same value for all observations?

Example Problem:

Price: 49.99, 49.99, 49.99, ... (never changes)

Impact: Variable cannot be modeled (no variation to explain KPI changes)

Fix: Exclude variable or gather data across periods with variation
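Constant columns are easy to spot programmatically. The sketch below uses a hypothetical `constant_columns` helper and made-up sample values.

```python
def constant_columns(columns):
    """Return names of columns whose values never change."""
    return [name for name, values in columns.items() if len(set(values)) <= 1]

data = {
    "Price": [49.99, 49.99, 49.99],
    "TV_Spend": [10000, 12000, 8000],
}
flat = constant_columns(data)
print(flat)  # ['Price']
```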


⚠️ Extreme Outliers

Check: Are there values more than 3 standard deviations from the mean?

Visual Indicator: Flagged rows highlighted in preview table

Action Required:

  1. Verify if outlier is real (Black Friday spike) or data error

  2. If real: Keep it, possibly add dummy variable for that period

  3. If error: Correct the value
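The 3-standard-deviation rule and the dummy-variable option from step 2 can be sketched as follows. This version uses the sample standard deviation, which is an assumption (MixModeler may use the population form), and the sales figures are invented for illustration.

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=3.0):
    """Return indices of values more than `threshold` sample standard
    deviations from the mean."""
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # constant column: nothing can be an outlier
    return [i for i, v in enumerate(values) if abs(v - mu) > threshold * sigma]

# 20 ordinary weeks followed by one hypothetical Black Friday spike
sales = [100, 110, 90, 105, 95] * 4 + [1000]
flagged = flag_outliers(sales)
print(flagged)  # [20]

# If the spike is real, keep it and mark the period with a dummy variable
spike_dummy = [1 if i in flagged else 0 for i in range(len(sales))]
```

Note that with very short series a single spike inflates the standard deviation enough that it may never exceed the threshold, so flagged rows should complement, not replace, a visual check.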


Data Visualization Tools

Time Series Charts

Automatic Charts Generated:

KPI Over Time: Line chart showing your dependent variable across all periods

What to Look For:

  • Clear trend (upward, downward, stable)

  • Seasonality patterns

  • Sudden jumps or drops (validate these are real)


Marketing Variables Over Time: Individual line charts for each marketing channel

What to Look For:

  • Spending patterns make sense

  • Campaign flights visible

  • No unexplained gaps


Distribution Charts

Histograms for Each Variable: Shows frequency distribution of values

What to Look For:

  • Roughly bell-shaped (normal-ish)

  • Not heavily skewed (unless expected)

  • No suspicious gaps or modes


Correlation Matrix

Heatmap Showing: Correlation between all variables

What to Look For:

  • KPI positively correlated with marketing channels (expected)

  • Marketing channels not too highly correlated with each other (|r| < 0.9)

  • Identify potential multicollinearity issues

Color Coding:

  • 🟢 Green: Positive correlation

  • 🔴 Red: Negative correlation

  • Intensity: Strength of correlation
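To check the 0.9 guideline numerically rather than by eye, you could compute pairwise Pearson correlations yourself. The sketch below is standard-library only; the column names other than TV_Spend and all values are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists.

    Raises ZeroDivisionError if either list is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def highly_correlated(columns, limit=0.9):
    """Return column-name pairs whose |r| is at or above `limit`."""
    return [
        (a, b)
        for a, b in combinations(columns, 2)
        if abs(pearson(columns[a], columns[b])) >= limit
    ]

channels = {
    "TV_Spend": [10, 12, 8, 11],
    "Radio_Spend": [5, 6, 4, 5.5],   # moves in lockstep with TV
    "Search_Spend": [4, 1, 2, 3],
}
pairs = highly_correlated(channels)
print(pairs)  # [('TV_Spend', 'Radio_Spend')]
```

Pairs returned here are candidates for multicollinearity; typical remedies are combining the channels or dropping one of them.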


Interactive Validation Checklist

Pre-Modeling Checklist

MixModeler provides an interactive checklist, grouped into three categories: Structure, Data Quality, and Content.

Status Indicators:

  • ✅ Green checkmark: Passed

  • ⚠️ Yellow warning: Review recommended

  • ❌ Red X: Must fix before modeling


Manual Review Tools

Data Filtering

Filter by Date Range: View specific time periods

Example:

Show: 2023-Q4 only

Use Case: Verify holiday season data accuracy


Variable Selection

Toggle Columns: Show/hide specific variables in preview

Use Case: Focus on variables of interest, reduce clutter


Export Preview

Download Current View: Export filtered/sorted preview as CSV

Use Case: Share data snapshot with stakeholders for validation


Common Issues Found During Preview

Issue 1: Text in Numeric Columns

Symptom: Variable shows as "Text" type instead of "Numeric"

Example:

TV_Spend: "10,000", "$12,000", "N/A"

Fix:

  1. Download original file

  2. Remove symbols and text

  3. Re-upload


Issue 2: Date Gaps

Symptom: Missing weeks or months in sequence

Example:

2023-01-01
2023-01-08
2023-01-22  ← Missing 2023-01-15

Fix: Add missing row with appropriate data (or zeros)
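For long series, gaps can be found by walking the expected calendar. This sketch assumes a fixed weekly cadence and ISO-formatted dates; `missing_periods` is a hypothetical helper, and monthly data would need a different stepping rule.

```python
from datetime import date, timedelta

def missing_periods(observations, step_days=7):
    """Return expected dates that are absent from a list of ISO date strings.

    Assumes evenly spaced observations `step_days` apart.
    """
    present = {date.fromisoformat(s) for s in observations}
    current, last = min(present), max(present)
    gaps = []
    while current <= last:
        if current not in present:
            gaps.append(current.isoformat())
        current += timedelta(days=step_days)
    return gaps

gaps = missing_periods(["2023-01-01", "2023-01-08", "2023-01-22"])
print(gaps)  # ['2023-01-15']
```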


Issue 3: Incorrect Data Types

Symptom: Numbers stored as text (left-aligned in preview)

Fix: In Excel, select the affected column, then choose Data → Text to Columns → Finish


Issue 4: Negative KPI Values

Symptom: Sales or revenue shows negative values

Possible Causes:

  • Returns/refunds included

  • Data entry error

  • Wrong column

Fix: Investigate and correct source data


Validation Report

Generate Validation Summary

After preview, you can generate:

Validation Report PDF:

  • All checks performed

  • Pass/Fail status

  • Warnings flagged

  • Summary statistics

  • Recommendations

Use Case: Document data quality for stakeholders or compliance


Next Steps After Validation

All Checks Pass ✅

Proceed to:

  1. Variable Workshop (create transformations)

  2. Model Builder (start building models)


Warnings Present ⚠️

Options:

  1. Proceed with caution - acknowledge limitations

  2. Fix issues - download, correct, re-upload

  3. Consult documentation - understand impact of warnings


Critical Issues ❌

Must Fix Before Modeling:

  1. Download original file

  2. Address critical issues (naming, data types, duplicates)

  3. Re-upload corrected file

  4. Validate again


Best Practices

Always Review Before Modeling

Don't Skip Validation:

  • Spend 5 minutes reviewing preview

  • Check summary statistics match expectations

  • Verify time series charts make sense

  • Investigate any warnings or outliers

Why: Catching data issues early saves hours of troubleshooting later


Document Assumptions

Record Decisions:

  • Why outliers were kept/removed

  • How missing values were filled

  • Any data adjustments made

Where: Keep notes in separate document or Excel sheet


Iterative Validation

Process:

  1. Upload → Review → Find issues

  2. Fix → Re-upload → Review again

  3. Repeat until all critical checks pass

Typical Iterations: 1-3 uploads before the data passes all critical checks


Summary

Key Takeaways:

📊 Preview shows complete data summary - observations, variables, statistics

✅ Automatic validations catch critical issues - naming, data types, duplicates

⚠️ Warnings guide improvement - insufficient data, missing values, outliers

📈 Visualizations reveal patterns - time series, distributions, correlations

🔍 Interactive tools enable deep inspection - filtering, sorting, searching

📋 Checklist ensures readiness - clear pass/fail indicators

🛠️ Fix issues before modeling - saves time and improves results

Bottom Line: Data preview and validation is your safety net. Spend time here to ensure your models are built on solid foundations!
