Data Review & Cleaning Summary
- ORDER_ID not unique (1,091 duplicates)
- Extreme merchant concentration: 87% of merchants have fewer than 10 orders
- 5,229 records with date logic violations.
Dataset Overview
- Orders dataset: 311,645 records
- Line items dataset: 411,581 records
Data Quality Issues Identified
Order ID Not Unique
- The 1,091 duplicated orders represent separate, legitimate transactions
- Records differ in timestamps, addresses, and costs, so they are not data entry errors
- Solution: Use DISTINCT on the order_id field for analysis while preserving the underlying data
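A minimal sketch of the duplicate check and the DISTINCT-based counting, using Python's built-in sqlite3 module on a toy in-memory table. The table name orders and every column other than order_id (order_dt, shipping_address, total_cost) are hypothetical stand-ins for the real schema.

```python
import sqlite3

# Toy "orders" table; only order_id comes from the summary, other columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id TEXT, order_dt TEXT, shipping_address TEXT, total_cost REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("A1", "2024-01-03 10:00", "1 Main St", 25.0),
        ("A1", "2024-02-11 16:30", "9 Oak Ave", 40.0),  # same ID, different transaction
        ("B2", "2024-01-05 09:15", "5 Elm Rd", 12.5),
        ("C3", "2024-01-07 12:00", "7 Pine Ct", 30.0),
    ],
)

# Order IDs that appear more than once, with the number of rows sharing each ID.
dupes = conn.execute(
    """
    SELECT order_id, COUNT(*) AS n_rows
    FROM orders
    GROUP BY order_id
    HAVING COUNT(*) > 1
    """
).fetchall()

# Row-level vs. distinct-ID counts: metrics use COUNT(DISTINCT order_id),
# while the underlying rows are left untouched.
total_rows, distinct_orders = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT order_id) FROM orders"
).fetchone()

print(dupes)                        # [('A1', 2)]
print(total_rows, distinct_orders)  # 4 3
```

Because nothing is deleted, the full row-level detail stays available if a later analysis needs to treat the duplicated IDs as separate transactions.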
Date Logic Violations
- 5,229 orders with a fulfilled date before the order date
- 34 orders with registration issues
- Solution: Retain with appropriate filtering in analysis queries
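A sketch of flagging the fulfilled-before-ordered rows and then excluding them with a query-time filter instead of deleting them, again with sqlite3 on toy data. The fulfilled_dt column name and the choice to keep unfulfilled (NULL) orders in the analysis set are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: fulfilled_dt is an assumed column name.
conn.execute("CREATE TABLE orders (order_id TEXT, order_dt TEXT, fulfilled_dt TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("A1", "2024-01-03 10:00:00", "2024-01-05 08:00:00"),
        ("B2", "2024-01-05 09:15:00", "2024-01-04 17:00:00"),  # fulfilled before ordered
        ("C3", "2024-01-07 12:00:00", None),                   # not yet fulfilled
    ],
)

# Identify violations; the raw rows stay in the table.
violations = conn.execute(
    """
    SELECT order_id FROM orders
    WHERE fulfilled_dt IS NOT NULL
      AND julianday(fulfilled_dt) < julianday(order_dt)
    """
).fetchall()

# Analysis queries filter violations out. NULL fulfilled dates are kept explicitly,
# because a bare comparison against NULL would silently drop them.
clean_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT order_id FROM orders
        WHERE fulfilled_dt IS NULL
           OR julianday(fulfilled_dt) >= julianday(order_dt)
        ORDER BY order_id
        """
    )
]

print(violations)  # [('B2',)]
print(clean_ids)   # ['A1', 'C3']
```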
Data Integrity
- Zero null values in key fields (order_id, merchant_id, shop_id, order_dt)
- 1 order without line items (ORDER_ID: 719886.143), kept because it is non-impactful
- No orphaned line items
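The three integrity checks above (null key fields, orders without line items, orphaned line items) can be sketched as follows with sqlite3. The key column names come from this summary; the line_items table and its sku and quantity columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id TEXT, merchant_id TEXT, shop_id TEXT, order_dt TEXT);
    CREATE TABLE line_items (order_id TEXT, sku TEXT, quantity INTEGER);  -- hypothetical
    INSERT INTO orders VALUES
        ('A1', 'M1', 'S1', '2024-01-03'),
        ('B2', 'M1', 'S2', '2024-01-05'),
        ('C3', 'M2', 'S3', '2024-01-07');
    INSERT INTO line_items VALUES
        ('A1', 'SKU-1', 2),
        ('B2', 'SKU-2', 1);
    """
)

# 1. Null count per key field. Column names come from a fixed list,
#    so interpolating them into the SQL string is safe here.
key_cols = ["order_id", "merchant_id", "shop_id", "order_dt"]
null_counts = {
    col: conn.execute(f"SELECT COUNT(*) FROM orders WHERE {col} IS NULL").fetchone()[0]
    for col in key_cols
}

# 2. Orders with no line items: anti-join from orders to line_items.
orders_without_items = conn.execute(
    """
    SELECT o.order_id FROM orders o
    LEFT JOIN line_items li ON li.order_id = o.order_id
    WHERE li.order_id IS NULL
    """
).fetchall()

# 3. Orphaned line items: rows whose order_id has no parent order.
orphans = conn.execute(
    """
    SELECT li.order_id FROM line_items li
    LEFT JOIN orders o ON o.order_id = li.order_id
    WHERE o.order_id IS NULL
    """
).fetchall()

print(null_counts)           # {'order_id': 0, 'merchant_id': 0, 'shop_id': 0, 'order_dt': 0}
print(orders_without_items)  # [('C3',)]
print(orphans)               # []
```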
Merchants Distribution: Extreme Concentration
87% of merchants have fewer than 10 orders over the 6-month period. This is a classic marketplace power-user distribution, which calls for a segmented analysis approach.
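One way to sketch a segmented view: count distinct orders per merchant, then bucket merchants by volume. Only the 10-order cutoff comes from this summary; the other bucket edges and labels are illustrative assumptions, and the toy data is assumed to already be restricted to the 6-month window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, merchant_id TEXT)")
# Toy data: M1 has 12 orders; M2, M3, and M4 have 1, 2, and 3 orders.
rows = [(f"M1-{i}", "M1") for i in range(12)]
rows += [("M2-0", "M2"), ("M3-0", "M3"), ("M3-1", "M3"),
         ("M4-0", "M4"), ("M4-1", "M4"), ("M4-2", "M4")]
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

# Per-merchant distinct order counts, bucketed into hypothetical segments.
segments = conn.execute(
    """
    WITH per_merchant AS (
        SELECT merchant_id, COUNT(DISTINCT order_id) AS n_orders
        FROM orders
        GROUP BY merchant_id
    )
    SELECT
        CASE
            WHEN n_orders < 10 THEN 'low (<10)'
            WHEN n_orders < 100 THEN 'mid (10-99)'
            ELSE 'power (100+)'
        END AS segment,
        COUNT(*) AS n_merchants
    FROM per_merchant
    GROUP BY segment
    ORDER BY segment
    """
).fetchall()

# Share of merchants below the 10-order cutoff.
total = sum(n for _, n in segments)
share_low = dict(segments).get("low (<10)", 0) / total

print(segments)   # [('low (<10)', 3), ('mid (10-99)', 1)]
print(share_low)  # 0.75
```

Counting with COUNT(DISTINCT order_id) keeps the segment counts consistent with the order ID handling described earlier.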