Data Scientist, MBA & MCIM | Founder @ https://t.co/EITR6hiRkM | Building Real-world Analytics Capability For The Next 1M+ Data leaders
Jun 24 • 6 tweets • 4 min read
How to read any new dataset in 15 minutes, before you write a single line of real analysis.
THREAD
1. SHAPE AND GRAIN
Row count and column preview are not data validation.
They only tell you the table exists.
The real question is:
What does one row represent?
I once worked with a “customer” table that looked clean until the grain was checked.
It was not one row per customer.
It was one row per customer per device.
That small difference changed everything:
- Average order value was overstated
- Repeat purchase rate was distorted
- Churn logic became unreliable
- Customer count looked larger than reality
- Segmentation was built on duplicated behaviour
The table did not look broken.
That was the danger.
Check the grain before you trust the analysis:
SELECT customer_id, COUNT(*) AS row_count
FROM customers  -- placeholder name: use your own table (TABLE itself is a reserved word)
GROUP BY customer_id
ORDER BY row_count DESC;
If any count comes back above 1 when you expect one row per customer, stop.
Do not build the dashboard.
Do not train the model.
Do not calculate churn.
Rebuild your assumptions first.
Because once the grain is wrong, every metric after it becomes confidently wrong.
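A minimal sketch of that grain check, run against an in-memory SQLite database. The table name `customers`, the `device` column and the sample rows are all made up for illustration. The `HAVING` filter is a small variation on the query above, so only the offending IDs come back.

```python
import sqlite3

# Hypothetical table: it looks like one row per customer,
# but the real grain is one row per customer per device.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, device TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "ios"), (1, "web"), (2, "web"), (3, "android"), (3, "web"), (3, "ios")],
)

# Grain check: list every customer_id that appears more than once.
duplicates = conn.execute(
    """
    SELECT customer_id, COUNT(*) AS row_count
    FROM customers
    GROUP BY customer_id
    HAVING COUNT(*) > 1
    ORDER BY row_count DESC, customer_id
    """
).fetchall()

# Compare raw rows with distinct customers to see the inflation.
total_rows, distinct_customers = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT customer_id) FROM customers"
).fetchone()

print(duplicates)                       # [(3, 3), (1, 2)]
print(total_rows, distinct_customers)   # 6 3
```

Here a naive COUNT(*) reports 6 "customers" when there are only 3, which is the same kind of inflation described above.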
Feb 5 • 13 tweets • 3 min read
Data Analyst/Pro: This thread breaks down the most-used SQL patterns in real organisations, with practical scenarios you’ll actually face on the job
Save this. Bookmark it. Reshare it.
You’ll come back to it when a metric breaks.
1/ JOINs (the #1 source of broken metrics)
JOINs aren’t about combining tables.
They’re about preserving truth.
For example:
- You join orders to order_items
- One order → multiple items
- Revenue suddenly doubles after the JOIN
Nothing “broke”.
Your logic did.
Senior analysts always ask:
- What is the base table?
- What happens to row counts after the JOIN?
If you can’t explain the multiplication effect, you can’t defend the number in a meeting.
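The multiplication effect can be reproduced in a few lines. This is a sketch under assumptions: the `order_total` and `sku` columns and the sample values are invented, and only the `orders` and `order_items` table names come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_total INTEGER);
    CREATE TABLE order_items (order_id INTEGER, sku TEXT);
    INSERT INTO orders VALUES (1, 100), (2, 50);
    -- Two items per order, so each order row is repeated twice after the JOIN.
    INSERT INTO order_items VALUES (1, 'A'), (1, 'B'), (2, 'C'), (2, 'D');
    """
)

# Base table: one row per order.
base_rows, base_revenue = conn.execute(
    "SELECT COUNT(*), SUM(order_total) FROM orders"
).fetchone()

# After the JOIN the grain becomes one row per order item,
# so summing an order-level column counts it once per item.
joined_rows, joined_revenue = conn.execute(
    """
    SELECT COUNT(*), SUM(o.order_total)
    FROM orders AS o
    JOIN order_items AS i ON i.order_id = o.order_id
    """
).fetchone()

print(base_rows, base_revenue)      # 2 150
print(joined_rows, joined_revenue)  # 4 300
```

Comparing the row count before and after the JOIN (2 versus 4 here) is the quickest signal that an order-level amount is about to be counted more than once.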