Many data engineers and CIOs tend to underestimate an ironic aspect of a dramatic increase in data volumes.
The larger the data volume gets, the more sense it makes to process the data *more* frequently!
🧵
To see why, say that a business creates a daily report based on its website traffic, and this report takes 2 hours to create.
If the website traffic grows by 4x, the report will take 8 hours to create. So, the tech people 4x the number of machines.
This is wrong-headed!
2/
Instead, consider an approach that makes the reports more timely:
* Compute statistics on 6 hours of data 4 times a day
* Aggregate these 6 hourly reports to create daily reports
* You can update your "daily" report four times a day.
* The data in the report is at most 6 hours old!
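A minimal sketch of this idea in Python, assuming the report only needs mergeable statistics such as counts and sums (medians or exact distinct counts would need a different technique). The names `PartialReport`, `summarize`, and `daily_report` are hypothetical, and events are simplified to a list of page paths:

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PartialReport:
    """Mergeable statistics for one window of traffic (hypothetical schema)."""
    page_views: int = 0
    views_by_page: Counter = field(default_factory=Counter)

    def add(self, page: str) -> None:
        # Fold a single page-view event into this window's statistics.
        self.page_views += 1
        self.views_by_page[page] += 1

    def merge(self, other: "PartialReport") -> "PartialReport":
        # Counts and sums combine by addition, so windows can be merged cheaply.
        return PartialReport(
            page_views=self.page_views + other.page_views,
            views_by_page=self.views_by_page + other.views_by_page,
        )


def summarize(events):
    """Compute the statistics for one 6-hour window of events."""
    report = PartialReport()
    for page in events:
        report.add(page)
    return report


def daily_report(partials):
    """Aggregate the 6-hourly partial reports into one daily report."""
    total = PartialReport()
    for partial in partials:
        total = total.merge(partial)
    return total
```

Each 6-hour run touches only its own slice of data, and the "daily" report is just a cheap merge of whichever partial reports exist so far.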
3/
The computational cost of both these approaches is nearly the same, yet the second approach:
* reduces latency
* increases frequency
* spreads out the load
* handles spikes better
Plus, you get more timely, less stale reports! This can have huge business value.
4/5
Extrapolate this approach, and it makes sense to have a constantly updating dashboard – just provide 24h aggregates that are up-to-the-minute.
As data volumes increase, many businesses have this conversation and change from batch data processing to stream analytics.
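Taken to the limit, the up-to-the-minute 24h aggregate can be sketched as a sliding window of per-minute buckets. This is only an illustration, not a real stream-processing system: the class `Rolling24hCounter` is hypothetical, timestamps are assumed to be integer minutes that arrive in non-decreasing order, and the only statistic tracked is an event count:

```python
from collections import deque

WINDOW_MINUTES = 24 * 60  # 24 hours of one-minute buckets


class Rolling24hCounter:
    """Running event count over the trailing 24 hours (hypothetical helper)."""

    def __init__(self, window_minutes=WINDOW_MINUTES):
        self.window_minutes = window_minutes
        self.buckets = deque()  # (minute, count) pairs, oldest first
        self.total = 0

    def _evict(self, now_minute):
        # Drop buckets that have fallen out of the trailing window.
        cutoff = now_minute - self.window_minutes
        while self.buckets and self.buckets[0][0] <= cutoff:
            _, count = self.buckets.popleft()
            self.total -= count

    def record(self, minute, count=1):
        """Add `count` events observed during `minute` (non-decreasing)."""
        self._evict(minute)
        if self.buckets and self.buckets[-1][0] == minute:
            last_minute, last_count = self.buckets[-1]
            self.buckets[-1] = (last_minute, last_count + count)
        else:
            self.buckets.append((minute, count))
        self.total += count

    def value(self, now_minute):
        """Return the count for the 24 hours ending at `now_minute`."""
        self._evict(now_minute)
        return self.total
```

Each update adds the newest minute and subtracts the expired ones, so the dashboard never has to rescan a full day of data.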
5/5
• • •
Five months later, our ML patterns book is #3 in AI, behind only the top ML intro book and the top research one. Very grateful for the validation ... W/ @SRobTweets amazon.com/Machine-Learni…
Like most authors, we keep hitting F5 to read the reviews 😁 My favorites 🧵👇
"When I was learning C++, I found the Gang of Four book "Design Patterns" accomplished a similar goal to help bridge the gap between academic knowledge and practical software engineering. Much like with the GoF book I suspect I may be re-reading parts of this book in the future"
"must-read for scientists and practitioners looking to apply machine learning theory to real life problems. I foresee this book becoming a classical of the discipline’s literature."