Many data engineers and CIOs tend to underestimate an ironic aspect of a dramatic increase in data volumes.

The larger the data volume gets, the more sense it makes to process the data *more* frequently!
🧵
To see why, say that a business creates a daily report based on its website traffic, and this report takes 2 hours to create.

If the website traffic grows by 4x, the report will take 8 hours to create. So, the tech people 4x the number of machines.

This is wrong-headed!

2/
Instead, consider an approach that makes the reports more timely:

* Compute statistics on 6 hours of data, 4 times a day
* Aggregate these four 6-hour reports to create the daily report
* You can update your "daily" report four times a day
* The data in the report is at most 6 hrs old!
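
The steps above can be sketched in a few lines of Python. This is only an illustration, not the author's implementation: the `PartialReport` class, its fields, and the sample events are hypothetical, and it assumes the report uses statistics that merge by simple addition (hit counts, per-page counts). Statistics such as unique visitors or medians do not add up this way and would need a different mergeable summary.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PartialReport:
    """Mergeable statistics for one window of traffic (hypothetical schema)."""
    hits: int = 0
    hits_per_page: Counter = field(default_factory=Counter)

    def merge(self, other: "PartialReport") -> "PartialReport":
        # Counts are additive, so merging two windows is just a sum.
        return PartialReport(
            hits=self.hits + other.hits,
            hits_per_page=self.hits_per_page + other.hits_per_page,
        )


def summarize(events):
    """Build a partial report from the (hour, page) events of one window."""
    report = PartialReport()
    for _hour, page in events:
        report.hits += 1
        report.hits_per_page[page] += 1
    return report


def daily_report(partials):
    """Combine the 6-hour partial reports into one daily report."""
    total = PartialReport()
    for partial in partials:
        total = total.merge(partial)
    return total


# Made-up sample traffic: (hour of day, page requested).
events = [(1, "/home"), (4, "/pricing"), (7, "/home"), (13, "/docs"), (20, "/home")]

# Split the day into four 6-hour windows and summarize each one separately.
windows = [[e for e in events if start <= e[0] < start + 6] for start in range(0, 24, 6)]
partials = [summarize(w) for w in windows]
day = daily_report(partials)
```

Each call to `summarize` only touches 6 hours of data, and `daily_report` can be rerun whenever a new partial report lands.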

3/
The computational cost of both these approaches is nearly the same, yet the second approach:
* reduces latency
* increases frequency
* spreads out the load
* handles spikes better

Plus, you get more timely, less stale reports! This can have huge business value.
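
A back-of-the-envelope check of the "nearly the same cost" claim, using the numbers from the thread. It assumes runtime scales linearly with data volume on the original machines and that merging the partial reports costs next to nothing; both are simplifications, not measurements.

```python
BASE_RUNTIME_HOURS = 2.0  # from the thread: the original daily report
GROWTH = 4                # from the thread: traffic grows 4x
RUNS_PER_DAY = 4          # from the thread: one run per 6 hours of data

# Assumption: runtime is proportional to the amount of data processed.
daily_batch_hours = BASE_RUNTIME_HOURS * GROWTH   # one big run per day
per_run_hours = daily_batch_hours / RUNS_PER_DAY  # each 6-hour run
total_hours = per_run_hours * RUNS_PER_DAY        # all four runs together

print(f"single daily batch: {daily_batch_hours} h")
print(f"each 6-hour run:    {per_run_hours} h")
print(f"four runs in total: {total_hours} h")
```

Under these assumptions each 6-hour run takes about as long as the original daily report did, on the original machines, and the total work per day is unchanged.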

4/5
Extrapolate this approach, and it makes sense to have a constantly updating dashboard – just provide 24h aggregates that are up-to-the-minute.

As data volumes increase, many businesses have this conversation and change from batch data processing to stream analytics.
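
To make the "up-to-the-minute 24h aggregate" concrete, here is a minimal sketch of a rolling count kept in one bucket per minute. The `RollingCount` class is hypothetical and deliberately simple: it assumes integer minute timestamps that arrive in order and a single process. A real stream analytics system would also handle late data, parallelism, and persistence.

```python
from collections import deque

WINDOW_MINUTES = 24 * 60


class RollingCount:
    """Hit count over the trailing 24 hours, kept in one bucket per minute."""

    def __init__(self, window_minutes=WINDOW_MINUTES):
        self.window_minutes = window_minutes
        self.buckets = deque()  # (minute, hits) pairs, oldest first
        self.total = 0

    def add(self, minute, hits=1):
        """Record hits at an integer minute timestamp (assumed non-decreasing)."""
        if self.buckets and self.buckets[-1][0] == minute:
            self.buckets[-1] = (minute, self.buckets[-1][1] + hits)
        else:
            self.buckets.append((minute, hits))
        self.total += hits
        self._expire(minute)

    def _expire(self, now):
        # Drop buckets that have fallen out of the trailing window.
        while self.buckets and self.buckets[0][0] <= now - self.window_minutes:
            _, old_hits = self.buckets.popleft()
            self.total -= old_hits

    def value(self, now):
        """Return the count over the window ending at minute `now`."""
        self._expire(now)
        return self.total


rc = RollingCount()
rc.add(0, 5)      # 5 hits at minute 0
rc.add(10, 3)     # 3 hits at minute 10
rc.add(1440, 2)   # 2 hits exactly 24 hours after the first bucket
current = rc.value(1440)
```

Each update touches only the newest and the expired buckets, so the dashboard number stays current without rescanning the whole day.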

5/5

Thread by Lαк Lαкѕнмαηαη (@lak_gcp)

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @lak_gcp

28 Mar
Five months later, our ML patterns book is #3 in AI, behind only the top ML intro book and the top research one. Very grateful for the validation ... W/ @SRobTweets
amazon.com/Machine-Learni…
Like most authors, we keep hitting F5 to read the reviews 😁 My favorites 🧵👇
"When I was learning C++, I found the Gang of Four book "Design Patterns" accomplished a similar goal to help bridge the gap between academic knowledge and practical software engineering. Much like with the GoF book I suspect I may be re-reading parts of this book in the future"
"must-read for scientists and practitioners looking to apply machine learning theory to real life problems. I foresee this book becoming a classical of the discipline’s literature."
Read 9 tweets

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Too expensive? Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal Become our Patreon

Thank you for your support!

Follow Us on Twitter!

:(