The convergence of the batch and streaming worlds #current22
Being able to write the same SQL without needing to code for time windows etc. is more accessible and makes it feel much more like a regular database #current22
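To make that concrete, here's a rough sketch of the idea (mine, not from the talk; the engine choice, table, and column names are all assumptions). In a streaming-native engine such as Materialize, a plain materialized view over a stream-backed table is maintained incrementally, with no window or trigger code:

```sql
-- A plain aggregate, written exactly as you would for a batch table.
-- In an engine like Materialize this view is kept up to date
-- incrementally as new orders arrive; hypothetical names throughout.
CREATE MATERIALIZED VIEW revenue_by_region AS
SELECT region,
       sum(amount) AS total_revenue,
       count(*)    AS order_count
FROM orders
GROUP BY region;
```

The identical SELECT would also run as a one-off batch query against a warehouse table; only the materialization strategy differs.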
Working with streaming data in dbt and Snowflake. Streaming and batch nodes in the same lineage chart #current22
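On the dbt side, a hypothetical model sketch (the materialization name varies by adapter and is an assumption here): the model body is just SQL, and whether it builds as a batch table or a streaming materialization is a one-line config, which is how batch and streaming nodes end up in the same lineage graph.

```sql
-- models/revenue_by_region.sql (hypothetical model)
-- Swap 'table' for a streaming materialization (e.g. 'materialized_view',
-- where the adapter supports it) and this DAG node becomes streaming.
{{ config(materialized='table') }}

SELECT region, sum(amount) AS total_revenue
FROM {{ ref('stg_orders') }}
GROUP BY region
```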
Who owns all of this? The analytics engineer.
🎯 We shouldn’t be talking about batch vs streaming, but about what your company needs #Current22
Spicy take, but an important one. *Why* does it need to be real-time? What are they doing with that data? What is the business impact if we *don’t* have real-time? #Current22
Sometimes, though, you *do* need real-time. @notamyfromdbt’s example was an airline with a screen showing when to close the gate for a flight. A five-minute SLA was OK, and was all the tooling of the time could deliver, but real-time *would* have been better #Current22
Sometimes it’s not either/or - it’s both #Current22
The world is shifting. It should be about the use cases, not batch vs streaming. The analytics engineer is well placed to own this intersection. #current22
Are we going to have batch and streaming forever, or will they converge? @esammer says that at the heart of systems the lambda architecture will go away and kappa will eventually win out. Once data is in the DW, perhaps batch will remain because of its familiarity to analytics engineers.
@notamyfromdbt - Microbatching gets used to simulate streaming with the same toolset, for familiarity, but it doesn’t scale
Dan Sotolongo at #current22: RDBMSs and SQL have stood the test of time. He sets the scene for stream processing by covering the core concepts of tables and streams
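As a reminder of that table/stream duality, a hedged Flink-SQL-style sketch (not Dan’s syntax; topic, connector properties, and names are assumptions): the stream is exposed as a table, and a query over it yields a continuously updating table.

```sql
-- A Kafka topic exposed as a SQL table (Flink SQL style).
CREATE TABLE clicks (
    user_id  STRING,
    url      STRING,
    event_ts TIMESTAMP(3)
) WITH (
    'connector' = 'kafka',
    'topic'     = 'clicks',
    'properties.bootstrap.servers' = 'broker:9092',
    'format'    = 'json'
);

-- An ordinary aggregate over it is a table that never stops changing.
SELECT user_id, count(*) AS click_count
FROM clicks
GROUP BY user_id;
```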
#current22 Handling event-time joins in SQL using functions.
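By way of illustration (my sketch in Flink-SQL style, not the talk’s code; the tables and the four-hour bound are made up), an event-time join bounds the match on the rows’ own timestamps rather than on arrival time:

```sql
-- Match each order only with shipments whose event time falls within
-- four hours of the order's event time (an interval join).
SELECT o.order_id,
       o.order_ts,
       s.ship_ts
FROM orders o
JOIN shipments s
  ON  s.order_id = o.order_id
  AND s.ship_ts BETWEEN o.order_ts
                    AND o.order_ts + INTERVAL '4' HOUR;
```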
The next problem is making sure we have all the data. It’s watermarks, but not really
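For contrast with the “not really” version, the conventional watermark looks like this in Flink-SQL style (again my sketch, hypothetical names): you declare how late events may arrive, and the engine uses that to decide when a window’s input is complete.

```sql
CREATE TABLE page_views (
    user_id  STRING,
    event_ts TIMESTAMP(3),
    -- Events up to 5 seconds late are still waited for; once the
    -- watermark passes a window's end, that window can be finalized.
    WATERMARK FOR event_ts AS event_ts - INTERVAL '5' SECOND
) WITH (
    'connector' = 'kafka',
    'topic'     = 'page_views',
    'properties.bootstrap.servers' = 'broker:9092',
    'format'    = 'json'
);
```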
Having watched @gwenshap and @ozkatz100 talk about "git for data", I would definitely say it is a serious idea.
However, to the point at the end of the video: RTFM. It took reading docs.lakefs.io/using_lakefs/d… and some other pages afterwards to really grok the concept in practice.
Where I struggled at first with the git analogy alone was that data changes, and I couldn’t see how branch/merge fitted into that beyond branching for throwaway testing. The 1PB accident was useful for illustrating that latter point, for sure.
But then reading docs.lakefs.io/understand/roa… made me realise I was thinking about the whole thing from a streaming PoV, when actually the idea of running a batch job against a branch, with a hook to validate and then merge, is a freakin awesome idea.
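Roughly, the pattern as I understand it (my sketch, not code from the lakeFS docs; the repo, branch, and table names are hypothetical). lakeFS exposes each branch as a path prefix through its S3 gateway, so a SQL engine like Trino or Spark SQL writes the batch output to a branch, a pre-merge hook runs validation queries against it, and only on success does the branch merge into main:

```sql
-- 1. The nightly batch writes to the 'nightly-etl' branch, not main
--    (Trino-style CTAS; the branch is the first element of the path).
CREATE TABLE nightly_orders
WITH (external_location = 's3a://example-repo/nightly-etl/orders/') AS
SELECT order_id, customer_id, amount
FROM raw_orders
WHERE order_date = DATE '2022-10-05';

-- 2. The kind of check a pre-merge hook might run against the branch.
SELECT count(*)                    AS row_count,
       count_if(order_id IS NULL)  AS null_order_ids
FROM nightly_orders;

-- 3. If the checks pass, the hook merges the branch into main via
--    lakeFS itself (e.g. lakectl merge); that step isn't SQL.
```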