Faster internet with cable, faster web browsers with Chrome, faster video editing with TikTok - all things people were already doing, but faster. Each enabled new use cases that people hadn’t thought of. #current22
Because it’s easy to get data in from Asana etc., Mode builds reports against it. If it wasn’t easy, they wouldn’t. #current22
The elephant in the room at a data streaming conference… where is the streaming in this? #current22
There just aren’t a lot of use cases for streaming. We’re not Uber, Google, Amex, etc. We’re the 99%.
Echoes of Strata at #current22 with its agenda based on technology for technology’s sake
Let’s make streaming boring. Forget the fancy use cases. Let’s just make it easy. #Current22
@bennstancil is hitting the nail on the head here. Streaming will get adoption when it’s easy and boring and lets us do what we’re doing already, but just faster. It’s not about the technology. #current22
Are we going to have batch and streaming forever, or will they converge? @esammer says that at the heart of systems the Lambda architecture will go away and Kappa will eventually win out. Once data is in the DW, perhaps batch will remain for its familiarity to analytics engineers.
@notamyfromdbt - Microbatching gets used to simulate streaming while keeping the same familiar toolset, but it doesn’t scale
Dan Sotolongo at #current22: RDBMS and SQL have stood the test of time. Sets the scene for stream processing by covering core concepts of tables and streams.
#current22 handling event time joins in SQL using functions.
The next problem is making sure we have all the data. It’s watermarks, but not really
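For anyone who hasn't met watermarks before, here's a rough Python sketch of the generic idea: track the highest event time seen, subtract an allowed lateness, and treat a window as complete once that mark passes the window's end. This is the textbook notion only, not what the talk proposed (hence "but not really"). The class name, `allowed_lateness`, and the integer-seconds timestamps are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WatermarkTracker:
    """Toy watermark: max event time seen so far minus a fixed allowed lateness."""

    allowed_lateness: int  # seconds; hypothetical tuning knob
    max_event_time: Optional[int] = None

    def observe(self, event_time: int) -> None:
        # Events may arrive out of order, so only ever move the maximum forward.
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time

    @property
    def watermark(self) -> Optional[int]:
        # No events yet means there is no watermark to report.
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

    def window_is_complete(self, window_end: int) -> bool:
        # A window counts as complete once the watermark has reached its end.
        wm = self.watermark
        return wm is not None and wm >= window_end


tracker = WatermarkTracker(allowed_lateness=5)
for event_time in (10, 8, 16):  # note the out-of-order event at t=8
    tracker.observe(event_time)
```

The trade-off lives in `allowed_lateness`: a larger value waits longer for stragglers, a smaller one emits results sooner but risks missing late data.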
Having watched @gwenshap and @ozkatz100 talk about "git for data" I would definitely say it is a serious idea.
However to the point at the end of the video, RTFM—it took reading docs.lakefs.io/using_lakefs/d… and some other pages subsequently to really grok the concept in practice.
Where I struggled at first with the git analogy alone was that data changes, and I couldn't see how branch/merge fitted into that outside of the idea of branching for throwaway testing alone. The 1PB accident was useful for illustrating the latter point for sure.
But then reading docs.lakefs.io/understand/roa… made me realise that I was thinking about the whole thing from a streaming PoV—when actually the idea of running a batch against a branch with a hook to validate and then merge is a freakin awesome idea
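To make the batch-on-a-branch idea concrete, here's a toy in-memory Python sketch of the pattern: branch, write the batch, run a validation hook, and merge only if it passes. This is not the lakeFS API. `ToyRepo`, `no_null_ids`, and the branch names are hypothetical, and "merge" here simply replaces the destination with the source, which is a big simplification of a real merge.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Table = List[Row]


class ToyRepo:
    """Hypothetical in-memory stand-in for a versioned data repository."""

    def __init__(self) -> None:
        self.branches: Dict[str, Table] = {"main": []}

    def branch(self, name: str, source: str = "main") -> None:
        # Copy the rows so writes to the new branch never touch the source.
        self.branches[name] = [dict(row) for row in self.branches[source]]

    def write(self, branch: str, rows: Table) -> None:
        self.branches[branch].extend(rows)

    def merge(self, source: str, dest: str,
              pre_merge_hook: Callable[[Table], bool]) -> bool:
        candidate = self.branches[source]
        # The hook gates the merge: if validation fails, dest is left untouched.
        if not pre_merge_hook(candidate):
            return False
        self.branches[dest] = [dict(row) for row in candidate]
        return True


def no_null_ids(table: Table) -> bool:
    """Example validation: every row must carry a non-null 'id'."""
    return all(row.get("id") is not None for row in table)


repo = ToyRepo()

# A bad batch lands on its own branch and is rejected by the hook.
repo.branch("batch-bad")
repo.write("batch-bad", [{"id": 1}, {"id": None}])
bad_merged = repo.merge("batch-bad", "main", no_null_ids)

# A clean batch passes validation and is published to main.
repo.branch("batch-good")
repo.write("batch-good", [{"id": 1}, {"id": 2}])
good_merged = repo.merge("batch-good", "main", no_null_ids)
```

The point is that consumers reading `main` only ever see batches that passed the hook; the failed batch stays on its branch for inspection.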