#Current22 @AdiPolak talking about chaos engineering
A scary list of all the things that could go wrong with data flows #Current22
“Disagree and commit” h/t @matryer #Current22
What can we learn from the software world of chaos engineering and apply to the world of data flows?
Principles of chaos engineering

#Current22
Comparing what steady state means in the devops/SRE world with what it means in data #Current22

“The data isn’t wrong; your expectation of the data is wrong”
Chaos engineering - varying real-world events. In a data context this could be schema changes, data corruption, fubar with partition deletion…

#Current22
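To make those events concrete, here is a minimal Python sketch. It assumes a made-up batch of order records and three hypothetical fault functions (`drop_column`, `corrupt_values`, `delete_partition`); none of these names come from the talk. Each fault is injected in turn and a simple steady-state check reports whether it noticed.

```python
import copy

# Hypothetical sample batch; field names are illustrative only.
BATCH = [
    {"order_id": 1, "amount": 10.0, "day": "2022-10-04"},
    {"order_id": 2, "amount": 25.5, "day": "2022-10-04"},
    {"order_id": 3, "amount": 7.25, "day": "2022-10-05"},
]

def drop_column(rows, column):
    """Schema change: a field disappears from every record."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]

def corrupt_values(rows, column):
    """Data corruption: a numeric field arrives as unparseable text."""
    out = copy.deepcopy(rows)
    for row in out:
        row[column] = "???"
    return out

def delete_partition(rows, day):
    """Partition deletion: all records for one day vanish."""
    return [row for row in rows if row["day"] != day]

def steady_state_ok(rows, expected_days):
    """Steady state here means: every expected day is present and
    every record has a non-negative numeric amount."""
    days = {row.get("day") for row in rows}
    if not expected_days <= days:
        return False
    return all(
        isinstance(row.get("amount"), (int, float)) and row["amount"] >= 0
        for row in rows
    )

EXPECTED_DAYS = {"2022-10-04", "2022-10-05"}
experiments = {
    "baseline": BATCH,
    "schema change": drop_column(BATCH, "amount"),
    "corruption": corrupt_values(BATCH, "amount"),
    "partition deletion": delete_partition(BATCH, "2022-10-05"),
}
results = {
    name: steady_state_ok(rows, EXPECTED_DAYS)
    for name, rows in experiments.items()
}
for name, ok in results.items():
    print(f"{name}: {'steady' if ok else 'violated'}")
```

The point of the sketch is the shape of the experiment: define the expectation first, then vary the input and see whether the expectation still holds.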
No one wants to work weekends…

#Current22
Testing in production. Which in the data world means using production data 😱 #Current22
The stages of a data product #Current22
“git for data” with @lakeFS

#Current22
@lakeFS is an open source project, written in Go. It uses copy-on-write to provide copies of files efficiently. #Current22
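The copy-on-write idea can be sketched in a few lines of Python. This is a toy model for illustration only, not how lakeFS is implemented; the `ToyRepo` class and its methods are hypothetical. Branches hold pointers to stored objects, so creating a branch copies pointers rather than data, and a write on one branch leaves the other untouched.

```python
class ToyRepo:
    """Toy copy-on-write store: branches map paths to object ids,
    and object contents are stored once, keyed by id."""

    def __init__(self):
        self.objects = {}              # object id -> bytes
        self.branches = {"main": {}}   # branch name -> {path: object id}
        self._next_id = 0

    def write(self, branch, path, data):
        # A write always creates a new object; only the target
        # branch's pointer for that path changes.
        object_id = self._next_id
        self._next_id += 1
        self.objects[object_id] = data
        self.branches[branch][path] = object_id

    def read(self, branch, path):
        return self.objects[self.branches[branch][path]]

    def create_branch(self, name, source):
        # Copy the pointers, not the data.
        self.branches[name] = dict(self.branches[source])

repo = ToyRepo()
repo.write("main", "sales.csv", b"id,amount\n1,10\n")
repo.write("main", "customers.csv", b"id,name\n1,Ada\n")

repo.create_branch("experiment", "main")
objects_after_branch = len(repo.objects)  # branching stored nothing new

repo.write("experiment", "sales.csv", b"id,amount\n1,999\n")
objects_after_write = len(repo.objects)   # one new object for the change
```

After the write, `main` still reads the original `sales.csv`, and both branches share the single stored copy of `customers.csv`.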
Using quality check hooks to protect production data
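A rough sketch of the gating idea in plain Python, assuming hypothetical check functions (`no_nulls`, `not_empty`) and a hypothetical `merge_with_hooks` helper. It does not use the lakeFS API or its hook configuration format; it only shows the principle that data reaches production when every check passes and not otherwise.

```python
class MergeBlocked(Exception):
    """Raised when a pre-merge check rejects the branch."""

def no_nulls(rows):
    return all(value is not None for row in rows for value in row.values())

def not_empty(rows):
    return len(rows) > 0

def merge_with_hooks(production, branch_rows, hooks):
    """Run every hook against the branch data; replace production
    only if all of them pass."""
    failed = [hook.__name__ for hook in hooks if not hook(branch_rows)]
    if failed:
        raise MergeBlocked(f"pre-merge checks failed: {', '.join(failed)}")
    production[:] = branch_rows

# Hypothetical data for illustration.
production = [{"id": 1, "category": "yarn"}]
bad_branch = [{"id": 1, "category": None}]
good_branch = [{"id": 1, "category": "yarn"}, {"id": 2, "category": "fiction"}]

try:
    merge_with_hooks(production, bad_branch, [not_empty, no_nulls])
    blocked = False
except MergeBlocked as err:
    blocked = True
    print(err)

production_after_bad = list(production)  # unchanged by the rejected merge
merge_with_hooks(production, good_branch, [not_empty, no_nulls])
```

The checks run before anything is written, so a rejected branch never touches the production list.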
Live demo time at #Current22
Creating a branch of data
“High performance yarn” is “fiction” 😆 #Current22
What if the join doesn’t work as we intended?
Uh oh, we’ve got nulls
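How a join ends up producing nulls can be shown with a small Python sketch. The data, the column names, and the `left_join` and `null_count` helpers are all made up for illustration and are not the code from the demo. Here the join key differs only by letter case in one row, so that row finds no match.

```python
def left_join(left, right, key):
    """Left join two lists of dicts; unmatched left rows get None
    for every right-hand column."""
    right_columns = {c for row in right for c in row if c != key}
    index = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = index.get(row[key])
        extra = {c: (match.get(c) if match else None) for c in right_columns}
        joined.append({**row, **extra})
    return joined

def null_count(rows, column):
    """Count rows where the given column is None."""
    return sum(1 for row in rows if row[column] is None)

# Hypothetical tables; "c2" vs "C2" is the deliberate mismatch.
books = [
    {"title": "High Performance Yarn", "category_id": "F1"},
    {"title": "A Made-Up Cookbook", "category_id": "c2"},
]
categories = [
    {"category_id": "F1", "category": "fiction"},
    {"category_id": "C2", "category": "cooking"},
]

joined = left_join(books, categories, "category_id")
nulls = null_count(joined, "category")
print(f"rows with null category: {nulls}")
```

A null count on the joined column is the kind of cheap data check that catches this before the result goes anywhere important.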
Now what do we do? Throw away the null data? Try and replace the values? How about just rolling back to before we made the change? #Current22
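Rolling back is simple when every change is a commit on top of an earlier snapshot. A toy Python sketch, with a hypothetical `ToyBranch` class and made-up rows (this is not the lakeFS API):

```python
class ToyBranch:
    """Toy branch whose history is a list of immutable snapshots."""

    def __init__(self, initial_rows):
        self.commits = [tuple(initial_rows)]

    @property
    def head(self):
        return list(self.commits[-1])

    def commit(self, rows):
        self.commits.append(tuple(rows))

    def rollback(self):
        """Discard the latest commit, keeping at least the first one."""
        if len(self.commits) > 1:
            self.commits.pop()

# Hypothetical rows: (title, category_id).
branch = ToyBranch([("High Performance Yarn", "F1")])
before = branch.head

# A bad change lands: the category was lost.
branch.commit([("High Performance Yarn", None)])
has_nulls = any(value is None for row in branch.head for value in row)

if has_nulls:
    branch.rollback()
```

Nothing has to be repaired or guessed at: the branch head simply points back at the last snapshot that passed the checks.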
Now we fix the join and do it properly, then run the same data checks again to confirm it.

Thread by Robin Moffatt 🍻🏃🥓

