Charity Majors @mipsytipsy, 27 tweets
alright kids listen up, imma bout to live blog this talk by @lyddonb:

and not because it's been called a "30-min ad for @honeycombio", but because it's a master class in software engineering and technical leadership for the era of distributed systems.
software is eating the world, but we engineers have done a terrible job of helping others understand what we do.

we laugh at "normal people" misunderstanding tech, but that's actually *our* failure to explain it. and a backlash is brewing. (GDPR, the FBI on encryption, etc).
but here's the funny part: not only can't we explain our own technology to others, we don't actually understand our own systems.

so we need to start with ourselves. we must build the tools that let us understand our systems from the ground up, from every perspective.
... this is extremely hard. it's much harder to understand a system than to build it.

<<hilarious historical detour into the past 20 years of systems architectures>> tldr: don't microservice til you have to, you won't microsurvive [ed: my joke]
modern systems are ephemeral, exploding in size and complexity, with higher reliability needs and greater feature demands than ever. our systems have become so complex that they produce emergent behaviors that can't be predicted.
... in other words, our systems are becoming much more like living systems.

this seems fine
but don't worry: open source is here to save us!

lmao, cry, etc
our ways of dealing with these systems have had to adapt to keep up with the chaos. most of us are swimming in the wake of FANG, tho we are catching up disturbingly fast.

so let's talk about complexity and how to deal.
lesson one: STRUCTURE YOUR FUCKING DATA

["structure",
"your",
"fucking",
"data"]

structure it more
erp, sorry, got ahead of myself. lesson one is actually about monitoring or understanding the health of your systems. what is broken, what is fast/slow etc?

to keep up with understanding these systems, we need the systems to *self-describe* their state and health.
so pass a context object around, a struct that contains all the fucking context for the request -- whatever info you will wish you had when future-you is debugging the system at 3 am.

[**this is basically EXACTLY how honeycomb gathers data with the beelines or SDKs, btw.]
[also note that all those fields are unique IDs, aka high cardinality columns. literally ALL the useful debugging info is going to be ids and other high-cardinality data types like this, because it is the best identifying information.]
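[ed: to make that concrete, here's a rough sketch of what one of those wide, self-describing context objects could look like in go. field names and values are my own illustration, not from the talk.]

```go
// a wide, self-describing event: one struct per request, stuffed with every
// id and detail future-you will want at 3 am. field names are illustrative.
package main

import (
	"encoding/json"
	"os"
)

type RequestContext struct {
	RequestID  string  `json:"request_id"` // high-cardinality: unique per request
	TraceID    string  `json:"trace_id"`   // ties this hop to the rest of the trace
	TenantID   string  `json:"tenant_id"`  // which customer this request belongs to
	UserID     string  `json:"user_id"`
	Service    string  `json:"service"`
	BuildSHA   string  `json:"build_sha"` // which deploy served this
	Endpoint   string  `json:"endpoint"`
	StatusCode int     `json:"status_code"`
	DurationMS float64 `json:"duration_ms"`
	Err        string  `json:"error,omitempty"`
}

func main() {
	evt := RequestContext{
		RequestID:  "req-9f3a",
		TraceID:    "trace-77c1",
		TenantID:   "tenant-4711",
		UserID:     "user-8812",
		Service:    "checkout",
		Endpoint:   "/api/v1/charge",
		StatusCode: 200,
		DurationMS: 42.7,
	}
	// emit the whole blob once, at the end of the request
	_ = json.NewEncoder(os.Stdout).Encode(evt)
}
```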
if you run a large multi-tenant system, you probably learned these lessons the hard way a long time ago.

the ability to route, filter, drop, return custom error codes, degrade service etc per unique tenant is basically necessary for your survival as a platform.
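[ed: a hand-wavy go sketch of per-tenant controls at the edge -- the tenant header and the limits table are made up purely for illustration.]

```go
// per-tenant controls in an http middleware: rate limit, degrade, or block
// individual tenants without touching everyone else.
package main

import (
	"net/http"

	"golang.org/x/time/rate"
)

var tenantLimits = map[string]*rate.Limiter{
	"tenant-4711": rate.NewLimiter(100, 200), // well-behaved tenant
	"tenant-red":  rate.NewLimiter(1, 5),     // noisy neighbor, throttled hard
}

func perTenant(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		tenant := r.Header.Get("X-Tenant-ID") // hypothetical header name
		if lim, ok := tenantLimits[tenant]; ok && !lim.Allow() {
			http.Error(w, "slow down", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	http.ListenAndServe(":8080", perTenant(mux))
}
```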
ok NOW it's time for rule #2:

STRUCTURE YOUR [FUCKING] LOGS

fuck writing regexes bc we're too lazy to add structure at the time it actually makes the most sense, and could save us the most amortized pain. start with good hygiene now bc microservices will *kill you* otherwise
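[ed: in go this can be as boring as pointing log/slog's JSON handler at stdout. a minimal sketch, not gospel.]

```go
// structured logs instead of printf-and-regex: emit one JSON object per event
// so the pipeline (and future-you) can query on fields instead of grepping.
package main

import (
	"log/slog"
	"os"
)

func main() {
	// the JSON handler writes machine-parseable log lines to stdout,
	// which is exactly what a sidecar collector wants to scoop up.
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

	logger.Info("charge completed",
		"request_id", "req-9f3a",
		"tenant_id", "tenant-4711",
		"duration_ms", 42,
		"status_code", 200,
	)
	// => {"time":"...","level":"INFO","msg":"charge completed","request_id":"req-9f3a",...}
}
```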
next you need a data pipeline. typical evolution goes something like this:

clients: console.log -> google analytics && BI
services: tail -f -> splunk -> nagios && BI
storage: gnarly shell commands -> datadog && nagios

... but now your non-critical path has a bazillion systems
... consolidate all this garbage into a single queue. look into sidecar-style data collection (fluentd is good). write to stdout, sidecar collects and sends to queue.
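[ed: the sidecar pattern in miniature, assuming the app just writes JSON lines to stdout. the queue endpoint below is completely hypothetical -- in real life you'd reach for fluentd or fluent-bit rather than hand-roll this.]

```go
// toy sidecar collector: read JSON lines from stdin (the app's stdout) and
// ship them to a queue in batches. the ingest URL is purely hypothetical.
package main

import (
	"bufio"
	"bytes"
	"net/http"
	"os"
)

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	batch := make([][]byte, 0, 100)

	flush := func() {
		if len(batch) == 0 {
			return
		}
		body := bytes.Join(batch, []byte("\n"))
		// hypothetical queue ingest endpoint; swap in Kafka/Kinesis/whatever.
		if resp, err := http.Post("http://queue.internal/ingest", "application/x-ndjson", bytes.NewReader(body)); err == nil {
			resp.Body.Close()
		}
		batch = batch[:0]
	}

	for scanner.Scan() {
		// Scanner reuses its buffer, so copy each line before keeping it.
		line := append([]byte(nil), scanner.Bytes()...)
		batch = append(batch, line)
		if len(batch) >= 100 {
			flush()
		}
	}
	flush()
}
```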

where should the data go? at minimum, a cheap long-term store like glacier, kept forever.
also nice? something queryable, like bigshift.

"THIS IS WHY WE STRUCTURE OUR LOGS"

there are also other options. [insert a few honeycomb slides about high cardinality data and other capabilities] but the bottom line is: "start to think like scientists. like data people."
[editor's note: i say this all the fucking time, man. systems engineers are not used to having nice things. we've had these tools in BI forEVER -- biz folks don't just flip through a few dashboards looking for something that plausibly matches the current crisis!]
"see the raw data behind every result"
"share, collaborate"

... there are also many other non-honeycomb options, still a bit too dashboard-based but probably moving in the right direction.
structure and standardize all data coming out of your systems. with a data pipeline, you can plug in whatever vendors or tools come next.

(ahem structure your data)
number 5: minimize, isolate and track dependencies. unmanaged dependency hell is where throughput goes to die.

(shits on golang a bit, damns with faint praise, shows some terrific diagrams demonstrating the cost of freewheeling dependencies)
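[ed: for what it's worth, go modules now give you one diffable place where every dependency and its version is pinned and tracked. a tiny illustrative go.mod -- module path and versions made up:]

```
module example.com/checkout

go 1.22

require (
	golang.org/x/time v0.5.0
	google.golang.org/grpc v1.64.0
)
```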
number 6: use network sidecars (service mesh, proxies) to minimize and isolate dependencies. provides a nice isolated abstraction for new technologies, plugs in nicely to your data pipeline.
and finally, tracing. we need excellent visualizations that help us isolate different types of failures in these extremely complex distributed systems. tracing (and the impact-free tracer bullet) is super core to this.
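[ed: stripped of any particular library, the core trick in tracing is just ID propagation: every request gets a trace id, every hop gets a span id plus its parent's, and the ids ride along on outgoing calls so a tool can stitch the request back together. a toy go sketch with made-up header names, not any real tracing API:]

```go
// bare-bones trace-context propagation. a real system would use OpenTelemetry
// or similar; this only shows the shape of the idea.
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"net/http"
)

func newID(n int) string {
	b := make([]byte, n)
	_, _ = rand.Read(b)
	return hex.EncodeToString(b)
}

type span struct {
	TraceID, SpanID, ParentID, Name string
}

func startSpan(parent *span, name string) *span {
	s := &span{SpanID: newID(8), Name: name}
	if parent != nil {
		s.TraceID, s.ParentID = parent.TraceID, parent.SpanID
	} else {
		s.TraceID = newID(16) // root span mints the trace id
	}
	return s
}

func main() {
	root := startSpan(nil, "checkout")
	child := startSpan(root, "charge-card")

	// propagate on the wire so the next service can continue the trace.
	req, _ := http.NewRequest("GET", "http://payments.internal/charge", nil)
	req.Header.Set("X-Trace-ID", child.TraceID)      // hypothetical header names
	req.Header.Set("X-Parent-Span-ID", child.SpanID)

	fmt.Printf("trace=%s root=%s child=%s parent=%s\n",
		child.TraceID, root.SpanID, child.SpanID, child.ParentID)
}
```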
... but wait! one more thing...

TEST IN PRODUCTION

tracer bullets, feature flags, tester accounts ... all these are necessary prereqs to the holy grail, which is to stop flushing years of our lives down useless staging environments in the name of "safety" [ed: ahahahha ha ha]
support and exercise chaos tooling. build guard rails. help people experiment and learn in the real world, from production systems. this builds up their intuition and doesn't lead them astray like staging does.
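[ed: the guard rails can be embarrassingly simple. a toy go feature-flag gate -- tester accounts plus a small percentage of tenants get the new code path; the hard-coded flag store is purely for illustration.]

```go
// gate the new code path behind a flag that's only on for tester accounts
// and a small slice of tenants, so it gets exercised against real traffic
// without betting the farm.
package main

import (
	"fmt"
	"hash/fnv"
)

var testerAccounts = map[string]bool{
	"tenant-internal-qa": true, // our own tester account
}

// enabled rolls the new path out to testers plus roughly `percent` of tenants.
func enabled(flag, tenantID string, percent uint32) bool {
	if testerAccounts[tenantID] {
		return true
	}
	h := fnv.New32a()
	h.Write([]byte(flag + ":" + tenantID))
	return h.Sum32()%100 < percent
}

func charge(tenantID string) {
	if enabled("new-billing-path", tenantID, 5) {
		fmt.Println(tenantID, "-> new billing path (watched closely)")
		return
	}
	fmt.Println(tenantID, "-> old billing path")
}

func main() {
	charge("tenant-internal-qa")
	charge("tenant-4711")
}
```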
Production is the only environment that matters.

<< insert some seriously righteous snark about the illogical arguments of those who would deny your god given right to test in prod >>
The end. Here is the excellent TLDR which I heartily endorse every inch of.

(@lyddonb if you DM your address and shirt size i will send you an "I TEST IN PROD" t-shirt. thank you for this talk!)