Dan Luu

@danluu

4 Dec, 12 tweets

Is there anyone who's writing about different problem-solving approaches / styles? Here's an example of the kind of thing I mean (though it's incomplete, because it would be nice to see more than two approaches to a problem and I'm only going to discuss two here):

Once, at a meetup Matt Singer was hosting, Brendan Gregg asked me what I was working on, and I mentioned that I'd recently written a little (5kLOC) parser to parse every line of every dmesg we had in our datacenters to audit machine health issues.
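The thread doesn't show the parser itself, so this is only a minimal sketch of the general idea, assuming a simple pattern-table design: match each dmesg line against a table of issue patterns, collect the matched issues per host, and compute the fraction of hosts with at least one issue. The pattern table, the category names, and the functions `classify_line`, `audit_hosts`, and `fraction_with_issues` are all hypothetical; a real 5kLOC parser would handle far more message formats than three regexes.

```python
import re

# Hypothetical issue patterns, for illustration only. Each maps a category
# name to a regex that is searched against a single dmesg line.
ISSUE_PATTERNS = {
    "disk": re.compile(r"EXT4-fs error|I/O error"),
    "memory": re.compile(r"Out of memory"),
    "mce": re.compile(r"\[Hardware Error\]"),
}


def classify_line(line: str) -> set[str]:
    """Return the issue categories whose pattern matches this dmesg line."""
    return {name for name, pattern in ISSUE_PATTERNS.items() if pattern.search(line)}


def audit_hosts(dmesg_by_host: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each host to every issue category seen anywhere in its dmesg."""
    issues: dict[str, set[str]] = {}
    for host, lines in dmesg_by_host.items():
        found: set[str] = set()
        for line in lines:
            found |= classify_line(line)
        issues[host] = found  # an empty set means no known pattern matched
    return issues


def fraction_with_issues(issues: dict[str, set[str]]) -> float:
    """Fraction of audited hosts with at least one matched issue."""
    if not issues:
        return 0.0
    return sum(1 for found in issues.values() if found) / len(issues)
```

For example, a host whose dmesg contains a line starting with `EXT4-fs error (device sda1)` would be mapped to `{"disk"}`, and `fraction_with_issues` over all hosts gives a headline number of the "X% of hosts" kind.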

Of course, Brendan had done something vaguely analogous for Netflix, and he showed me what he'd done. It was so much in his style that if you saw the result without knowing who made it, you'd say "wow, this looks like something Brendan Gregg would make".

It had a cool visualization that got some of its power from your brain's pattern recognition ability.

And it was interactive: you could click to get a lot more info on what was going on (and maybe you could mouse over things? I forget).

My solution was basically the opposite. I don't think it's better or worse (there are pros and cons to each), but I think it was the better solution for me and I suspect I would've failed if I'd tried to produce something that was good in the ways Brendan's solution was good.

My solution used one of my standard approaches at work: define a metric (in this case, higher was worse) and then show that a higher value of the metric is strongly and causally related to severe problems (data corruption, order-of-magnitude increases in tail latency, incidents, etc.).

And, for each counterargument people would have (which I already knew because I'd informally talked to people about it), I had a rigorous explanation of why the counterargument is wrong. For example, one I knew I was going to get was that a machine going down is fine because we sometimes lose TORs (top-of-rack switches) without incident, which is equivalent to losing an entire rack, so it shouldn't be bad to lose a single host.

But I analyzed significant load-related incidents and found that a single host going down was implicated in a double-digit percentage of them, so we clearly were not operating all services in a way where it was OK to randomly lose hosts.

All this stuff went into a document. The causal link and the counter-counterarguments made up the majority of the document by volume, but they were all at the end, so people could skip them.

The other big chunk of the doc was a breakdown along various dimensions (host age, SKU, cluster owner, whether or not the host used the host health management system built by the Aurora/Mesos team, etc.) to determine the causes of the problem and which mitigations would be effective.
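At its core, a breakdown like this is a group-by: bucket hosts by one dimension and compute the share of hosts with a critical issue in each bucket. The sketch below assumes a hypothetical `Host` record whose fields stand in for the dimensions listed above, and a hypothetical `issue_rate_by` helper. It only surfaces correlations; the causal argument in the thread came from separate analysis.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Hashable


@dataclass
class Host:
    # Hypothetical fields standing in for the dimensions in the breakdown.
    name: str
    sku: str
    owner: str
    age_years: int
    uses_health_mgmt: bool
    has_critical_issue: bool


def issue_rate_by(hosts: list[Host], key: Callable[[Host], Hashable]) -> dict:
    """Share of hosts with a critical issue in each bucket of key(host)."""
    totals: Counter = Counter()
    bad: Counter = Counter()
    for host in hosts:
        bucket = key(host)
        totals[bucket] += 1
        bad[bucket] += int(host.has_critical_issue)
    return {bucket: bad[bucket] / totals[bucket] for bucket in totals}
```

Each dimension is then one call, e.g. `issue_rate_by(hosts, lambda h: h.sku)` or `issue_rate_by(hosts, lambda h: h.age_years // 2)` for two-year age buckets. Passing the grouping as a function keeps derived buckets like age ranges as easy to try as raw fields.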

And then, at the top of the document, was a summary: X% of hosts have some critical health issue, along with a description of what could be done about it.

I'm not good at making interactive visualizations the way Brendan is, and his approach would've been very hard for me.

But I'm ok at finding a data-driven description of a problem in a way that knocks down all the objections one might use to avoid addressing the problem, so that's what I did.

I'm sure there are other approaches that would work and I'd love to see more discussion of approaches.

• • •

More from @danluu

29 Nov

Is there anyone doing in-depth interviews on various aspects of why the world is the way it is?

Some examples of interviews I'd like to hear are below.

I'm looking for interviews because I don't think one person could have the breadth and depth to regularly answer these kinds of questions.

How is it that Michelin has generally had either the best in class tire or close for every class of tire they make for decades?

Perhaps this isn't inherently more mysterious than the effectiveness of Apple's CPU design group, but I don't know who I could ask about Michelin.

Why has non-OC canoe tech stagnated relative to kayak tech?

There's the obvious answer that there's more money in it, but I want to know why specific innovations that seem like they should be portable are super niche, e.g., the stuff Nick Adnitt is doing, or GRB's curved-blade paddle.

4 Nov

If I want to fully support myself from my blog, is Substack basically the only reasonable game in town? I'd like that to not be the case, but it seems like it might be?

From the numbers people have posted, Substack has a much higher conversion rate for writing than Patreon, GitHub Sponsors, etc.

It seems like 10% isn't an uncommon conversion rate, which seems incredibly high if you compute what the equivalent number would be for a blog that's supported via Patreon or GitHub Sponsors.

You can try to make up the difference by adding higher tiers, like Andy Matuschak has, but Substack also supports tiers and, to make up the difference in conversion, you'd need very high tiers, like the ones Evan has for Vue.js support.

Evan does get sponsors for the high tiers, but they're corporate supporters, which isn't something you can expect for a programming blog.

23 Oct

I find it sort of astounding how, 17 years after Steve Yegge published sites.google.com/site/steveyegg…, almost no companies "get it" when it comes to marketing the company to potential hires, e.g., SOSP flyers:

web.archive.org/web/2021102308…
web.archive.org/web/2021102308…
web.archive.org/web/2021102308…

MS's flyer reads like it was created by the marketing department without consulting any engineers.

FB's flyer is basically a noop.

Google's flyer is great. It was clearly written by somebody who understands what grad students attending SOSP care about in an employer.

I don't think Google really has better opportunities than MS and FB for grad student internships or new PhD hires, but they have someone in the "branding" / "marketing" loop who actually knows what SOSP is and that doesn't appear to be the case for MS or FB.

12 Oct

https://twitter.com/southpolesteve/status/1361168839917117442

Even though the market already seemed bonkers high then, it has gone way up since.

E.g., a friend of mine who was "senior" at Google (there 4 years with no promo) now makes $750k/yr and got a level bump for changing jobs after FB and another company got into a bidding war.

I've found it interesting to watch companies that don't value retention hemorrhage key employees by not keeping up with the market when people are close to burnout and primed to leave, causing predictable and 💰 disasters (https://twitter.com/danluu/status/1297710112727916544, https://twitter.com/danluu/status/1440469014640164870, etc.).

Another thing I've found interesting to watch is how quickly companies have responded.

At the forefront, there are companies like FB, which are either causing the increase in market rate or responding so quickly that the difference can't be observed externally.

10 Oct

https://twitter.com/rygorous/status/1446983617465307137

I think I failed an interview at FB a long time ago (~2013) because of this.

The interviewer asked me how you can write deadlock-free code, and I told him that there's this thing people say about taking and releasing locks in order, but there are places where that won't save you.

The interviewer didn't like that answer and said that, even in those other circumstances, "there's got to be a way".

I discussed some places where that isn't sufficient, e.g., in processor hardware and microcode, where you wouldn't do that for performance reasons even if you could. And, of course, that's where you implement the primitives other people use to take locks, and you can't use the primitive you're creating to implement that primitive itself.

But the interviewer still insisted, "there's got to be a way".
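For reference, the "take locks in order" rule from the interview can be sketched in Python for the ordinary user-level case, where it does apply: give every lockable object a unique id and always acquire locks in increasing id order, so two threads can never each hold one lock while waiting on the other. `Account` and `transfer` are hypothetical names chosen for illustration, and, as the thread says, this technique doesn't help when you're implementing the locking primitives themselves.

```python
import threading


class Account:
    def __init__(self, account_id: int, balance: int):
        self.id = account_id  # unique; defines the global lock order
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src: Account, dst: Account, amount: int) -> None:
    """Move `amount` from src to dst, always locking the lower id first."""
    if src is dst:
        return  # threading.Lock isn't reentrant; locking it twice would hang
    first, second = (src, dst) if src.id < dst.id else (dst, src)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount
```

Without the ordering, one thread running `transfer(a, b, 1)` and another running `transfer(b, a, 1)` could each grab their source's lock and then wait forever on the other's; with it, both threads contend for the same first lock.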

1 Oct

I got promoted a while back, which really hammered home how arbitrary promos are.

I was promoted 2x in 3 years at my current job (senior -> staff -> sr. staff) vs 0x in 3 years at other BigCos.

AFAICT, the main difference was that my manager made sure I got credit for my work.

If anything, I think my work was better at the other BigCos, because I worked as an EE for 2 of those 3 years. By the end, I had 10 YOE (years of experience) on top of having more talent for hardware than for software.

After 3 years in my current role, I have 4 years of professional programming experience.

"Getting credit" is probably subtler than a lot of people would expect, so I'll provide an example. My manager wrote my promo packet and I suspect I wouldn't have gotten promoted if she hadn't written it or provided sufficient information for me to write a very similar document.
