Cindy Sridharan
Nov 13, 2022 4 tweets 1 min read
I agree that it’s misguided to suggest the only way for managers to “be technical” is by coding.

But boy, some management folks seem so virulently anti-coding in a way that’s just absurd.

Coding is definitely one way (though not the only way or even the best way) to “be technical”. There are many ways “leadership” can contribute to the technical betterment of a project (and improve your own credibility) without writing production code:

- build small side projects using libraries your team authors, giving them feedback on code quality, testing, design etc.
Jul 12, 2021 4 tweets 1 min read
Every time I say something - anything at all - about software quality or dev productivity, I have legions pontificating about unit tests and documentation in my mentions.

So here’s another (slightly contrarian) take: unit tests/docs aren’t always the best yardstick of “quality”. In other words, if you’re looking at a project with inadequate docs/test coverage, and immediately think the way to fix it or improve it is by adding more tests/docs, then it’s possible your immediate impact on the project or team productivity might be rather meagre.
Oct 21, 2020 4 tweets 1 min read
So many companies, large and small, end up solving all the wrong problems or the least important ones. Especially common when building infra tools/software.

I see this happen again and again, and we then wonder why the state of the art hasn’t improved in the past 5 years. When there’s a problem space where frankly everything sucks at every layer, it’s common to think the way to tame this space is by taking a bottom-up approach.

This approach fails, time and time again, because it almost never provides any immediate value to users.
Aug 3, 2020 5 tweets 5 min read
The paper I've been looking forward to the most is now out: zero downtime deployments at Facebook.

Disruption free release of services that speak different protocols and serve different types of requests (long lived TCP/UDP sessions, requests involving huge chunks of data etc.) "Socket Takeover" should be familiar to traffic nerds. Transferring the listening socket over a Unix Domain Socket with ancillary message (CMSG) + SCM_RIGHTS is *precisely* how HAProxy does seamless reloads.

What *is* novel is how they transfer UDP (QUIC) socket fds.
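
A minimal sketch of the fd-passing mechanism mentioned above, assuming Python 3.9+ on a Unix system. `socket.send_fds` and `socket.recv_fds` wrap `sendmsg`/`recvmsg` with an `SCM_RIGHTS` ancillary message. The function names are hypothetical, and a `socketpair` inside one process stands in for the Unix domain socket between the old and new process. This is not the paper's or HAProxy's implementation, and it does not cover the UDP (QUIC) case.

```python
import socket


def hand_over_listener(channel: socket.socket, listener: socket.socket) -> None:
    """Old process: send the listening socket's fd over a Unix domain socket."""
    # send_fds attaches the fd as an SCM_RIGHTS ancillary message.
    socket.send_fds(channel, [b"listener"], [listener.fileno()])


def take_over_listener(channel: socket.socket) -> socket.socket:
    """New process: receive the fd and rebuild a socket object around it."""
    msg, fds, _flags, _addr = socket.recv_fds(channel, 1024, 1)
    assert msg == b"listener" and len(fds) == 1
    # With fileno=..., family and type are detected from the descriptor.
    return socket.socket(fileno=fds[0])


def demo() -> bool:
    # The socketpair plays the role of the channel between old and new process.
    old_end, new_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(8)
    addr = listener.getsockname()

    hand_over_listener(old_end, listener)
    inherited = take_over_listener(new_end)
    listener.close()  # the old owner drops its copy; the kernel socket stays open

    # Clients can still connect: the inherited fd is the same listening socket.
    client = socket.create_connection(addr)
    conn, _ = inherited.accept()
    client.sendall(b"ping")
    ok = conn.recv(4) == b"ping"
    for s in (client, conn, inherited, old_end, new_end):
        s.close()
    return ok
```

Because the listening socket is never closed in the kernel, connections arriving during the handover wait in the accept queue instead of being refused.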
Aug 2, 2020 9 tweets 3 min read
This is true. But there’s a corollary:

Please do not hire junior engineers unless your team/org has the bandwidth for proper mentorship.

Hiring a junior engineer is a commitment - you need to be willing to invest at least 1-2 years. A lot of teams aren’t set up for this. Some more things to consider:

- inexperienced managers probably aren’t the best suited to hire and mentor junior engineers, unless these managers themselves have mentorship/guidance from senior managers/leadership folks. A bad manager can be a horrendous formative experience.
Jul 31, 2020 8 tweets 5 min read
Wow this article from Dropbox on why @EnvoyProxy shines has tons of super 🌶 takes: A thread covering security, concurrency, opencore 1/7

The hottest take was probably:

“writing modern C++14 is not much different from using Golang or, with a stretch, one may even say Python.” Still remember the days when @mattklein123 claimed “developer productivity was one of the highlights of modern C++” when introducing Envoy and people raising eyebrows at this claim.

Receipt:
May 15, 2020 4 tweets 2 min read
@danluu Funnily enough, I’ve had Amazon (some still at Amazon, some ex-Amazon) employees who then went on to work at Google tell me that the thing they *like* and *miss* about Amazon is the simplicity (unsophisticated-ness) of things. @danluu A lot of systems were very unsophisticated compared to the counterparts at Google (might translate to “worse engineering” in some people’s minds), but that made working with those systems simple and the failure modes (and limitations) of those systems well-understood.
Dec 26, 2019 7 tweets 2 min read
Something I often hear is that Kubernetes is a tool that enables one to build a PaaS.

When I look at the compute options available, I think rolling your own PaaS on top of Kube is akin to painting yourself into a corner.

It doesn’t come across as very forward thinking. Cloud vendors have the personnel and the resources to easily keep up with the latest and greatest in the OSS world and roll out things like “serverless pods” (by, I suspect, making things like their custom FaaS runtimes OCI compatible) or whatever the next fad is.
Dec 23, 2019 5 tweets 2 min read
What are the biggest pain points you believe tooling can address in the next decade (2020-2029)?

I’ll go first:

- CI/CD. Jenkins is currently the CI gold standard and it’s a very low bar.
- Easier abstractions and paradigms for building infra. Kube is too low level + complex. What I mean by “paradigms” is basically this:

Right now the compute spectrum is a bit of an embarrassment of immature riches; as an industry we need to innovate, experiment and educate more on how to pick the right compute option for the wide variety of workloads we already run.
Aug 29, 2019 6 tweets 1 min read
Logical isolation is as important as physical isolation when it comes to fault isolation.

Unless you’re serving identical requests arriving at an identical rate, requiring an identical amount of work in a completely static environment, every server process is a multi tenant system. It’s important when designing such services to be able to protect against a misbehaving *class* of requests that triggers some form of degraded or pathological behavior, potentially affecting the response time of all other requests.
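
One way to sketch this kind of logical isolation is a per-class cap on in-flight requests, so a misbehaving class gets shed without touching the others. The `ClassLimiter` name, the class labels and the limit below are all hypothetical; this is an illustration of the idea, not a technique prescribed in the thread.

```python
import threading
from collections import defaultdict


class ClassLimiter:
    """Caps in-flight requests per request class inside a single server process."""

    def __init__(self, max_in_flight_per_class: int) -> None:
        self._limit = max_in_flight_per_class
        self._in_flight = defaultdict(int)
        self._lock = threading.Lock()

    def try_acquire(self, request_class: str) -> bool:
        """Admit the request unless its class is already at the cap."""
        with self._lock:
            if self._in_flight[request_class] >= self._limit:
                return False  # shed load for this class only
            self._in_flight[request_class] += 1
            return True

    def release(self, request_class: str) -> None:
        """Call when a previously admitted request finishes."""
        with self._lock:
            if self._in_flight[request_class] > 0:
                self._in_flight[request_class] -= 1
```

With a limit of 2, a third concurrent "export" request is rejected while a "read" request is still admitted, which is the point: the blast radius is the class, not the process.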
Jul 25, 2019 8 tweets 2 min read
One neat trick for testing is to test the output of telemetry data.

This is way more low touch than trying to refactor code to make it “testable”.

Many a time, such refactoring leads to awkward APIs, encapsulation violations and more. First, testing telemetry makes you treat things like loggers, tracers and metrics sinks as dependencies. No more shared global state.

Every test instantiates its own logger and passes it to the code being tested. The code will do its thing and 1) return a result or an error 2) emit telemetry.
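
A minimal sketch of that pattern using Python's standard `logging` module. `ListHandler`, `new_test_logger` and `parse_port` are hypothetical names invented for illustration; the logger is instantiated directly so nothing is registered in global logging state.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in memory so a test can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


def new_test_logger(name: str):
    """Build a private logger plus the handler that captures its output."""
    logger = logging.Logger(name)  # created directly, not via getLogger()
    handler = ListHandler()
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return logger, handler


def parse_port(raw: str, logger: logging.Logger) -> int:
    """Hypothetical code under test: returns a result or raises, and emits telemetry."""
    try:
        port = int(raw)
    except ValueError:
        logger.warning("invalid port %r", raw)
        raise
    logger.info("parsed port %d", port)
    return port


def test_parse_port_logs_invalid_input() -> None:
    logger, handler = new_test_logger("test")
    try:
        parse_port("http", logger)
    except ValueError:
        pass
    messages = [(r.levelname, r.getMessage()) for r in handler.records]
    assert messages == [("WARNING", "invalid port 'http'")]
```

The test asserts on what the code emitted rather than on its internals, so `parse_port` needs no extra seams beyond accepting the logger as a parameter.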
Jun 25, 2019 7 tweets 2 min read
Unless you’re working for serial entrepreneurs, it’s hard to tell if a founder of an early stage startup (without a proven business model) is exceptional or not.

As for “exceptional” engineers, you’re more likely to find these engineers at larger companies than at startups. There are exceptions, of course. There are some “startups” with an exceptional quality of engineering (Hashicorp comes to mind), and there’s garbage code/engineering to be found at large companies.
Jun 25, 2019 6 tweets 5 min read
The architecture of @CockroachDB

CockroachDB has an internal K/V store that stores keys in ranges. There’s an indexing structure that helps one find a range. The indexing structure itself is a range, and there’s another indexing structure used to find this structure. CockroachDB is a distributed replicated transactional database.

The unit of replication is a range of size 64MB. Raft is used for consensus. Raft provides atomic replication.
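
The two-level lookup described above (an index that locates ranges, plus another index that locates the index) can be sketched as follows. This is a toy model under stated assumptions: the `RangeIndex` class, the string keys and the range names are hypothetical and do not reflect CockroachDB's actual key encoding or data structures.

```python
from bisect import bisect_right


class RangeIndex:
    """Maps sorted start keys to targets; each entry covers [start, next start)."""

    def __init__(self, entries):
        # entries: list of (start_key, target) pairs
        self._entries = sorted(entries, key=lambda e: e[0])
        self._starts = [start for start, _ in self._entries]

    def lookup(self, key: str):
        """Return the target of the range whose span contains key."""
        i = bisect_right(self._starts, key) - 1
        if i < 0:
            raise KeyError(key)
        return self._entries[i][1]


# Second level: two index ranges, each locating data ranges.
index_a = RangeIndex([("", "range-1"), ("g", "range-2")])
index_b = RangeIndex([("n", "range-3"), ("t", "range-4")])
# First level: the index used to find the right piece of the index.
meta = RangeIndex([("", index_a), ("n", index_b)])


def find_range(key: str) -> str:
    """Two hops: meta index -> index range -> data range."""
    return meta.lookup(key).lookup(key)
```

Here `find_range("hello")` goes through `index_a` and lands on `range-2`, while `find_range("zebra")` goes through `index_b` and lands on `range-4`.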
Jun 25, 2019 9 tweets 8 min read
Making distributed tracing easier with more sophisticated visualizations - @YuriShkuro

The first is color coded by service graph. The second is a heat map #QConNYC Now @YuriShkuro is talking about a tool that compares traces.

[ ed- omg this: just yesterday I was talking to a vendor at QCon and was wondering if it’d be possible to compare traces. They said their product didn’t offer this. IMO this is the most important aspect of tracing]
Jun 24, 2019 8 tweets 6 min read
The final talk of my track

The state of Serverless computing - or fixing dysfunction as a service

Remember this paper? This talk is basically a gist of this paper. 🍿

There’s been a bunch of research on serverless recently - and Hacker News trashes all of it.

Yet, serverless use is on the rise. FaaS is good for embarrassingly parallel tasks as well as workflow orchestration
Jun 24, 2019 23 tweets 13 min read
Now the talk I’m looking forward to the MOST - @colmmacc on PID loops (pronounced P-I-D loops, and not “pid” (like sid) loops as I’ve always been pronouncing it in my head).

This is going to be so good! I’ll try to live tweet. Haha @colmmacc put my DM to him in the slides - as an example of observations and feedback in practice! 😁 #QConNYC
Jun 24, 2019 10 tweets 11 min read
First up in my @qconnewyork track - @kavya719 on lock performance!!

This talk on lock internals and performance characteristics is going to be incredible.

The Go runtime uses locks extensively (as does your memory allocator) sometimes to the detriment of performance. The compiler and processor reorder instructions. It helps to know the rules. @kavya719 at @qconnewyork

The LOCK instruction prefix in x86 is precisely what powers the lock implementation in most programming languages.

OMG - this talk is indescribably good! 😍
Apr 18, 2019 5 tweets 2 min read
Some thoughts on Tinder’s 2 year effort to move to Kubernetes - medium.com/@tinder.engine…

Engineers - most of us - are often terrible at evaluating the opportunity cost of the engineering efforts and the concomitant work it’d take to prop up this infrastructure in the long term. EC2 startup times alone aren’t a very convincing justification for a 2 year effort to make this move.

As for the cost savings - see my point about the opportunity cost here.

Third, you probably don’t need Kubernetes and the attendant complexity to get to the “infrastructure as code” ideal
Feb 24, 2019 8 tweets 2 min read
These days when I hear about yet-another-monitoring-company planning on integrating tracing/high-cardinality events into their product, part of me is glad the balkanization of o11y signals/viz is ending, but another part of me can’t help but think this is a form of cargo-culting. I’ve been - and continue to remain - sceptical of the dream of a “single pane of glass”.

That said I also think iterative drill-down and exploration across various axes, dimensions and systems is going to be critical to successfully troubleshoot issues.
Dec 23, 2018 8 tweets 2 min read
Really enjoying this talk from the @FoundationDB summit on testing distributed systems. Makes a pretty compelling case for an AI-driven approach to testing. /thread

The talk identifies 3 main problems with testing - fragility ("your test comes to rely on properties of your system that are incidental - that are not the ones you thought you were testing"), lack of exhaustiveness and flakiness.
Dec 13, 2018 4 tweets 2 min read
Finally found time to look at this a little more closely.

It's interesting that in the last 2-3 years, the newer crop of logging systems have shied away from ELK style full text indexing and instead treat log aggregation and querying as a stream processing problem of sorts, where queries are akin to grep against time partitioned files. One of the important aspects of observability is refining the search space (I believe @el_bhs was the one who mentioned this in one of his recent talks) - which isn't to be confused with visualizing the search space.
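
A rough sketch of "grep against time partitioned files", assuming hourly in-memory buckets in place of real files. `PartitionedLogs` and its methods are hypothetical and not modelled on any particular logging system. The time window narrows the search space first, then a plain regex scan runs over what is left, with no full-text index involved.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta


class PartitionedLogs:
    """Log lines bucketed by hour; queries scan only buckets inside the window."""

    def __init__(self) -> None:
        self._partitions = defaultdict(list)  # hour start -> [(timestamp, line)]

    @staticmethod
    def _bucket(ts: datetime) -> datetime:
        return ts.replace(minute=0, second=0, microsecond=0)

    def append(self, ts: datetime, line: str) -> None:
        self._partitions[self._bucket(ts)].append((ts, line))

    def grep(self, pattern: str, start: datetime, end: datetime) -> list:
        """Return lines with start <= timestamp < end that match the pattern."""
        regex = re.compile(pattern)
        hits = []
        bucket = self._bucket(start)
        while bucket < end:
            # Only partitions overlapping the window are ever read.
            for ts, line in self._partitions.get(bucket, []):
                if start <= ts < end and regex.search(line):
                    hits.append(line)
            bucket += timedelta(hours=1)
        return hits
```

Query cost here scales with the size of the time window rather than with the total log volume, which is the trade these systems make in exchange for skipping index maintenance at write time.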