Solomon Eseme
Dec 8 · 11 tweets
You’re in a backend interview.

They ask:

“Design a globally distributed configuration propagation service that pushes config updates to tens of thousands of servers within seconds, with versioning, rollback, and strong delivery guarantees.”

Here’s how to approach it:
Start by clarifying the core requirements:

- Config changes must propagate worldwide within seconds
- Strong versioning and atomic rollout per region
- Rollback must be near-instant (a pointer switch, not a redeploy)
- Agents must validate the integrity and signature of configs
- Updates must be durable, auditable, and conflict-free
Core components:

- Control plane API and metadata store
- Regional coordinators with version tracking
- Fan-out push clusters (WebSocket / long-poll)
- Edge agents with local cache + signature verification
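The signature check on the edge agent could look roughly like the sketch below. It uses HMAC-SHA256 from the Python standard library as a stand-in; a real deployment would more likely use an asymmetric scheme so that agents hold only a public key. The names `sign_config` and `verify_config` and the metadata layout are assumptions for illustration, not part of the original design.

```python
import hashlib
import hmac


def sign_config(blob: bytes, key: bytes) -> dict:
    """Control-plane side: produce the checksum and signature shipped with a blob."""
    return {
        "sha256": hashlib.sha256(blob).hexdigest(),
        "sig": hmac.new(key, blob, hashlib.sha256).hexdigest(),
    }


def verify_config(blob: bytes, meta: dict, key: bytes) -> bool:
    """Agent side: accept the blob only if both checksum and signature match."""
    actual_sum = hashlib.sha256(blob).hexdigest()
    expected_sig = hmac.new(key, blob, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched
    checksum_ok = hmac.compare_digest(actual_sum, meta["sha256"])
    sig_ok = hmac.compare_digest(expected_sig, meta["sig"])
    return checksum_ok and sig_ok
```

An agent would call `verify_config` before touching its local cache and refuse to apply anything that fails.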
Primary flow:

- Admin submits config draft -> validated and versioned
- Control plane writes an immutable version record
- Regional coordinators fetch the new version and publish rollout metadata
- Push clusters notify connected agents
- Agents fetch, verify, apply, persist, then ack
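The last step of the flow above (fetch, verify, apply, persist, ack) can be sketched as a single agent method. This is a minimal in-memory sketch: `Notification`, `Agent`, `fetch`, and `ack` are hypothetical names, verification is reduced to a checksum, and "persist" is just an attribute assignment where a real agent would write to disk.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Notification:
    version: int
    sha256: str  # checksum published with the rollout metadata


@dataclass
class Agent:
    fetch: Callable[[int], bytes]   # hypothetical: downloads the blob for a version
    ack: Callable[[int], None]      # hypothetical: reports success to the coordinator
    applied_version: int = 0
    config: dict = field(default_factory=dict)

    def on_notify(self, note: Notification) -> bool:
        """Return True if the version was newly applied, False if it was a duplicate."""
        if note.version == self.applied_version:
            self.ack(note.version)  # duplicate notification: re-ack, do not re-apply
            return False
        blob = self.fetch(note.version)
        if hashlib.sha256(blob).hexdigest() != note.sha256:
            raise ValueError("checksum mismatch; refusing to apply")
        self.config = json.loads(blob)        # apply
        self.applied_version = note.version   # persist (in memory in this sketch)
        self.ack(note.version)                # ack only after apply and persist
        return True
```

The duplicate check compares for equality rather than "greater than", so a rollback that re-activates an older version is still applied.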
Reliability / Guarantees:

- At-least-once notification; effectively exactly-once application, because agents apply idempotently by version
- Commit = agent-verified checksum and signature
- Agents retry with backoff until the fetch succeeds
- Coordinators track rollout health; failed agents quarantined
- Rollback = publish the prior version as the new active pointer
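Rollback as a pointer move, rather than a rewrite of data, might be modelled like this. The class below is an in-process toy with assumed names (`VersionStore`, `publish`, `rollback_to`); in the design above the records would live in the strongly consistent metadata store.

```python
class VersionStore:
    """Append-only version records plus a movable 'active' pointer."""

    def __init__(self):
        self._blobs = {}    # version -> config bytes, never overwritten
        self.history = []   # every value the active pointer has held (audit trail)

    def publish(self, blob: bytes) -> int:
        """Write a new immutable version and make it active."""
        version = len(self._blobs) + 1
        self._blobs[version] = blob
        self.history.append(version)
        return version

    def get(self, version: int) -> bytes:
        return self._blobs[version]

    @property
    def active(self) -> int:
        return self.history[-1]

    def rollback_to(self, version: int) -> None:
        """Re-point 'active' at an existing version; nothing is deleted."""
        if version not in self._blobs:
            raise KeyError(f"unknown version {version}")
        self.history.append(version)
```

Because the bad version stays in the store, the rollback itself is auditable and can be reversed the same way.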
Scaling strategy:

- Coordinators horizontally sharded by region
- Push clusters scaled via connection fan-out; stateless frontends
- Agents maintain persistent connections to the nearest region
- Version store globally replicated via multi-region quorum
- Backpressure via staged rollouts
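Staged rollouts as backpressure can be sketched as cumulative waves with a health gate between them. The wave percentages, the failure threshold, and the `push` callback are all assumptions chosen for illustration.

```python
from typing import Callable, Sequence


def staged_rollout(
    agents: Sequence[str],
    push: Callable[[str], bool],       # hypothetical: returns True if the agent acked
    percents: Sequence[int] = (1, 10, 50, 100),
    max_failure_rate: float = 0.05,
) -> int:
    """Push wave by wave; stop after a wave whose failure rate is too high.

    Returns the number of agents that were attempted.
    """
    done = 0
    for pct in percents:
        # cumulative target for this wave, using integer ceiling division
        end = min(len(agents), -(-len(agents) * pct // 100))
        wave = agents[done:end]
        if not wave:
            continue
        failures = sum(1 for agent in wave if not push(agent))
        done = end
        if failures / len(wave) > max_failure_rate:
            break  # halt the rollout; later waves never see the bad version
    return done
```

With 1,000 agents and the default percentages, a version that fails everywhere stops after the first 10 agents.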
Data & storage:

- Version metadata: strongly consistent store (etcd/Spanner/ZK)
- Config blobs: object storage with immutable keys
- Hot metadata cached at coordinators
- Agents store applied versions locally for restart resilience
- Indexed by version, region, rollout status
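One way to get immutable keys for config blobs is to derive the key from the content hash, so the same key can never point at different bytes. The sketch below uses a plain dict in place of object storage; the key prefix, the status values, and the names `VersionRecord`, `content_key`, and `put_blob` are assumptions.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionRecord:
    """Metadata row, indexable by version, region, and rollout status."""
    version: int
    region: str
    status: str     # e.g. "pending" or "active" (hypothetical values)
    blob_key: str


def content_key(blob: bytes) -> str:
    """Content-addressed key: identical bytes always map to the same key."""
    return "configs/sha256/" + hashlib.sha256(blob).hexdigest()


def put_blob(store: dict, blob: bytes) -> str:
    """Write-once put: an existing key is left untouched."""
    key = content_key(blob)
    store.setdefault(key, blob)
    return key
```

A coordinator would then store only the small `VersionRecord` in the consistent store and let agents fetch the blob by `blob_key`.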
Observability & Ops:

- Metrics: propagation latency, success rate, agent ack skew
- Logging: version creation, audit trails, signature verification results
- Tracing: publish path from control plane → coordinators → push nodes
- Alerts: stalled regions, agent failure clusters, version drift
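The metrics above could be derived from agent ack timestamps along these lines. "Ack skew" is taken here to mean the gap between the fastest and slowest ack, which is an assumption, as are the function and field names.

```python
from typing import Dict, List


def nearest_rank(sorted_values: List[float], pct: int) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = -(-pct * len(sorted_values) // 100)  # ceil(pct / 100 * n)
    return sorted_values[max(rank, 1) - 1]


def propagation_stats(
    published_at: float,
    ack_times: Dict[str, float],   # agent id -> time the ack was received
    expected_agents: int,
) -> dict:
    """Summarise one rollout: success rate, latency percentiles, ack skew."""
    latencies = sorted(t - published_at for t in ack_times.values())
    if not latencies:
        return {"success_rate": 0.0}
    return {
        "success_rate": len(latencies) / expected_agents,
        "p50_s": nearest_rank(latencies, 50),
        "p99_s": nearest_rank(latencies, 99),
        "ack_skew_s": latencies[-1] - latencies[0],
    }
```

An alert on a stalled region would then be a threshold on `success_rate` or `p99_s` per region.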
Edge cases & trade-offs:

- Coordinators overloaded: staggered rollout windows
- Split-brain version pointers: strong quorum guards
- Agents offline for long periods: reconcile to the current active version on reconnect
- Cost trade-off: persistent connections vs periodic pull
- Propagation latency vs blast radius (progressive deployments)
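The guard against split-brain version pointers comes down to a compare-and-set on the active pointer: a writer may only move it if nobody else has moved it since that writer last read it. The sketch below is an in-process stand-in using a lock and a generation counter; in the design above this check would be delegated to the strongly consistent store. `ActivePointer` and its methods are hypothetical names.

```python
import threading
from typing import Tuple


class ActivePointer:
    """Active-version pointer guarded by a generation counter."""

    def __init__(self, version: int = 0):
        self._version = version
        self._generation = 0
        self._lock = threading.Lock()

    def read(self) -> Tuple[int, int]:
        """Return (active version, generation)."""
        with self._lock:
            return self._version, self._generation

    def compare_and_set(self, expected_generation: int, new_version: int) -> bool:
        """Move the pointer only if it has not changed since it was read."""
        with self._lock:
            if self._generation != expected_generation:
                return False  # another coordinator won; re-read and decide again
            self._version = new_version
            self._generation += 1
            return True
```

Two coordinators that read the same generation cannot both succeed, so the pointer never forks.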
How to say it in an interview:

“I’d design this system using a global control plane with immutable versioning, regional coordinators for scoped rollout, and fan-out push clusters for low-latency propagation. The system scales through regional sharding and stateless push nodes, maintains reliability via version pointers, retries, and signature verification, and remains observable through latency, ack, and health metrics. This delivers rapid, safe config distribution at global scale.”