Stanislav Kozlovski
Sep 14, 2024 · 2 tweets · 4 min read
A Kafka cluster in the cloud doing 30MB/s costs more than $110,000 a year.

A $1,000 laptop can do 10x that.

Where did we go wrong? 👇

The Cloud. Namely - its absurd networking charges 👎

Let’s break it down simply:

• AWS charges you $0.01/GB for data crossing AZs (but in the same region).
• They charge on each GB both in and out. That means every GB that crosses an AZ boundary is billed twice: once for the sender (egress) and once for the receiver (ingress).
• For a normal Kafka cluster with a replication factor of 3 and a 3x read fanout, you will be charged:
• 2x on 2/3 of the produce throughput (the producers sitting in other AZs)
• 4x on 100% of the produce throughput (replicating to the two followers)
• 6x on 2/3 of the produce throughput (consumption by the three groups)

(but it can get a lot worse - read until the end to see)
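To make those multipliers concrete, here's a minimal sketch of the tally - a hedged model with illustrative names, not any real AWS API:

```python
# A hedged sketch of the cross-AZ billing tally described above.
# Assumes RF=3, one broker per AZ, and producers/consumers spread
# evenly across 3 AZs. Names are illustrative.

def billable_cross_az_mb(produce_mb_s: float, consumer_groups: int = 3) -> float:
    """Return MB/s of billable cross-AZ transfer, counting both directions."""
    cross_az = 2 / 3                                   # peers in a different AZ than the leader
    produce = 2 * cross_az * produce_mb_s              # out + in, for 2/3 of producers
    replication = 4 * produce_mb_s                     # 2 follower links, out + in each
    consume = 2 * consumer_groups * cross_az * produce_mb_s
    return produce + replication + consume

print(billable_cross_az_mb(30))  # 280.0 -> 140MB/s of traffic, billed on both sides
```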

Simple example:
• 3-broker cluster, each in a separate AZ
• 3 producers, each in a separate AZ
• 3 consumer groups with 3 consumers each, each group with consumers in a separate AZ

The producers are producing 30MB/s in total to the same leader.

2 of the 3 producers are in a different AZ, so 20MB/s of produce traffic is being charged at cross-zone rates. 👌

It’s charged both on the OUT (producer’s side) and IN (broker’s side).

The leader is replicating the full 30MB/s to both of its replicas.

This is again being charged both on the OUT (leader’s side) and IN (follower’s side), for both replication links. (60MB/s)

Then, each of the 3 consumer groups has 3 consumers.

All consumers read from the leader, with 2/3 in a different zone.

This results in 20MB/s of consume traffic charged at cross-zone rates PER GROUP. (60MB/s total)

Again charged both on the OUT (broker’s side) and IN (consumer’s side).

The total amounts to 140MB/s worth of cross-AZ traffic. Charged both ways.

At $0.01/GB, one MB costs roughly $0.00001 in each direction - so 140MB/s billed both ways comes out to $0.0028 every second. 🤔

That’s:

• $241 a day 😕
• $7500 a month 😥
• $88,300 a year 🤯
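Those figures check out with quick napkin math (hedged: assumes the $0.01/GB rate billed per direction, and 1GB = 1,000MB):

```python
# Sanity-checking the dollar figures above.
RATE_PER_MB = 0.01 / 1000                        # ~$0.00001 per MB, per direction

cross_az_mb_s = 20 + 60 + 60                     # produce + replication + consume
cost_per_sec = cross_az_mb_s * 2 * RATE_PER_MB   # x2: billed OUT and IN

print(f"${cost_per_sec:.4f}/s")                      # $0.0028/s
print(f"${cost_per_sec * 86_400:,.0f}/day")          # ~$242/day
print(f"${cost_per_sec * 86_400 * 365:,.0f}/year")   # ~$88,301/year
```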

It all goes down the drain on network traffic ALONE. 🔥

What about the hardware?

Quick napkin math assuming:

• 7 day retention
• all of the data is on EBS (not using tiered storage since it's not GA yet)
• keeping 50% of the disk free for operational purposes (don't ask me what happens if we run out of disk)
• the 3 brokers are running modest r4.xlarge instances (kinda overkill but hey, why not)

We'd pay:

• $19,440/yr for the EBS storage
• $6,990/yr for the EC2 instances

That’s right - you’re paying just $26.4k/yr for the hardware and $88.3k/yr for the network (3.3x the hardware).

For a total of $115k/yr. 💸
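Here's one set of assumptions that lands close to those hardware numbers - a hedged sketch with rough us-east-1 on-demand prices, st1 HDD volumes, and storage counted for a single copy of the data (all assumptions on my part):

```python
# Hedged hardware napkin math. Assumed prices (rough us-east-1 on-demand):
# r4.xlarge ~$0.266/hr, st1 HDD EBS ~$0.045/GB-month.
HOURS_PER_YEAR = 24 * 365

retained_gb = 30 * 86_400 * 7 / 1000      # 30MB/s for 7 days ~= 18,144 GB
provisioned_gb = retained_gb * 2          # keep 50% of the disk free

ebs_per_year = provisioned_gb * 0.045 * 12    # ~$19,600/yr
ec2_per_year = 3 * 0.266 * HOURS_PER_YEAR     # ~$6,990/yr

print(f"EBS ~${ebs_per_year:,.0f}/yr + EC2 ~${ec2_per_year:,.0f}/yr")
```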

I’m not even counting load balancer costs, which could be $12k by some quick napkin math too.

How ridiculous is that? 😂

Want it to get more ridiculous?

This calculation assumes you’re hosting your own Kafka cluster in the same AWS account.

💡If you use a managed Kafka provider that isn’t AWS - or one that runs in another AWS account - you’re typically connecting to it through a public endpoint.

AWS then charges all of that traffic at the $0.01/GB rate, even if it stays within a single AZ.

The end result?

$113,000 a year for network costs. 💀

For 30MB/s. (!!!)
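The same napkin math for the public-endpoint case (hedged: same assumed $0.01/GB rate, billed on both sides):

```python
# Public-endpoint scenario: ALL client traffic is now billable,
# not just the 2/3 that happens to cross an AZ.
RATE_PER_MB = 0.01 / 1000

produce, replication, consume = 30, 60, 90    # MB/s; replication stays cross-AZ
total_mb_s = produce + replication + consume  # 180 MB/s

yearly = total_mb_s * 2 * RATE_PER_MB * 86_400 * 365
print(f"~${yearly:,.0f}/year")                # ~$113,530/year
print(f"~${yearly * 3:,.0f}/year")            # at 3x traffic - the 90MB/s case below
```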

btw - 30 MB/s is absolutely nothing for Kafka... 🤡

It is most often network- or disk-bound.

Doing 3GB/s is not hard. 👌

The higher throughput you go, the more absurdly large this discrepancy between network and hardware cost becomes.

For example - this exact setup could probably do 3x the traffic (90MB/s), assuming storage space isn't a concern.

Then you'd have:

• $264,000 a year for the cross-AZ rate. 🥲
• $339,000 a year for the internet rate. 💀

Why do we accept this cost (more than $300k a year) and this complexity (this entire calculation) when three laptops could run the same workload practically for free?

Where did we go wrong?

Worth Noting:

There are a few optimizations that can be done here:

• consumers can use fetch-from-follower (KIP-392), which makes read traffic free (no cross-AZ charges) in the first example - see the config sketch after this list. The second example would still pay the public-endpoint charges. 🤝

• you can avoid the public-endpoint costs by VPC-peering or PrivateLink-ing the two AWS accounts. This is largely what most cloud providers do, since it otherwise becomes prohibitively expensive - but it can be super complex to set up. 🔧

• AWS can give you large discounts (up to 90%+ afaict) on the quoted prices, depending on your usage. It’s unclear what customer gets what discount. 💰
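For the fetch-from-follower point above, here's a minimal config sketch (KIP-392; the AZ name is a placeholder):

```
# broker config - each broker advertises its own AZ
broker.rack=us-east-1a
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector

# consumer config - declare which AZ the client sits in
client.rack=us-east-1a
```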

And perhaps the best example - you can use an ingeniously-designed product like WarpStream that eliminates all of this complexity and cost. ⭐️

It's no wonder they got acquired after just 13 months of operation.
I had to edit this thread because I originally got the public endpoint traffic cost wrong.

There's surprisingly little information about this online, and many people seem confused by it.

After further research, I've concluded that public endpoints are charged at the usual $0.01/GB rate - I believe on both sides.

This doesn't affect the first part of the calculation, but it initially gave me some vastly inaccurate numbers later on.
