Stanislav Kozlovski
Sep 14, 2024
A Kafka cluster in the cloud doing 30MB/s costs more than $110,000 a year.

A $1,000 laptop can do 10x that.

Where did we go wrong? 👇

The Cloud. Namely - its absurd networking charges 👎

Let’s break it down simply:

• AWS charges you $0.01/GB for data crossing AZs (even within the same region).
• They charge on each GB both in and out. Every GB that crosses an AZ is billed twice: once on the sending side (outgoing) and once on the receiving side (incoming).
• For a typical Kafka cluster with a replication factor of 3 and a 3x read fanout, you end up being charged:
• 2x the 2/3 of produce throughput that crosses AZs (producer side)
• 4x the full produce throughput, from replicating it to the two followers
• 6x the 2/3 of produce throughput that consumers read cross-AZ (3 groups)

(but it can get a lot worse - read until the end to see)
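Those multipliers can be sketched with some napkin-math Python. The function and its parameters are illustrative, assuming 3 AZs, clients spread evenly across them, and each cross-AZ GB billed twice:

```python
# Napkin sketch of billed cross-AZ throughput for a Kafka cluster.
# Assumptions: 3 AZs, clients spread evenly across them, replication
# factor 3, read fanout 3, and AWS billing each cross-AZ GB twice
# (once outgoing, once incoming).

def billed_cross_az_mbps(produce_mbps, replication_factor=3, read_fanout=3, num_azs=3):
    cross_az_fraction = (num_azs - 1) / num_azs      # 2/3 of clients sit in another AZ
    produce = 2 * cross_az_fraction * produce_mbps                  # producers -> leader
    replicate = 2 * (replication_factor - 1) * produce_mbps         # leader -> followers
    consume = 2 * read_fanout * cross_az_fraction * produce_mbps    # leader -> consumers
    return produce + replicate + consume

print(billed_cross_az_mbps(30))  # ~280 MB/s billed: 40 produce + 120 replication + 120 consume
```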

Simple example:
• 3-broker cluster, each in a separate AZ
• 3 producers, each in a separate AZ
• 3 consumer groups with 3 consumers each, each group with consumers in a separate AZ

The producers are producing 30MB/s in total to the same leader.

2 of the 3 producers are in a different AZ than the leader, so 20MB/s of produce traffic is being charged at cross-AZ rates. 👌

It’s charged both on the OUT (producer’s side) and IN (broker’s side).

The leader is replicating the full 30MB/s to both of its replicas.

This is again being charged both on the OUT (leader’s side) and IN (follower’s side), for both replication links. (60MB/s)

Then, each of the 3 consumer groups has 3 consumers.

All consumers read from the leader, with 2 of the 3 in each group sitting in a different zone.

This results in 20MB/s of consume traffic charged at cross-zone rates PER GROUP. (60MB/s total)

Again charged both on the OUT (broker’s side) and IN (consumer’s side).

The total amounts to 140MB/s worth of cross-AZ traffic. Charged on both sides, that's 280MB/s actually billed.

At $0.01/GB, one MB costs $0.00001, which means we’re paying $0.0028 every second. 🤔

That’s:

• $241 a day 😕
• $7500 a month 😥
• $88,300 a year 🤯
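As a sanity check, the same arithmetic in Python, using the thread's assumed $0.01/GB rate in decimal units:

```python
# Convert billed cross-AZ throughput to dollars, assuming AWS's
# $0.01/GB rate on each side of a transfer (i.e. $0.00001 per MB,
# using decimal units).

RATE_PER_MB = 0.00001
billed_mbps = 280            # 140MB/s of cross-AZ traffic, billed on both sides

per_second = billed_mbps * RATE_PER_MB
per_day = per_second * 86_400
per_year = per_day * 365

print(f"${per_second:.4f}/s  ${per_day:,.0f}/day  ${per_year:,.0f}/yr")
```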

It all goes down the drain on network traffic ALONE. 🔥

What about the hardware?

Quick napkin math assuming:

• 7 day retention
• all of the data is on EBS (not using tiered storage since it's not GA yet)
• keeping 50% of the disk free as operational headroom (don't ask me what happens if we run out of disk)
• the 3 brokers are running modest r4.xlarge instances (kinda overkill but hey, why not)

We'd pay:

• $19,440/yr for the EBS storage
• $6,990/yr for the EC2 instances

That’s right - you’re paying just $26.4k/yr for the hardware and $88.3k/yr for the network (3.3x the hardware)

For a total of $115k/yr. 💸
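For reference, here's one way those hardware numbers can be reproduced. The prices are my assumptions, not the thread's: roughly $0.015/GB-month for sc1-class EBS and roughly $0.266/hr for an on-demand r4.xlarge (check current AWS pricing for your region):

```python
# Hardware napkin math: 7-day retention, replication factor 3,
# 50% disk headroom, 3 x r4.xlarge. Prices are assumed, not quoted.

produce_mbps = 30
retention_seconds = 7 * 86_400
replication_factor = 3
headroom = 2                          # keep 50% of each disk free

stored_gb = produce_mbps * retention_seconds / 1000 * replication_factor * headroom
ebs_per_year = stored_gb * 0.015 * 12         # ~$0.015/GB-month (sc1-class)
ec2_per_year = 3 * 0.266 * 24 * 365           # ~$0.266/hr per r4.xlarge

print(f"~{stored_gb/1000:.0f}TB provisioned, EBS ${ebs_per_year:,.0f}/yr, EC2 ${ec2_per_year:,.0f}/yr")
```

This lands within a couple hundred dollars of the $19,440 and $6,990 figures above; the gap is rounding in the napkin math.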

I’m not even counting load balancer costs, which could be $12k by some quick napkin math too.

How ridiculous is that? 😂

Want it to get more ridiculous?

This calculation assumes you’re hosting your own Kafka cluster in the same AWS account.

💡If you use a managed Kafka provider that’s not AWS, or otherwise just another AWS account, you’re typically connecting to them through a public endpoint.

AWS then charges all of that traffic at the usual $0.01/GB rate, even if the client and broker are in the same AZ.

The end result?

$113,000 a year for network costs. 💀

For 30MB/s. (!!!)

btw - 30 MB/s is absolutely nothing for Kafka... 🤡

It is most often network- or disk-bound.

Doing 3GB/s is not hard. 👌

The higher throughput you go, the more absurdly large this discrepancy between network and hardware cost becomes.

For example - this exact setup could probably do 3x the traffic (90MB/s), assuming storage space isn't a concern.

Then you'd have:

• $264,000 a year for the cross-AZ rate. 🥲
• $339,000 a year for the internet rate. 💀
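The gap between the two rates comes down to how much of the client traffic gets billed. A sketch, reusing the same assumptions (the 2/3 fraction applies only when same-AZ clients ride free; a public endpoint bills everything):

```python
# Yearly network cost at a given produce rate, assuming RF=3
# (2 follower links), 3 consumer groups, and $0.01/GB each way.
# billed_client_fraction: 2/3 when same-AZ client traffic is free,
# 1.0 when all client traffic goes through a public endpoint.

RATE_PER_MB = 0.00001
SECONDS_PER_YEAR = 86_400 * 365

def yearly_network_cost(produce_mbps, billed_client_fraction):
    produce = 2 * billed_client_fraction * produce_mbps
    replicate = 2 * 2 * produce_mbps
    consume = 2 * 3 * billed_client_fraction * produce_mbps
    return (produce + replicate + consume) * RATE_PER_MB * SECONDS_PER_YEAR

print(f"cross-AZ: ${yearly_network_cost(90, 2/3):,.0f}/yr")   # ~$265k
print(f"internet: ${yearly_network_cost(90, 1.0):,.0f}/yr")   # ~$341k
```

This lands near the $264k and $339k figures above; the small differences are rounding in the original napkin math.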

Why do we pay this cost (more than $300k a year) and this complexity (this whole calculation) when three laptops can run the same workload practically for free?

Where did we go wrong?

Worth Noting:

There are a few optimizations that can be done here:

• consumers can use fetch-from-follower (KIP-392), which makes read traffic free (no cross-AZ charges) in the first example. The second example would still be charged internet-style costs. 🤝

• you can avoid internet costs by VPC-peering or PrivateLink-ing the two AWS accounts. This is largely what most cloud vendors do, as it otherwise becomes prohibitively expensive. It can be super complex to set up. 🔧

• AWS can give you large discounts (up to 90%+ afaict) on the quoted prices, depending on your usage. It’s unclear what customer gets what discount. 💰
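For the fetch-from-follower case, the knobs are KIP-392's rack-aware replica selector. A minimal config sketch, where the AZ names are placeholders:

```properties
# Broker config: advertise each broker's AZ and enable rack-aware reads.
broker.rack=us-east-1a
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector

# Consumer config: declare the client's AZ so it reads from the local replica.
client.rack=us-east-1a
```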

And perhaps the best example - you can use an ingeniously-designed product like WarpStream that eliminates all of this complexity and cost. ⭐️

It's no wonder they got acquired after just 13 months of operation.
I had to edit this because I got the public endpoint traffic cost wrong.

There's surprisingly little info out online about this, and many people seem confused about it.

After more research, I conclude that public endpoints get charged at the usual $0.01/GB rate, I believe on both sides.

It doesn't affect the first part of the calculation, but further on my original numbers were vastly inaccurate.
