Stanislav Kozlovski
Sep 14, 2024
A Kafka cluster in the cloud doing 30MB/s costs more than $110,000 a year.

A $1,000 laptop can do 10x that.

Where did we go wrong? 👇

The Cloud. Namely - its absurd networking charges 👎

Let’s break it down simply:

• AWS charges you $0.01/GB for data crossing AZs (within the same region).
• Each GB is billed both in and out. Every time a GB crosses an AZ boundary, you pay twice: once on the sender's side (egress) and once on the receiver's side (ingress).
• For a typical Kafka cluster with a replication factor of 3 and a 3x read fanout, you get charged:
• 2x on 2/3rds of the produce throughput (cross-AZ producers)
• 4x on 100% of the produce throughput (replication to the two followers)
• 6x on 2/3rds of the produce throughput (cross-AZ consumption by the three groups)

(but it can get a lot worse - read until the end to see)
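Those multipliers can be sanity-checked in a few lines of Python. This is a sketch under the thread's assumptions (RF=3, 3x read fanout, producers and consumers spread evenly across 3 AZs):

```python
# Charge-weighted traffic implied by the multipliers above.
P = 30  # MB/s of total produce throughput

produce_charges = 2 * (2 / 3) * P   # 2x on the 2/3rds that crosses AZs
replication_charges = 4 * P         # 4x on 100%: leader -> 2 followers, both sides billed
consume_charges = 6 * (2 / 3) * P   # 6x on 2/3rds: 3 groups, each with 2/3 cross-AZ reads

billed = produce_charges + replication_charges + consume_charges
print(billed)  # 280.0 charge-MB/s, i.e. 140 MB/s of cross-AZ traffic billed both ways
```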

Simple example:
• 3-broker cluster, each in a separate AZ
• 3 producers, each in a separate AZ
• 3 consumer groups with 3 consumers each, each group with consumers in a separate AZ

The producers are producing 30MB/s in total to the same leader.

2 of the 3 producers are in a different AZ than the leader, so 20MB/s of produce traffic is charged at cross-zone rates. 👌

It’s charged both on the OUT (producer’s side) and IN (broker’s side).

The leader is replicating the full 30MB/s to both of its replicas.

This is again being charged both on the OUT (leader’s side) and IN (follower’s side), for both replication links. (60MB/s)

Then, each of the 3 consumer groups has 3 consumers.

All consumers read from the leader, with 2 of the 3 in each group sitting in a different zone.

This results in 20MB/s of consume traffic charged at cross-zone rates PER GROUP. (60MB/s total)

Again charged both on the OUT (broker’s side) and IN (consumer’s side).

The total amounts to 140MB/s worth of cross-AZ traffic. Charged both ways.

At $0.01/GB per direction (i.e. $0.00001/MB), 140MB/s billed both ways means we’re paying $0.0028/s. 🤔

That’s:

• $241 a day 😕
• $7500 a month 😥
• $88,300 a year 🤯
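The whole calculation fits in a few lines of Python (assuming $0.01/GB in each direction and 365 billing days):

```python
# Napkin math for the 3-broker example above.
RATE_PER_MB = 0.01 / 1000  # $0.00001 per MB, per direction

produce = 20      # MB/s: the 2 producers in other AZs
replication = 60  # MB/s: leader sends the full 30 MB/s to 2 followers
consume = 60      # MB/s: 20 MB/s of cross-AZ reads x 3 consumer groups

cross_az = produce + replication + consume  # 140 MB/s
per_second = cross_az * RATE_PER_MB * 2     # billed on both sides
per_day = per_second * 86_400
per_year = per_day * 365

print(f"${per_second:.4f}/s, ${per_day:,.0f}/day, ${per_year:,.0f}/yr")
# $0.0028/s, $242/day, $88,301/yr
```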

It all goes down the drain on network traffic ALONE. 🔥

What about the hardware?

Quick napkin math assuming:

• 7 day retention
• all of the data is on EBS (not using tiered storage since it's not GA yet)
• keeping 50% of the disk free for operational purposes (don't ask me what happens if we run out of disk)
• the 3 brokers are running modest r4.xlarge instances (kinda overkill but hey, why not)

We'd pay:

• $19,440/yr for the EBS storage
• $6,990/yr for the EC2 instances

That’s right - you’re paying just $26.4k/yr for the hardware and $88.3k/yr for the network (3.3x the hardware)

For a total of $115k/yr. 💸

I’m not even counting load balancer costs, which could be $12k by some quick napkin math too.

How ridiculous is that? 😂

Want it to get more ridiculous?

This calculation assumes you’re hosting your own Kafka cluster in the same AWS account.

💡If you use a managed Kafka provider that’s not AWS, or otherwise just another AWS account, you’re typically connecting to them through a public endpoint.

AWS then charges all of that traffic at the usual $0.01/GB rate (on both sides), even if it's in the same AZ.

The end result?

$113,000 a year for network costs. 💀

For 30MB/s. (!!!)
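One way to arrive at that ~$113k figure: with a public endpoint, even same-AZ produce and consume traffic gets billed, so the chargeable traffic grows from 140MB/s to 180MB/s (this sketch assumes replication is still billed at cross-AZ rates):

```python
# Public-endpoint variant: ALL client traffic is billed, regardless of AZ.
RATE_PER_MB = 0.01 / 1000  # $0.00001 per MB, per direction

produce = 30      # MB/s: all 3 producers billed now
replication = 60  # MB/s: leader still replicates 30 MB/s to 2 followers
consume = 90      # MB/s: all 9 consumers across 3 groups billed now

total = produce + replication + consume            # 180 MB/s
per_year = total * RATE_PER_MB * 2 * 86_400 * 365  # billed both ways
print(f"${per_year:,.0f}/yr")  # $113,530/yr
```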

btw - 30 MB/s is absolutely nothing for Kafka... 🤡

It is most often network- or disk-bound.

Doing 3GB/s is not hard. 👌

The higher throughput you go, the more absurdly large this discrepancy between network and hardware cost becomes.

For example - this exact setup could probably do 3x the traffic (90MB/s), assuming storage space isn't a concern.

Then you'd have:

• $264,000 a year for the cross-AZ rate. 🥲
• $339,000 a year for the internet rate. 💀

Why do we put up with this cost (more than $300k a year) and complexity (this whole calculation) when three laptops could run the same workload practically for free?

Where did we go wrong?

Worth Noting:

There are a few optimizations that can be done here:

• consumers can use fetch-from-follower (KIP-392), which makes read traffic free (no cross-AZ charges) in the first example. The second example would still be charged public-endpoint costs. 🤝

• you can avoid public-endpoint costs by VPC-peering or PrivateLink-ing the two AWS accounts. This is largely what most cloud providers do; otherwise it becomes prohibitively expensive. It can be super complex to set up. 🔧

• AWS can give you large discounts (up to 90%+ afaict) on the quoted prices, depending on your usage. It’s unclear what customer gets what discount. 💰

And perhaps the best example - you can use an ingeniously-designed product like WarpStream that eliminates all of this complexity and cost. ⭐️

It's no wonder they got acquired after just 13 months of operation.
I had to edit this because I got the public endpoint traffic cost wrong.

There's surprisingly little info out online about this, and many people seem confused about it.

After further research, I've concluded that public endpoints get charged at the usual $0.01/GB rate, I believe on both sides.

It doesn't affect the first part of the calculation, but some of my later numbers were vastly inaccurate.
