A short 🧵 on @apachekafka topic creation (triggered by @niko_nava, thanks!): who should create Kafka topics, how to make sure they have the right settings, how to avoid dependencies between producer and consumer(s)? Here's my take:
2️⃣ Don't use broker-side topic auto-creation! You'll lack fine-grained control over different settings for different topics; merely polling, or requesting metadata, will trigger creation based on global default settings. Plus, some cloud services don't expose auto-creation to begin with.
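To rule this out explicitly, auto-creation can be disabled in the broker configuration; a minimal sketch (on self-managed clusters the default is `true`):

```properties
# server.properties
# Disable broker-side topic auto-creation, so that polling or requesting
# metadata for an unknown topic no longer creates it with global defaults
auto.create.topics.enable=false
```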
3️⃣ Instead, the producer side should be in charge of creating topics. That's where you have the knowledge about the required settings (replication factor, no. of partitions, retention policy, etc.) for each topic. Depending on your requirements, different approaches...
4️⃣ ...work best. If topics are written to by a single producer, the roll-out of that producer might also set up and configure the corresponding topics, akin to DB migrations. If you have multiple producers writing to a single topic, or just need more governance in general,...
5️⃣ ...upfront coordination via a more formalized process, and topic management through a centralized service, may make sense (depending on your org size, that's a good idea anyway, e.g. to keep track of topic owners, configuration history, etc.).
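As a sketch of producer-side topic creation with Kafka's `AdminClient` (topic name, partition count, and settings are made-up examples; needs a running broker at the given address):

```java
import java.util.Map;
import java.util.Properties;
import java.util.Set;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class TopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // The producer side knows the right settings for its own topic
            NewTopic orders = new NewTopic("orders", 6, (short) 3)
                    .configs(Map.of(
                            TopicConfig.RETENTION_MS_CONFIG, "604800000", // 7 days
                            TopicConfig.CLEANUP_POLICY_CONFIG,
                            TopicConfig.CLEANUP_POLICY_DELETE));
            // On re-deployment, a TopicExistsException can simply be ignored,
            // making the step idempotent, like a DB migration
            admin.createTopics(Set.of(orders)).all().get();
        }
    }
}
```

Running this as part of the producer's deployment pipeline keeps topic settings versioned alongside the application.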
6️⃣ Side-note: auto-creation by Kafka Connect source connectors is an interesting option. KIP-158, added in Kafka 2.6, allows for individual settings (replication, partitions, retention, etc.) based on topic name patterns. Details in this post by @rk3rn3r: debezium.io/blog/2020/09/1…
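For reference, the KIP-158 settings look roughly like this in a source connector configuration (group name, topic pattern, and values are made-up examples):

```properties
# Defaults for all topics created by this connector
topic.creation.default.replication.factor=3
topic.creation.default.partitions=1

# A custom group of topics, matched by name pattern, overriding the defaults
# (dots in the regex are escaped for the properties format)
topic.creation.groups=inventory
topic.creation.inventory.include=dbserver1\\.inventory\\..*
topic.creation.inventory.partitions=6
topic.creation.inventory.cleanup.policy=compact
```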
7️⃣ How to avoid issues if consumers get rolled out before a topic exists? Don't rely on start-up order -- resilience is key here: e.g. have health checks for Kafka Streams apps to get them automatically restarted. If needed, consumers embedded into a web app may use the...
8️⃣ ...admin client API to check the existence of topics, and take some action if they are (or are not yet) available.
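A minimal sketch of such a check via the admin client (topic name and bootstrap address are placeholders; needs a running broker):

```java
import java.util.Properties;
import java.util.Set;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class TopicReadinessCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // List all topic names currently known to the cluster
            Set<String> topics = admin.listTopics().names().get();
            if (!topics.contains("orders")) {
                // Report "not ready" via the app's health endpoint and retry
                // later, rather than failing start-up for good
                System.out.println("Topic not available yet, deferring start-up");
            }
        }
    }
}
```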
Any other tips to add? How do you manage the creation of your Kafka topics? Any best practices to share? Would love to learn from your experiences 🙏.
Fin
Agreed, the term is sub-par. But hear me out, the architecture is not. Let's talk about a few common misconceptions about Serverless!
1️⃣ "Serverless means no servers"
There *are* servers involved, but it's not on you to run and operate them. Instead, the serverless provider manages the platform, scaling things up (and down) as needed. Fewer things to take care of, billed per use.
Myth: BUSTED!
2️⃣ "Serverless is cheaper"
Pay-per-use makes low/medium-volume workloads really cheap. But pricing is complex: no. of requests, assigned RAM/CPU, API gateways, traffic, etc. Depending on your workload (e.g. high, sustained load), other options like VMs can be more economical.
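To make that trade-off concrete, here's a back-of-the-envelope cost sketch; all rates and workload numbers are made-up illustration values, not any provider's actual pricing:

```java
import java.util.Locale;

public class FaasCostSketch {
    public static void main(String[] args) {
        // Assumed illustrative rates (not real provider pricing)
        double usdPerMillionRequests = 0.20;
        double usdPerGbSecond = 0.0000166667;

        // Assumed workload
        long requestsPerMonth = 3_000_000L;
        double memoryGb = 0.5;           // RAM assigned per invocation
        double avgDurationSeconds = 0.2; // average execution time

        // Per-use billing: requests + GB-seconds of compute
        double requestCost = requestsPerMonth / 1_000_000.0 * usdPerMillionRequests;
        double computeCost = requestsPerMonth * memoryGb * avgDurationSeconds
                * usdPerGbSecond;

        System.out.printf(Locale.US, "requests=%.2f compute=%.2f total=%.2f%n",
                requestCost, computeCost, requestCost + computeCost);
    }
}
```

Whether that beats an always-on VM depends entirely on how sustained the load is; and real bills add gateways, traffic, storage, etc.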
Thanks for all the votes and insightful answers to the poll on the usage of @java's var! Not unexpectedly, replies range from "using var all the time" to "don't see the point of it". Yet one third never using var at all was a surprise to me. Some recurring themes from the replies in this 🧵.
1️⃣ Readability vs. writability: some argued var optimizes for writing code (fewer characters to type) at the cost of reading code (less explicit type info). I don't think that's the intention behind var. In fact, more (redundant, repetitive) code may read worse.
2️⃣ var only or primarily used in test code: described by several folks as a good starting point for getting their feet wet with local variable type inference, before using it more widely.
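To illustrate the readability argument with a small example (variable names invented): the explicit type on the left-hand side can simply repeat what the initializer already says, so dropping it doesn't have to lose information:

```java
import java.util.HashMap;
import java.util.List;

public class VarExample {
    public static void main(String[] args) {
        // Explicit type: the left-hand side repeats the right-hand side
        HashMap<String, List<Integer>> explicitCounts = new HashMap<>();

        // With local variable type inference (Java 10+), the type is still
        // static and fully known to the compiler, just not spelled out twice
        var counts = new HashMap<String, List<Integer>>();
        counts.put("a", List.of(1, 2, 3));

        System.out.println(counts.get("a").size());
    }
}
```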
Single message transformations (SMTs) are an invaluable feature of @ApacheKafka Connect, enabling tons of use cases with a small bit of coding, or even just configuration of existing ready-to-use SMTs. Here are some applications in the context of change data capture: (1/7)
* Converting data types and formats: date/time formats are the most common example here, e.g. converting millisecond timestamps into strings adhering to a specific date format (2/7)
* Creating an "anti-corruption layer", shielding consumers from legacy schemas or ensuring compatibility after schema changes; e.g. could use an SMT to choose more meaningful field names, or re-add a field using its old name after a column rename, easing consumer migration (3/7)
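As a sketch, both of these use cases can be covered with Kafka Connect's built-in SMTs; the field, topic, and transform names here are made-up examples:

```properties
transforms=formatTs,restoreName

# Convert a millisecond timestamp field into a formatted date string
transforms.formatTs.type=org.apache.kafka.connect.transforms.TimestampConverter$Value
transforms.formatTs.field=created_at
transforms.formatTs.target.type=string
transforms.formatTs.format=yyyy-MM-dd'T'HH:mm:ss

# Re-expose a renamed column under its old name, shielding consumers
# from the schema change (here: customer_ref was renamed in the source DB)
transforms.restoreName.type=org.apache.kafka.connect.transforms.ReplaceField$Value
transforms.restoreName.renames=customer_ref:customer_id
```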
Some folks wonder whether @ApacheKafka is "worth it at their scale". But focusing solely on message count and throughput means missing out on many other interesting characteristics of Kafka. Here are just three which make it useful for all kinds of deployments (1/5):
* Fault-tolerance and high availability: topics can be replicated, consumers can fail over -- machines will fail, programs will crash, and being able to mitigate this is always of value, no matter the scale of an application (2/5)
* Messages can be retained for a potentially indefinite time, and consumers are in full control of where they read a topic from -- this comes in handy to re-process some messages or entire topics, e.g. after failures, or for bringing in new consumers of existing messages (3/5)
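For instance, indefinite retention is just a topic-level setting, and where to start reading is a consumer-side choice; a config sketch:

```properties
# Topic-level: retain messages indefinitely (no time- or size-based deletion)
retention.ms=-1
retention.bytes=-1

# Consumer-side: start from the beginning of the topic when no
# committed offset exists for this consumer group yet
auto.offset.reset=earliest
```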