Gunnar Morling 🌍
Software engineer @Decodableco · Ex-lead of Debezium · Spec lead of Bean Validation 2.0 · Creator of JfrUnit, kcctl and MapStruct · Java Champion · 🚴
May 3, 2023 • 10 tweets • 2 min read
Got asked how stream processing platforms (e.g. Apache Flink, Kafka Streams, Spark Structured Streaming) compare to streaming databases (e.g. RisingWave, Materialize, PranaDB). There's some overlap and similarities, but also differences. Here are some aspects which may help you to pick the right tool for the job. 1/10

First, the commonalities: both kinds of tools let you do (potentially stateful) computations on (un-)bounded streams of data, such as click streams, IoT data streams, or CDC feeds from databases: e.g. projecting, filtering, mapping, 2/10
Dec 23, 2022 • 10 tweets • 3 min read
🧵 "How does Apache Flink compare to Kafka Streams?"

Both do stream processing, but differ in some important aspects. A few folks asked me about this recently, so I thought I'd share some thoughts. This is from a user's perspective, not touching on implementation details. 1/10

1️⃣ Supported Streaming Platforms

Being part of the @apachekafka project, Kafka Streams exclusively supports stream processing of data in Kafka. @ApacheFlink is platform-agnostic and lets you process data in Kafka, AWS Kinesis, Google Cloud Pub/Sub, RabbitMQ, etc. 2/10
Sep 1, 2022 • 9 tweets • 5 min read
Data tweeps: I'm trying to get an overview of the players in the space of incrementally updated materialized database views. That field is absolutely exploding right now, and it's really hard to keep track. Here are the ones I'm aware of 👇:

1️⃣ @MaterializeInc (materialize.com): Definitely the most prominent one, Postgres-compatible, based on the Timely/Differential Dataflow algorithms. Business Source License.
Jul 25, 2022 • 12 tweets • 3 min read
🧵 Few things in a developer's life are as annoying as issues with their project's build tool. A build running just fine yesterday is suddenly failing? Your build is just so slooow? A quick thread with some practices I've come to value when using @ASFMavenProject 👇.

1️⃣ Make sure the default build (`mvn verify`) passes after a fresh checkout. It's so frustrating to check out a code base and not be able to build it. If special tools need to be installed, have custom enforcer rules (see below) to verify and error out on this eagerly.
Jul 5, 2022 • 7 tweets • 5 min read
🧵 If you run @apachekafka in production, creating clusters, topics, connectors etc. by hand is tedious and error-prone. Better rely on declarative configuration which you put into revision control and apply in an automated way, #GitOps-style. Some tools which help with that:

1️⃣ JulieOps (github.com/kafka-ops/julie) by @purbon, which helps you to "automate the management of your things within Apache Kafka, from Topics, Configuration to Metadata but as well Access Control, Schemas". A nice intro in this post by Bruno Costa: medium.com/marionete/how-…
Jun 28, 2022 • 10 tweets • 3 min read
Quick 🧵 on what's "Head-of-Line Blocking" in @apachekafka, why it is a problem, and what some mitigation strategies are.

Context: Records in Kafka are written to topic partitions, which are read sequentially by consumers. To parallelize processing, consumers can be organized in groups, with partitions being distributed equally amongst consumer group members.

The problem: if a consumer hits a record which is either slow to process (say, a request to an external system takes a long time while doing so) or can't be processed at all (say, a record with
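
The blocking effect can be illustrated without Kafka at all. The following is a minimal sketch in plain Java, not Kafka consumer code: the class name, the method name and the processing times are made up for illustration. It only assumes what the thread states, namely that the records of one partition are handled strictly in order.

```java
import java.util.Arrays;
import java.util.List;

public class HeadOfLineBlockingSketch {

    // Records of one partition are processed strictly in order, so a record
    // can only finish after every record ahead of it has finished.
    // Input: per-record processing time in milliseconds (hypothetical values).
    // Output: the point in time (ms since start) at which each record is done.
    static long[] completionTimes(List<Long> processingMillis) {
        long[] done = new long[processingMillis.size()];
        long clock = 0;
        for (int i = 0; i < done.length; i++) {
            clock += processingMillis.get(i);
            done[i] = clock;
        }
        return done;
    }

    public static void main(String[] args) {
        // The second record is slow (e.g. waiting on an external system);
        // every record queued behind it inherits that delay.
        System.out.println(Arrays.toString(completionTimes(List.of(10L, 5000L, 10L, 10L))));
    }
}
```

The two fast records at the end take 10 ms each to process, yet they finish only after more than five seconds, because they sit behind the slow one in the same partition.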
Apr 5, 2022 • 5 tweets • 3 min read
👋 Hey students, the JBoss community is part of #GoogleSummerOfCode, and @debezium is looking forward to your project proposals! Some ideas at spaces.redhat.com/display/GSOC/G… (e.g. a Debezium JDBC sink connector, ZOMG 🚀).

Interested? Get in touch via email: groups.google.com/g/debezium

Project idea 1️⃣: A stand-alone tool for compacting the schema history topic of Debezium connectors, allowing for faster start-up of connectors with large histories.

spaces.redhat.com/display/GSOC/G…
Sep 5, 2021 • 11 tweets • 9 min read
⏱️ Just ten more days until the release of @java 17, the next version with long-term support! To shorten the waiting time a bit, I'll do one tweet per day on a cool feature added since 11 (previous LTS), introducing just some of the changes making the upgrade worthwhile. Let's go 🚀!

🔟 Ambiguous null pointer exceptions were a true annoyance in the past. Not a problem any longer since Java 14: Helpful NPEs (JEP 358, openjdk.java.net/jeps/358) now exactly show which variable is null. A very nice improvement to #OpenJDK, previously available only in SAP's JVM.
Aug 9, 2021 • 4 tweets • 4 min read
#Postgres as an event store -- Thanks a lot for all the super-insightful answers 🙏! It looks like using a jsonb[] for modeling an event stream isn't ideal performance-wise, but several great pointers to using #Postgres for event sourcing here. Mentioned solutions include... 1/4

- FactCast: docs.factcast.org
- Message DB: github.com/message-db/mes…
- @marten_lib: martendb.io
- @axonframework: axoniq.io
- crabzilla: github.com/crabzilla/crab… (an event sourcing exploration using @vertx_project)

2/4
Dec 3, 2020 • 8 tweets • 3 min read
A short 🧵 on @apachekafka topic creation (triggered by @niko_nava, thanks!): who should create Kafka topics, how to make sure they have the right settings, how to avoid dependencies between producer and consumer(s)? Here's my take:

2️⃣ Don't use broker-side topic auto-creation! You'll lack fine-grained control over different settings for different topics; merely polling, or requesting metadata, will trigger creation based on global settings. Plus, some cloud services don't expose auto-creation to begin with.
Jul 31, 2020 • 8 tweets • 3 min read
⏰ What time is it? Time for a #Mythbusters thread -- #Serverless edition!

Agreed, the term is sub-par. But hear me out, the architecture is not. Let's talk about a few common misconceptions about Serverless!

1️⃣ "Serverless means no servers"

There *are* servers involved, but it's not on you to run and operate them. Instead, the serverless provider is managing the platform, scaling things up (and down) as needed. Fewer things to take care of, billed per-use.

Myth: BUSTED!
May 10, 2020 • 7 tweets • 3 min read
Thanks for all votes and insightful answers to the poll on usage of @java's var! Not unexpectedly, replies range from "using var all the time" to "don't see the point of it". Yet one third never using var at all was a surprise for me. Some repeating themes from replies in this 🧵.

1️⃣ Readability vs. writability: some argued var optimizes for writing code (fewer characters to type) at the cost of reading code (less explicit type info). I don't think that's the intention behind var. In fact, more (redundant, repetitive) code may read worse.
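
A small side-by-side makes the "redundant, repetitive" point concrete. This is just an illustrative sketch; the class, method and variable names are invented. Both methods build the same data structure, once spelling out the types on both sides of each assignment and once letting the compiler infer them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class VarReadability {

    // Explicit types: the generic type information appears twice per declaration.
    static Map<String, List<Integer>> explicitTypes() {
        HashMap<String, List<Integer>> scoresByPlayer = new HashMap<String, List<Integer>>();
        ArrayList<Integer> scores = new ArrayList<Integer>();
        scores.add(42);
        scoresByPlayer.put("alice", scores);
        return scoresByPlayer;
    }

    // With var: the static types are inferred from the initializers, so
    // scoresByPlayer is still a HashMap<String, List<Integer>> and
    // scores is still an ArrayList<Integer>.
    static Map<String, List<Integer>> withVar() {
        var scoresByPlayer = new HashMap<String, List<Integer>>();
        var scores = new ArrayList<Integer>();
        scores.add(42);
        scoresByPlayer.put("alice", scores);
        return scoresByPlayer;
    }

    public static void main(String[] args) {
        // Both variants produce equal maps.
        System.out.println(explicitTypes().equals(withVar()));
    }
}
```

In the second variant the variable names carry the meaning and the type is stated exactly once, on the right-hand side. Whether that reads better is of course the judgment call the poll was about.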
Aug 28, 2019 • 7 tweets • 3 min read
Message transformations (SMTs) are an invaluable feature of @ApacheKafka Connect, enabling tons of use cases with a small bit of coding, or even just configuration of existing SMTs ready to use. Here are some applications in the context of change data capture: (1/7)

* Converting data types and formats: date/time formats are the most common example here, e.g. to convert millisecond timestamps into strings adhering to a specific date format (2/7)
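
The core of such a timestamp conversion fits in a few lines. The sketch below deliberately does not use the Kafka Connect API: a record value is modeled as a plain `Map`, and the class name, method name and field names are assumptions made for this example. In a real pipeline you would more likely configure an existing SMT than write this yourself.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

public class TimestampFormatSketch {

    // Returns a copy of the record value in which the given field, if it holds
    // epoch milliseconds as a Long, is replaced by a string formatted with the
    // given pattern in UTC. All other fields are passed through unchanged.
    static Map<String, Object> formatTimestamp(Map<String, Object> value, String field, String pattern) {
        Map<String, Object> result = new LinkedHashMap<>(value);
        Object raw = value.get(field);
        if (raw instanceof Long) {
            DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern).withZone(ZoneOffset.UTC);
            result.put(field, formatter.format(Instant.ofEpochMilli((Long) raw)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical change event payload with a millisecond timestamp.
        Map<String, Object> value = new LinkedHashMap<>();
        value.put("id", 1);
        value.put("created_at", 0L);
        System.out.println(formatTimestamp(value, "created_at", "yyyy-MM-dd HH:mm:ss"));
    }
}
```

Returning a modified copy rather than mutating the input mirrors how a transformation hands a new record to the next step in the chain instead of changing the one it received.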
Jul 31, 2019 • 5 tweets • 1 min read
Some folks wonder whether @ApacheKafka is "worth it at their scale". But solely focusing on message count and throughput means missing out on many other interesting characteristics of Kafka. Here are just three which make it useful for all kinds of deployments (1/5):

* Fault-tolerance and high availability; topics can be replicated, consumers can fail-over -- Machines will fail, programs will crash, being able to mitigate this is always of value, no matter the scale of an application (2/5)