Data tweeps: I'm trying to get an overview of the players in the space of incrementally updated materialized database views. That field is absolutely exploding right now, and it's really hard to keep track. Here are the ones I'm aware of 👇:
1⃣ @MaterializeInc (materialize.com): Definitely the most prominent one, Postgres-compatible, based on the Timely/Differential Dataflow algorithms. Business Source License.
4⃣ @readysetio (readyset.io); specifically targeting caching use cases, but it's also incremental view materialization (based on Noria). Business Source License.
5⃣ @leap_db (leapdb.com). MySQL-compatible. Not quite clear on the license, seems to be SaaS exclusively?
6⃣ pgsql-ivm (github.com/sraoss/pgsql-i…); an extension for incremental view maintenance within Postgres itself. May become part of PG proper some day. Not clear on the license, I suppose PostgreSQL License?
7⃣ Besides all of the above, which are positioned as databases, there are multiple streaming SQL solutions, too; I think that's a separate solution space though, e.g. @ApacheFlink SQL (e.g. via @Decodableco), @ksqlDB, and @DeltaStreamInc.
Those are the ones I'm aware of right now; would love to learn about other view materialization solutions you may know of. Would be cool (but tons of work) to have a blog post with a thorough comparison, e.g. exploring the specific query capabilities and consistency guarantees. One day, perhaps :)
Got asked how stream processing platforms (e.g. Apache Flink, Kafka Streams, Spark Structured Streaming) compare to streaming databases (e.g. RisingWave, Materialize, PranaDB). There's some overlap and similarities, but also differences. Here are some aspects which may help 1/10
you to pick the right tool for the job. First, the commonalities: both kinds of tools let you do (potentially stateful) computations on (un-)bounded streams of data, such as click streams, IoT data streams, or CDC feeds from databases: e.g. projecting, filtering, mapping, 2/10
joining, grouping and aggregating, time/session-windowed computations, etc. A key value proposition is to give you deep insight into your live data by incrementally computing derived data views with a very low latency. E.g. think real-time analytics, fraud and anomaly 3/10
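To make that a bit more concrete, here's a minimal sketch of the kind of keyed, windowed aggregation described above, written against Flink's DataStream API (1.x); the job name, the page-view data, and the 10-second window are made up for illustration:

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WindowedClickCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Stand-in for a Kafka/Kinesis/CDC source, so the sketch stays self-contained
        DataStream<Tuple2<String, Long>> pageViews = env.fromElements(
                Tuple2.of("/home", 1L),
                Tuple2.of("/pricing", 1L),
                Tuple2.of("/home", 1L));

        pageViews
                .keyBy(view -> view.f0)                                      // one count per page
                .window(TumblingProcessingTimeWindows.of(Time.seconds(10))) // 10 s tumbling windows
                .sum(1)                                                      // incrementally updated count
                .print();

        env.execute("windowed-click-count");
    }
}
```

A streaming database would let you express the same incrementally maintained count as a materialized view in SQL; which form fits better depends on where you want that logic to live.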
🧵 "How does Apache Flink compare to Kafka Streams?"
Both do stream processing, but differ in some important aspects. A few folks asked me about this recently, so I thought I'd share some thoughts. This is from a user's perspective, not touching on implementation details. 1/10
1⃣ Supported Streaming Platforms
Being part of the @apachekafka project, Kafka Streams exclusively supports stream processing of data in Kafka. @ApacheFlink is platform-agnostic and lets you process data in Kafka, AWS Kinesis, Google Cloud Pub/Sub, RabbitMQ, etc. 2/10
2⃣ Deployment Model
Kafka Streams is a library which you embed into your Java (or more generally, JVM-based) application. Flink can be used that way, too, but more typically it is run as a cluster of workers to which you upload your jobs. It comes with a web console for... 3/10
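To illustrate the "library you embed" model, here's a minimal sketch of a Kafka Streams application started from a plain Java main() method; the application id, topic names, and filter predicate are made-up examples:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class PaidOrdersApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "paid-orders-filter");   // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");              // hypothetical topic
        orders.filter((key, value) -> value.contains("\"status\":\"PAID\""))    // keep paid orders only
              .to("paid-orders");

        // The topology runs inside this very JVM process, no separate cluster needed
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

The equivalent Flink job would typically be packaged as a jar and submitted to a running cluster rather than being started as part of your own process.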
🧵 Few things in a developer's life are as annoying as issues with their project's build tool. A build running just fine yesterday is suddenly failing? Your build is just so slooow? A quick thread with some practices I've come to value when using @ASFMavenProject 👇.
1⃣ Make sure the default build (`mvn verify`) passes after a fresh checkout. It's so frustrating to check out a code base and not be able to build it. If special tools need to be installed, have custom enforcer rules (see below) to verify and error out on this eagerly.
2⃣ Pin all dependencies and plug-ins to specific (non-snapshot) versions. In particular for plug-ins, that often gets forgotten, resulting in potential surprises, for instance when using a different Maven version.
🧵 If you run @apachekafka in production, creating clusters, topics, connectors etc. by hand is tedious and error-prone. Better rely on declarative configuration which you put into revision control and apply in an automated way, #GitOps-style. Some tools which help with that:
1⃣ JulieOps (github.com/kafka-ops/julie) by @purbon, which helps you to "automate the management of your things within Apache Kafka, from Topics, Configuration to Metadata but as well Access Control, Schemas". A nice intro in this post by Bruno Costa: medium.com/marionete/how-…
2⃣ topicctl (github.com/segmentio/topi…) by @segment: "Easy, declarative management of Kafka topics. Includes the ability to 'apply' topic changes from YAML as well as a repl for interactive exploration of brokers, topics, consumer groups, messages, and more"
Quick 🧵 on what's "Head-of-Line Blocking" in @apachekafka, why it is a problem, and what some mitigation strategies are.
Context: Records in Kafka are written to topic partitions, which are read sequentially by consumers. To parallelize processing, consumers can be organized in
2⃣ groups, with partitions being distributed equally amongst consumer group members.
The problem: if a consumer hits a record which is either slow to process (say, a request to an external system takes a long time while doing so) or can't be processed at all (say, a record with
3⃣ an invalid format), that consumer can't make further progress with this partition. The reason being that consumer offsets aren't committed on a per-message basis, but always up to a specific record. I.e. all further records in that partition are blocked by the one at the head.
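To make those offset semantics tangible, here's a minimal sketch of a plain consumer loop with manual commits; the topic name, group id, and process() helper are hypothetical:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class HeadOfLineDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-processor");          // hypothetical group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));                              // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // If this record is slow or keeps failing, we never get past this point,
                    // so the commit below is never reached and all later records of the
                    // same partition remain blocked behind it.
                    process(record);
                    consumer.commitSync(Map.of(
                            new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1)));       // commit *up to* this position
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // stand-in for parsing, calling an external system, etc.
        System.out.printf("processing %s-%d@%d%n", record.topic(), record.partition(), record.offset());
    }
}
```

Once process() hangs or keeps throwing for a single record, commitSync() for that partition is never reached again, which is exactly the head-of-line blocking described above.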
Project idea 1⃣: A stand-alone tool for compacting the schema history topic of Debezium connectors, allowing for faster start-up of connectors with large histories.
Project idea 2⃣: Porting the Debezium Cassandra connector to Debezium Server, allowing for a unified user experience across all the different connectors.