Gunnar Morling 🌍
Sep 1, 2022 · 9 tweets · 5 min read
Data tweeps: I'm trying to get an overview of the players in the space of incrementally updated materialized database views. That field is absolutely exploding right now, and it's really hard to keep track. Here are the ones I'm aware of 👇:
1⃣ @MaterializeInc (materialize.com): Definitely the most prominent one, Postgres-compatible, based on the Timely/Differential Dataflow algorithms. Business Source License.
2⃣ #PranaDB (github.com/cashapp/pranadb); created by @CashApp, "designed from the outset to be horizontally scalable", Apache v2 License.
3⃣ @risingwave (risingwave.dev); also Postgres-compatible. Apache v2 License.
4⃣ @readysetio (readyset.io); specifically targeting caching use cases, but it's also incremental view materialization (based on Noria). Business Source License.
5⃣ @leap_db (leapdb.com). MySQL-compatible. Not quite clear on the license, seems to be SaaS exclusively?
6⃣ pgsql-ivm (github.com/sraoss/pgsql-i…); an extension for incremental view maintenance within Postgres itself. May become part of PG proper some day. Not clear on the license, I suppose PostgreSQL License?
7⃣ Besides all those above, which are positioned as databases, there are multiple streaming SQL solutions; I think that's a separate solution space though, e.g. @ApacheFlink SQL (e.g. via @Decodableco), @ksqlDB, and @DeltaStreamInc.
Those are the ones I'm aware of right now; would love to learn about other view materialization tools you may know of. Would be cool (but tons of work) to have a blog post with a thorough comparison, e.g. exploring the specific query capabilities and consistency guarantees. One day, perhaps :)
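To make "incremental view maintenance" concrete, here's a toy sketch of the core idea (my own illustration, not modeled on any of the tools above): instead of recomputing an aggregate view from scratch on every change, the view's state is updated with the delta of each insert or delete from the change stream.

```python
# Toy sketch of incremental view maintenance for a per-key SUM/COUNT view:
# each insert/delete delta updates the materialized state in O(1),
# rather than re-running the aggregation over the full base table.

class IncrementalSumView:
    """Maintains per-key sum and count, updated by insert/delete deltas."""

    def __init__(self):
        self.state = {}  # key -> [sum, count]

    def apply(self, op, key, value):
        s = self.state.setdefault(key, [0, 0])
        if op == "insert":
            s[0] += value
            s[1] += 1
        elif op == "delete":
            s[0] -= value
            s[1] -= 1
            if s[1] == 0:  # no rows left for this key -> drop it from the view
                del self.state[key]

    def query(self, key):
        s = self.state.get(key)
        return None if s is None else {"sum": s[0], "count": s[1]}


view = IncrementalSumView()
view.apply("insert", "eu", 10)
view.apply("insert", "eu", 5)
view.apply("delete", "eu", 10)
print(view.query("eu"))  # {'sum': 5, 'count': 1}
```

Real engines of course do far more (joins, consistency, persistence, retractions), but this is the basic trade: keep a little state around so each change costs a delta, not a full recomputation.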

More from @gunnarmorling

May 3, 2023
Got asked how stream processing platforms (e.g. Apache Flink, Kafka Streams, Spark Structured Streaming) compare to streaming databases (e.g. RisingWave, Materialize, PranaDB). There's some overlap and similarities, but also differences. Here are some aspects which may help you to pick the right tool for the job. First, the commonalities: both kinds of tools let you do (potentially stateful) computations on (un-)bounded streams of data, such as click streams, IoT data streams, or CDC feeds from databases: e.g. projecting, filtering, mapping, joining, grouping and aggregating, time/session-windowed computations, etc. A key value proposition is to give you deep insight into your live data by incrementally computing derived data views with very low latency. E.g. think real-time analytics, fraud and anomaly…
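The time-windowed computations mentioned above can be illustrated with a toy sketch of a tumbling-window count (names and simplifications are mine; real engines also handle out-of-order events, watermarks, and fault-tolerant state):

```python
# Toy sketch of a tumbling-window aggregation, the kind of stateful
# computation both stream processors and streaming databases offer:
# counting events per key in fixed, non-overlapping 60-second windows.

from collections import defaultdict

WINDOW_SECONDS = 60

def tumbling_window_counts(events):
    """events: iterable of (epoch_seconds, key) tuples."""
    counts = defaultdict(int)  # (window_start, key) -> count
    for ts, key in events:
        window_start = ts - (ts % WINDOW_SECONDS)  # assign event to its window
        counts[(window_start, key)] += 1
    return dict(counts)

clicks = [(0, "user-a"), (30, "user-a"), (61, "user-a"), (65, "user-b")]
print(tumbling_window_counts(clicks))
# {(0, 'user-a'): 2, (60, 'user-a'): 1, (60, 'user-b'): 1}
```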
Dec 23, 2022
🧵 "How does Apache Flink compare to Kafka Streams?"

Both do stream processing, but differ in some important aspects. A few folks asked me about this recently, so I thought I'd share some thoughts. This is from a user's perspective, not touching on implementation details. 1/10
1⃣ Supported Streaming Platforms

Being part of the @apachekafka project, Kafka Streams exclusively supports stream processing of data in Kafka. @ApacheFlink is platform-agnostic and lets you process data in Kafka, AWS Kinesis, Google Cloud Pub/Sub, RabbitMQ, etc. 2/10
2⃣ Deployment Model

Kafka Streams is a library which you embed into your Java (or more generally, JVM-based) application. Flink can be used that way, too, but more typically it is run as a cluster of workers to which you upload your jobs. It comes with a web console for... 3/10
Jul 25, 2022
🧵 Few things in a developer's life are as annoying as issues with their project's build tool. A build running just fine yesterday is suddenly failing? Your build is just so slooow? A quick thread with some practices I've come to value when using @ASFMavenProject 👇.
1⃣ Make sure the default build (`mvn verify`) passes after a fresh checkout. It's so frustrating to check out a code base and not be able to build it. If special tools need to be installed, have custom enforcer rules (see below) to verify and error out on this eagerly.
2⃣ Pin all dependencies and plug-ins to specific (non-snapshot) versions. In particular for plug-ins, this often gets forgotten, resulting in potential surprises, for instance when using different Maven versions.
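For illustration, an enforcer execution along these lines can fail the build early on both counts (the plug-in version and the version ranges here are placeholders to adjust; the rule names are from maven-enforcer-plugin's built-in rules):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <version>3.4.1</version>
  <executions>
    <execution>
      <id>enforce-environment</id>
      <goals><goal>enforce</goal></goals>
      <configuration>
        <rules>
          <!-- error out eagerly on unsupported tool versions -->
          <requireMavenVersion>
            <version>[3.8,)</version>
          </requireMavenVersion>
          <requireJavaVersion>
            <version>[17,)</version>
          </requireJavaVersion>
          <!-- fail if any plug-in version is unpinned or a snapshot -->
          <requirePluginVersions/>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```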
Jul 5, 2022
🧵 If you run @apachekafka in production, creating clusters, topics, connectors etc. by hand is tedious and error-prone. Better rely on declarative configuration which you put into revision control and apply in an automated way, #GitOps-style. Some tools which help with that:
1⃣ JulieOps (github.com/kafka-ops/julie) by @purbon, which helps you to "automate the management of your things within Apache Kafka, from Topics, Configuration to Metadata but as well Access Control, Schemas". A nice intro in this post by Bruno Costa: medium.com/marionete/how-…
2⃣ topicctl (github.com/segmentio/topi…) by @segment: "Easy, declarative management of Kafka topics. Includes the ability to 'apply' topic changes from YAML as well as a repl for interactive exploration of brokers, topics, consumer groups, messages, and more"
Jun 28, 2022
Quick 🧵 on what's "Head-of-Line Blocking" in @apachekafka, why it is a problem, and what some mitigation strategies are.

Context: Records in Kafka are written to topic partitions, which are read sequentially by consumers. To parallelize processing, consumers can be organized in groups, with partitions being distributed equally amongst consumer group members.

The problem: if a consumer hits a record which is either slow to process (say, a request to an external system takes a long time while doing so) or can't be processed at all (say, a record with an invalid format), that consumer can't make further progress with this partition. The reason being that consumer offsets aren't committed on a per-message basis, but always up to a specific record. I.e. all further records in that partition are blocked by the one at the head.
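One common mitigation (sketched here in plain Python, without a real Kafka client; names and the retry limit are mine) is to retry a failing record a bounded number of times and then park it in a dead-letter queue, so the partition can make progress past the poison record:

```python
# Toy sketch of a dead-letter-queue mitigation for head-of-line blocking:
# retry each record a few times, then route it to a DLQ instead of
# stalling the whole partition behind it.

MAX_ATTEMPTS = 3

def process_partition(records, handler):
    """records: payloads in partition order; returns (processed, dlq)."""
    processed, dead_letter_queue = [], []
    for record in records:
        for attempt in range(1, MAX_ATTEMPTS + 1):
            try:
                handler(record)
                processed.append(record)
                break
            except ValueError:
                if attempt == MAX_ATTEMPTS:
                    dead_letter_queue.append(record)  # park it, don't block
    return processed, dead_letter_queue

def handler(record):
    if record == "bad":
        raise ValueError("invalid format")

ok, dlq = process_partition(["a", "bad", "b"], handler)
print(ok, dlq)  # ['a', 'b'] ['bad']
```

The trade-off: records leave strict partition order, so this only works where downstream processing tolerates that (or where the DLQ is reprocessed separately).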
Apr 5, 2022
👋 Hey students, the JBoss community is part of #GoogleSummerOfCode, and @debezium is looking forward to your project proposals! Some ideas at spaces.redhat.com/display/GSOC/G… (e.g. a Debezium JDBC sink connector, ZOMG 🚀).

Interested? Get in touch via email: groups.google.com/g/debezium
Project idea 1⃣: A stand-alone tool for compacting the schema history topic of Debezium connectors, allowing for faster start-up of connectors with large histories.

spaces.redhat.com/display/GSOC/G…
Project idea 2⃣: Porting the Debezium Cassandra connector to Debezium Server, allowing for a unified user experience across all the different connectors.

spaces.redhat.com/display/GSOC/G…
