⏱️ Just ten more days until the release of @java 17, the next version with long-term support! To shorten the waiting time a bit, I'll do one tweet per day on a cool feature added since Java 11 (the previous LTS), introducing just some of the changes that make the upgrade worthwhile. Let's go 🚀!
🔟 Ambiguous null pointer exceptions were a true annoyance in the past. Not a problem any longer since Java 14: Helpful NPEs (JEP 358, openjdk.java.net/jeps/358) now show exactly which variable is null. A very nice improvement to #OpenJDK, previously available only in SAP's JVM.
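For illustration, a minimal sketch (the User/Address types are made up) of the kind of message you get -- the exact wording may vary by JDK version, and on Java 14 the feature still had to be enabled via -XX:+ShowCodeDetailsInExceptionMessages (on by default since 15):

```java
public class HelpfulNpeDemo {
    record Address(String city) {}
    record User(Address address) {}

    public static void main(String[] args) {
        User user = new User(null);
        // Throws an NPE; since JEP 358 the message pinpoints the null part,
        // along the lines of: Cannot invoke "HelpfulNpeDemo$Address.city()"
        // because the return value of "HelpfulNpeDemo$User.address()" is null
        String city = user.address().city();
    }
}
```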
9⃣ Varying load means new app instances must start up quickly? Check out class-data sharing (CDS), whose dev experience has improved a lot with JEP 350 (Dynamic CDS Archives, Java 13); also, way more classes are archivable since Java 15. More details here: morling.dev/blog/smaller-f…
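A minimal sketch of the JEP 350 workflow; the flags are the real ones, app.jar and app.jsa are placeholder names:

```
# Trial run: record the loaded classes into a dynamic CDS archive on exit
java -XX:ArchiveClassesAtExit=app.jsa -jar app.jar

# Subsequent runs: map in the archived class data for faster start-up
java -XX:SharedArchiveFile=app.jsa -jar app.jar
```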
8⃣ Adding JSON snippets to your Java code, e.g. for tests? Or multi-line SQL queries? Much easier now thanks to text blocks, without any escaping or concatenation. After two preview cycles, text blocks were added as a stable language feature in Java 15 (openjdk.java.net/jeps/378).
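A quick sketch of both use cases (the values are made up):

```java
public class TextBlocks {
    public static void main(String[] args) {
        // A JSON snippet without any escaped quotes or concatenation
        String json = """
                {
                  "name": "Duke",
                  "favoriteVersion": 17
                }
                """;

        // A multi-line SQL query; incidental indentation is stripped by the compiler
        String query = """
                SELECT id, name
                FROM customer
                WHERE age > 21
                """;

        System.out.println(json);
        System.out.println(query);
    }
}
```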
7⃣ Flight Recorder has changed the game for JVM performance analysis. New since Java 14: JFR event streaming. Either in-process (JEP 349), or out-of-process since Java 16. "health-report", a nice demo of the latter, is introduced in this post by @ErikGahlin: egahlin.github.io/2021/05/17/rem…
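A minimal in-process sketch using the JEP 349 API, subscribing to the built-in jdk.CPULoad event (the one-second period is an arbitrary choice):

```java
import java.time.Duration;
import jdk.jfr.consumer.RecordingStream;

public class JfrStreamingDemo {
    public static void main(String[] args) {
        // Consume JFR events as they happen, inside the running JVM
        try (RecordingStream rs = new RecordingStream()) {
            rs.enable("jdk.CPULoad").withPeriod(Duration.ofSeconds(1));
            rs.onEvent("jdk.CPULoad", event ->
                System.out.println("CPU (machine total): " + event.getFloat("machineTotal")));
            rs.start(); // blocks; use startAsync() to stream on a background thread
        }
    }
}
```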
6⃣ Occasionally, you need to take specific actions depending on the type of a given object -- just one use case for pattern matching. Added in Java 16 via JEP 394, with more kinds of patterns to be supported in the future. Details in this post by @nipafx: nipafx.dev/java-pattern-m….
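A small sketch of JEP 394's type patterns; the describe() helper is made up:

```java
public class PatternMatchingDemo {
    static String describe(Object obj) {
        // Before Java 16: instanceof check followed by an explicit cast;
        // with JEP 394, test and bind happen in one step
        if (obj instanceof String s && !s.isBlank()) {
            return "non-blank string of length " + s.length();
        }
        else if (obj instanceof Integer i) {
            return "integer with value " + i;
        }
        return "something else";
    }

    public static void main(String[] args) {
        System.out.println(describe("hello"));
        System.out.println(describe(42));
    }
}
```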
5⃣ Running application and database on the same host? Looking for efficient IPC between the processes of a compartmentalized desktop app? Then check out Unix-Domain Socket Channels (JEP 380), added in Java 16. I discuss several use cases in this post: morling.dev/blog/talking-t…
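A minimal client/server sketch with the Java 16 API; the socket path is a placeholder:

```java
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class UnixSocketDemo {
    public static void main(String[] args) throws Exception {
        Path socketFile = Path.of("/tmp/demo.socket"); // placeholder path
        Files.deleteIfExists(socketFile);
        UnixDomainSocketAddress address = UnixDomainSocketAddress.of(socketFile);

        // Server side: bind a channel to the socket file instead of an IP/port
        try (ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
            server.bind(address);

            // Client side: connect and send a message
            try (SocketChannel client = SocketChannel.open(address)) {
                client.write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8)));
            }

            try (SocketChannel peer = server.accept()) {
                ByteBuffer buffer = ByteBuffer.allocate(64);
                peer.read(buffer);
                buffer.flip();
                System.out.println(StandardCharsets.UTF_8.decode(buffer));
            }
        }
    }
}
```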
4⃣ Excited about pattern matching (6⃣)? Then you'll love switch expressions (JEP 361, added in @java 14), and pattern matching for them (brand-new as preview in 17). Super-useful together with sealed classes (finalized in 17). Note how the non-exhaustive switch fails compilation.
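A sketch combining all three, assuming Java 17 with --enable-preview for pattern matching in switch (JEP 406); drop one of the cases and the sealed hierarchy makes the compiler reject the non-exhaustive switch:

```java
public class SwitchDemo {
    sealed interface Shape permits Circle, Rectangle {}
    record Circle(double radius) implements Shape {}
    record Rectangle(double width, double height) implements Shape {}

    static double area(Shape shape) {
        // Sealed hierarchy + switch expression: covering all permitted
        // subtypes means no default branch is needed
        return switch (shape) {
            case Circle c -> Math.PI * c.radius() * c.radius();
            case Rectangle r -> r.width() * r.height();
        };
    }

    public static void main(String[] args) {
        System.out.println(area(new Circle(2)));
        System.out.println(area(new Rectangle(3, 4)));
    }
}
```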
3⃣ Vectorization via #SIMD (single instruction, multiple data) can help to significantly speed up certain computations. Now supported in @java (JEP 414, incubating), fully transparent and portable across x64 and AArch64. Even FizzBuzz is faster than ever 😜!
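A small sketch with the incubating API (run with --add-modules jdk.incubator.vector); the computation itself is just an arbitrary example:

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

public class VectorDemo {
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // c[i] = a[i]^2 + b[i]^2, processed in SIMD-register-sized chunks
    static void compute(float[] a, float[] b, float[] c) {
        int i = 0;
        int upperBound = SPECIES.loopBound(a.length);
        for (; i < upperBound; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            va.mul(va).add(vb.mul(vb)).intoArray(c, i);
        }
        // Scalar tail loop for the remaining elements
        for (; i < a.length; i++) {
            c[i] = a[i] * a[i] + b[i] * b[i];
        }
    }

    public static void main(String[] args) {
        float[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        float[] b = {9, 8, 7, 6, 5, 4, 3, 2, 1};
        float[] c = new float[a.length];
        compute(a, b, c);
        System.out.println(java.util.Arrays.toString(c));
    }
}
```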
2⃣ Elastic Metaspace (JEP 387), the ZGC and Shenandoah collectors ready for production (377/379), G1 NUMA support (345), G1 quickly uncommitting unused memory (346, some details here: ) -- tons of improvements related to GC and memory management since @java 11!
1⃣ Records, oh records! Long-awaited and going through two previews, @java language support for nominal tuples has been finalized in version 16 (JEP 395). Great for immutable data carriers like DTOs. A nice discussion of record semantics here by @nipafx: nipafx.dev/java-record-se….
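A minimal sketch; the Point type and its validation rule are made up:

```java
public class RecordsDemo {
    // A nominal tuple: the compiler generates constructor, accessors,
    // equals(), hashCode() and toString()
    record Point(int x, int y) {
        // Compact constructor for validation (hypothetical rule)
        Point {
            if (x < 0 || y < 0) {
                throw new IllegalArgumentException("coordinates must be non-negative");
            }
        }
    }

    public static void main(String[] args) {
        Point p1 = new Point(1, 2);
        Point p2 = new Point(1, 2);
        System.out.println(p1);            // Point[x=1, y=2]
        System.out.println(p1.equals(p2)); // true -- value-based equality
        System.out.println(p1.x());        // accessor, not getX()
    }
}
```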
Got asked how stream processing platforms (e.g. Apache Flink, Kafka Streams, Spark Structured Streaming) compare to streaming databases (e.g. RisingWave, Materialize, PranaDB). There's some overlap and similarities, but also differences. Here are some aspects which may help 1/10
you to pick the right tool for the job. First, the commonalities: both kinds of tools let you do (potentially stateful) computations on (un-)bounded streams of data, such as click streams, IoT data streams, or CDC feeds from databases: e.g. projecting, filtering, mapping, 2/10
joining, grouping and aggregating, time/session-windowed computations, etc. A key value proposition is to give you deep insight into your live data by incrementally computing derived data views with a very low latency. E.g. think real-time analytics, fraud and anomaly 3/10
🧵 "How does Apache Flink compare to Kafka Streams?"
Both do stream processing, but differ in some important aspects. A few folks asked me about this recently, so I thought I'd share some thoughts. This is from a user's perspective, not touching on implementation details. 1/10
1⃣ Supported Streaming Platforms
Being part of the @apachekafka project, Kafka Streams exclusively supports stream processing of data in Kafka. @ApacheFlink is platform-agnostic and lets you process data in Kafka, AWS Kinesis, Google Cloud Pub/Sub, RabbitMQ, etc. 2/10
2⃣ Deployment Model
Kafka Streams is a library which you embed into your Java (or more generally, JVM-based) application. Flink can be used that way, too, but more typically it is run as a cluster of workers to which you upload your jobs. It comes with a web console for... 3/10
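To illustrate the library-style deployment, a minimal Kafka Streams sketch (topic names, application id, and bootstrap servers are placeholders) -- the topology below runs right inside your own JVM process, no separate cluster needed:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class StreamsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "demo-app");          // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // A trivial topology: read, transform, write back to Kafka
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> stream = builder.stream("input-topic");
        stream.mapValues(value -> value.toUpperCase())
              .to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```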
Data tweeps: I'm trying to get an overview of the players in the space of incrementally updated materialized database views. That field is absolutely exploding right now, and it's really hard to keep track. Here are the ones I'm aware of 👇:
1⃣ @MaterializeInc (materialize.com): Definitely the most prominent one, Postgres-compatible, based on the Timely/Differential Dataflow algorithms. Business Source License.
🧵 Few things in a developer's life are as annoying as issues with their project's build tool. A build running just fine yesterday is suddenly failing? Your build is just so slooow? A quick thread with some practices I've come to value when using @ASFMavenProject 👇.
1⃣ Make sure the default build (`mvn verify`) passes after a fresh checkout. It's so frustrating to check out a code base and not be able to build it. If special tools need to be installed, have custom enforcer rules (see below) to verify and error out on this eagerly.
2⃣ Pin all dependencies and plug-ins to specific (non-snapshot) versions. For plug-ins in particular, this often gets forgotten, resulting in potential surprises, for instance when using different Maven versions.
🧵 If you run @apachekafka in production, creating clusters, topics, connectors etc. by hand is tedious and error-prone. Better rely on declarative configuration which you put into revision control and apply in an automated way, #GitOps-style. Some tools which help with that:
1⃣ JulieOps (github.com/kafka-ops/julie) by @purbon, which helps you to "automate the management of your things within Apache Kafka, from Topics, Configuration to Metadata but as well Access Control, Schemas". A nice intro in this post by Bruno Costa: medium.com/marionete/how-…
2⃣ topicctl (github.com/segmentio/topi…) by @segment: "Easy, declarative management of Kafka topics. Includes the ability to 'apply' topic changes from YAML as well as a repl for interactive exploration of brokers, topics, consumer groups, messages, and more"
Quick 🧵 on what's "Head-of-Line Blocking" in @apachekafka, why it is a problem, and what some mitigation strategies are.
Context: Records in Kafka are written to topic partitions, which are read sequentially by consumers. To parallelize processing, consumers can be organized in groups, with partitions being distributed equally amongst consumer group members.
The problem: if a consumer hits a record which is either slow to process (say, a request to an external system takes a long time while doing so) or can't be processed at all (say, a record with an invalid format), that consumer can't make further progress with this partition. The reason is that consumer offsets aren't committed on a per-message basis, but always up to a specific record; i.e. all further records in that partition are blocked by the one at the head.
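To make the mechanics concrete, a minimal consumer loop sketch (topic, group id, and bootstrap servers are placeholders); note how one slow or failing process() call stalls every record behind it in the same partition, since offsets are only committed up to the last fully processed record:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SequentialConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "demo-group");              // placeholder
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Head-of-line blocking: if this call hangs or throws,
                    // no later record of the same partition gets processed
                    process(record);
                }
                consumer.commitSync();
            }
        }
    }

    static void process(ConsumerRecord<String, String> record) { /* ... */ }
}
```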