1. Here’s a summary of my #kafkasummit talk for those who missed the live stream with links and pics of all the things we announced!
2. This was probably the most new open source stuff, cool at-scale cloud details, and new Confluent features I’ve ever had in a keynote. But it was too exciting to leave anything out.
3. I gave an overview of the stack that is emerging around Kafka, the Data Streaming Platform. There are a couple of components to this. There’s Kafka of course, but increasingly the key other layers are the connectors, stream processing, and tools to govern streaming data.
4. Kafka is taking off and there is exciting innovation happening in open source and commercial products at each layer of this stack. I covered some of the open source work and some new product announcements from Confluent.
5. First, the core stream itself, Apache Kafka. As usual with open source the work is done by a broad base of committers from dozens of companies with a strong commitment to making Kafka succeed.
6. There is a rich roadmap of features including the recent work on Zookeeper-free Kafka, the upcoming tiered storage work, and the new work on Queues for Kafka.
7. Queuing in Kafka is one of the more exciting bits, and those wanting to know more can check out KIP-932 for details on the proposal. cwiki.apache.org/confluence/dis…
8. I also covered a bit about the internals of the Kafka service offered in Confluent Cloud and gave an in-depth walkthrough of Kora, the core engine that runs our cloud.
9. Some of the advantages of Kora include 30x improvements in elasticity, 10x advantage in resilience, scalable infinite storage, substantial improvements in performance, and the cost structure that enables our Cost Challenge confluent.io/en-gb/blog/und…
10. There is more to say there than fits in twitter, so I’ll link out to the longer blog I did on Kora. confluent.io/en-gb/blog/clo…
11. Okay that is the stream layer, now let’s move up the stack a bit and talk about the rest of the data streaming platform.
12. The first announcement is Custom Connectors in Confluent Cloud. This means that in addition to the 70+ fully managed connectors we offer, you can now bring any Kafka Connector and run it in Confluent Cloud.
13. Next is Data Quality Rules in Stream Governance. These let you check richer assertions beyond just syntactic correctness in your data.
14. Finally the big one: Flink!!! This is probably the biggest and most important product effort at Confluent since we launched Confluent Cloud. I could not be more excited.
15. Why is this so important? I think stream processing has a similar role with streaming data that databases have with stored data---they make building applications easier. And Flink is in a leadership position in stream processing.
16. Today we are opening up early access to our fully managed Flink SQL offering. We have an exciting roadmap beyond that:
17. Why Flink? Well it’s a combination of an amazing community and a fantastic platform.
18. What makes our Flink offering special? Two dimensions I’ll highlight. First, it’s truly cloud native.
19. Second it’s fully integrated with the rest of Confluent Cloud so all the parts of the Data Streaming Platform natively talk to each other--Kafka, connectors, Flink and Governance all work together. If you create a topic, it shows up automatically in Flink for SQL queries.
20. We're still early in this effort but I couldn’t be more excited about the team, the progress so far, and where it is heading.
21. Finally one last thing! All of this is about sharing data within an organization. But often companies need to share beyond their own walls. Can we make that equally easy? We can, and that is where Stream Sharing comes in.
22. Okay so all these announcements may leave you wanting more details. This blog and the recording of my talk both go into a bit more depth, and we’ll have deep dive blogs and online sessions on many of these in the weeks ahead. confluent.io/en-gb/blog/str… kafka-summit.org/events/kafka-s…
23. Whew! That’s a lot. Huge thanks to the team at Confluent and the larger open source community for sustaining the incredible pace of innovation. Big things are happening in streaming!
• • •
1. The two-phase commit proposal for @apachekafka (KIP-939) is pretty interesting. Quick thread on why it matters. cwiki.apache.org/confluence/dis…
2. The first use case was actually to make @apachekafka and @ApacheFlink work better together. That was one driver for @confluentinc to work on it right now (we have a Flink service coming!). But the applicability is much broader.
3. There's a very general problem in integrating events into apps: how do you keep your events and DB state in sync? Example: a user joins your service. You want to update some DB tables and publish a “user joined” event to Kafka to let the rest of the org react. What do you do?
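To make the problem in that last tweet concrete, here is a minimal sketch of the naive "write to the DB, then publish" approach and how it can drift out of sync. Everything in it is an assumption for illustration: the dict stands in for the DB tables, the list stands in for a Kafka topic, and `register_user`, `CrashBeforePublish`, and the `crash_between_writes` flag are hypothetical names. It does not use any Kafka client API and it is not what KIP-939 proposes, only the failure mode that motivates it.

```python
class CrashBeforePublish(Exception):
    """Simulated process crash between the DB write and the event publish."""


def register_user(db, event_log, user_id, crash_between_writes=False):
    # Step 1: update the database (stand-in: a plain dict).
    db[user_id] = {"status": "joined"}
    # A crash at this point leaves the DB updated but no event published.
    if crash_between_writes:
        raise CrashBeforePublish(user_id)
    # Step 2: publish the "user joined" event (stand-in: a plain list).
    event_log.append({"type": "user_joined", "user_id": user_id})


db, event_log = {}, []
register_user(db, event_log, "alice")
try:
    register_user(db, event_log, "bob", crash_between_writes=True)
except CrashBeforePublish:
    pass

# Users the DB knows about that the rest of the org never heard of.
missing_events = set(db) - {e["user_id"] for e in event_log}
print(missing_events)
```

Swapping the two steps does not help: a crash after the publish but before the DB write would announce a user the DB has no record of. The two writes need to succeed or fail together.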
1. Thoughtworks notes that "Kafka continues toward its status as a de facto standard," observing that Kubernetes, Kafka, and the CSPs are becoming stable layers in the next gen stack and churn around alternative platforms seems to have waned. thoughtworks.com/content/dam/th…
2. This matches our internal data as well. Measuring open source usage is pretty hard, but our best data is that Kafka adoption is growing 7x faster than the fastest growing alternative off a base that is more than 15x the scale.
3. It's interesting to reflect on why data streaming has trended towards consolidation while other areas have trended towards greater diversity (there is always room for another database). I think there are three reasons.
1. A quick reflection on Confluent's IPO today and the journey so far (a thread!).
2. We wrote the initial Kafka code base at LinkedIn in 2009-2010. In 2011 we released the initial Kafka code as open source to...resounding silence. No one cared!
3. We had a lot of big ideas: building a data architecture around events, moving from batch to real-time stream processing, doing this around a kind of commit log that brought together real-time change and data storage. We knew it could be a big deal!
1/ In April we at @confluentinc kicked off what we call Project Metamorphosis, which is all about building a real cloud-native service around Kafka and its ecosystem. I talked about why I think this is a big deal in my Kafka Summit Keynote today. Here's a twitter summary:
2/ My talk's central thesis - There are two major trends that so far have been largely disjoint: cloud-native data systems and event streaming, and these need to converge. What do I mean by that?
3/ We think Kafka and event streams are on a path to take on a major role as a kind of central nervous system in a modern company, and this represents the rise of a major new paradigm for working with data.
1/ Faust is a Python library for stream processing with @apachekafka from @RobinhoodApp. I think it's really cool. It highlights one of the things I think we got right with Kafka Streams: supporting stream processing in Kafka at the protocol level. github.com/robinhood/faust
2/ This means having a model in Kafka's core protocol for elastic scalability, partitioning, stateful processing, and transactionally correct processing that covers input, output, and state changes but is decoupled from any implementation of code that does this processing.
3/ This functionality is all part of core Kafka and supported in the consumer/producer APIs (albeit in a low-level way).
Kafka Streams is really just a Java library that uses this protocol and gives reusable operators, but our hope is that libraries like it will come to exist in every language.
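A toy model may help show why partitioning and stateful processing can be separated from any one processing library. The sketch below is plain Python with no Kafka client involved, and every name in it is hypothetical (`partition_for`, `assign`, `count_by_key`, the worker ids). It assumes a stable hash to map keys to partitions (crc32 here as a stand-in, not the hash Kafka clients use) and a simple round-robin assignment of partitions to workers. It leaves out the transactional part entirely.

```python
import zlib
from collections import Counter, defaultdict

NUM_PARTITIONS = 4  # hypothetical topic size


def partition_for(key: str) -> int:
    # Stable hash so the same key always maps to the same partition.
    # crc32 is a stand-in; it is not the hash Kafka clients use.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def assign(partitions, workers):
    # Round-robin assignment of partitions to workers (a toy rebalance).
    assignment = defaultdict(list)
    for i, p in enumerate(partitions):
        assignment[workers[i % len(workers)]].append(p)
    return dict(assignment)


def count_by_key(events, assignment):
    # Each worker keeps state only for the partitions it owns.
    owner = {p: w for w, ps in assignment.items() for p in ps}
    state = {w: Counter() for w in assignment}
    for key in events:
        state[owner[partition_for(key)]][key] += 1
    return state


events = ["alice", "bob", "alice", "carol", "bob", "alice"]
partitions = list(range(NUM_PARTITIONS))

one_worker = count_by_key(events, assign(partitions, ["w1"]))
two_workers = count_by_key(events, assign(partitions, ["w1", "w2"]))

# Merging per-worker state gives the same totals regardless of worker count.
total = sum(two_workers.values(), Counter())
print(total == one_worker["w1"])
```

Because a key always lands in one partition and a partition has one owner at a time, each key's state lives with exactly one worker, and adding workers only changes who owns which partitions. Nothing in the model depends on the language the workers are written in.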