1/ First community meeting for Dagster users, going through our biggest release yet, 0.8.0. Fantastic meeting with really great questions at the end. This was for existing users so we jumped right into advanced stuff.
2/ Will be posting in detail about the release next week. But there are huge internal changes in addition to user-facing features. A little preview of some of the interesting bits below.
3/ Core architecture changes: Our system processes and tools are now totally separate from user code. They no longer share python deps and can even be on different python versions. You can also organize it so that teams can keep their deps/pipelines separate from each other.
4/ Dagit (frontend) revamp: Totally new organization for our frontend that is more "pipeline-centric." You have a landing page for your pipeline that, in one screen, displays its shape, its schedule, its run history, and the assets it has produced.
5/ We have a new capability called the Asset Manager that allows you to link your computations and the assets that they produce. We believe that this is a novel system of record for metadata in data applications, and we will invest a ton in this area going forward.
6/ New charting/graphing capabilities to view historical execution times and asset properties.
7/ We also have implemented lineage between runs (e.g. when you re-execute from a failure there are two runs linked together) and display those runs together.
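A minimal sketch of the run-lineage idea, assuming each re-executed run records the id of the run it was re-executed from. The `Run` record, its `parent_run_id` field, and the helper names are hypothetical illustrations, not Dagster's actual schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Run:
    # Hypothetical record for illustration; not Dagster's run schema.
    run_id: str
    status: str
    parent_run_id: Optional[str] = None  # the run this one was re-executed from


def root_of(run: Run, by_id: Dict[str, Run]) -> str:
    """Follow parent links until reaching a run that has no parent."""
    current = run
    while current.parent_run_id is not None:
        current = by_id[current.parent_run_id]
    return current.run_id


def group_by_lineage(runs: List[Run]) -> Dict[str, List[str]]:
    """Group run ids under the id of the original run they descend from."""
    by_id = {r.run_id: r for r in runs}
    groups: Dict[str, List[str]] = defaultdict(list)
    for r in runs:
        groups[root_of(r, by_id)].append(r.run_id)
    return dict(groups)


runs = [
    Run("a1", "FAILURE"),
    Run("a2", "SUCCESS", parent_run_id="a1"),  # re-execution of a1
    Run("b1", "SUCCESS"),
]
lineage = group_by_lineage(runs)
```

Here the failed run and its re-execution end up in one group, which is the shape a UI would need in order to display linked runs together.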
8/ We have also improved our pyspark support to abstract away infrastructure concerns. You can write pyspark code and, without changing it, execute it on your laptop, EMR, and now -- with a community-provided integration -- Databricks. Really excited about this direction.
9/ "Dagster-Native" Orchestration Cluster: We have a supported orchestration cluster that allows one to manage compute on k8s, using celery as a control plane.
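A rough sketch of the general pattern behind "write it once, run it anywhere": the compute function knows nothing about where it runs, and the environment is picked separately. This is plain Python rather than pyspark, and the launcher functions, the `LAUNCHERS` table, and `execute` are hypothetical stand-ins, not Dagster's mechanism or API.

```python
from typing import Any, Callable, Dict, List


def count_long_words(words: List[str]) -> int:
    """Business logic only: no reference to any execution environment."""
    return sum(1 for w in words if len(w) > 5)


def run_locally(step: Callable[[Any], Any], data: Any) -> Dict[str, Any]:
    # Hypothetical "laptop" launcher: call the step in-process.
    return {"where": "local", "result": step(data)}


def run_on_fake_cluster(step: Callable[[Any], Any], data: Any) -> Dict[str, Any]:
    # Hypothetical stand-in for a remote launcher. A real one would package
    # the code and submit it to a cluster; this one only simulates that.
    return {"where": "cluster", "result": step(data)}


LAUNCHERS = {"local": run_locally, "cluster": run_on_fake_cluster}


def execute(step: Callable[[Any], Any], data: Any, mode: str) -> Dict[str, Any]:
    """Pick the launcher from configuration; the step itself is unchanged."""
    return LAUNCHERS[mode](step, data)


words = ["dagster", "spark", "pipeline"]
local_out = execute(count_long_words, words, "local")
cluster_out = execute(count_long_words, words, "cluster")
```

The point of the design is that switching `mode` changes only where the step runs, never the step's code.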
10/ Airflow auto-ingestion: You can now take a set of Airflow DAGs and automatically ingest them into Dagster, using Dagster as an execution environment.
12/ We'll be posting a more thorough blog post next week about these features and the direction of the project.
13/ Thanks to the team for all the amazing work, and to all the community members out there. The attendance at the meeting exceeded our expectations and the questions were great!
1/ Last week, @s_ryz published a blog about how you can use Dagster to dramatically improve the developer lifecycle and increase productivity for pyspark developers creating production workflows. dagster.io/blog/pyspark
2/ Amongst PySpark developers it’s quite common to develop directly against their deployment environment: Databricks, EMR, Dataproc, etc with little tooling support. The resulting workflow can be very painful.
This is not atypical:
- Fire up Spark cluster. Launch my job.
- Wait 10 minutes.
- Click through multiple UIs. Whoops. Forgot to push my updated code.
- Push updated code to the cluster.
- Launch job. Wait another 10 minutes.
- Discover this error:
3/ @s_ryz discussed our 0.10.0. Versioning our computations and the assets they produce, which allows fully incremental compute during development and backfilling, will be a total gamechanger. Lots of other goodies here. docs.google.com/presentation/d…
1/ Dagster has been public for over a year. Last week we pushed out a new version that marks a new level of maturity for the project. We now call Dagster a data orchestrator. Here is a post about what we’ve built, what we've learned, and the principles we've developed:
2/ Over the past decade, there have been huge advances in data technology. Advanced computational runtimes and cloud data warehouses built on infinite, cheap storage and elastic compute are available to any organization with the right tools and sufficient resources.
3/ We believe the primary challenges today are higher in the stack and are abstraction and tooling problems. Data computations are hard to test, slow to build, disorganized, and under-abstracted. Uncontrolled complexity is the norm. Their needs are not met by the ecosystem today.
This year students are facing the prospect of returning to a degraded college experience while still paying full tuition. We at Elementl suggest an alternative: a well-paid, year-long fellowship where you work on an open source project, Dagster.
2/ The model already works really well. Just look at Waterloo, one of the best engineering schools in the world. With this you can make your own Waterloo experience. You can work from anywhere and would be treated as a full-time employee with commensurate pay and benefits.
3/ We're not encouraging people to drop out. We think of this like a gap year of applied learning.
Happy to announce that the @dagsterio team has pushed out our latest major rev, 0.7.0. In the last six months we've moved from a tool suitable for local development, to a hostable one for smaller pipelines, to one for large scale pipelines in modern infra.
First: a reskin and new navigation scheme for our frontend, Dagit. We think that Dagit sets a new standard for frontend in data tools. We also have dramatically improved rendering perf for large pipelines (1000s of nodes) along with the ability to subselect with a condensed syntax.
We also have a gorgeous new execution viewer for local development and production ops. It allows you to quickly navigate to running and failed tasks, filter down to upstream and downstream tasks only, and view our structured logs. It is also just really fun.
California is under siege by horrific fires. Both my wife @lesliejz and I have felt helpless witnessing heroic firefighters fight this battle. They deserve our support. We're starting a @gofundme to raise critical funds and will be matching donations. gf.me/u/wdzjam
Sonoma County continues to be ground zero for many of these catastrophic fires, such as the 2017 Tubbs Fire and 2019 Kincade Fire. And the firefighters in those very areas are critically underfunded.
The Sonoma County Fire District serves an area near Calistoga/Windsor and is in dire financial straits. They are funded by property taxes. This property is literally being burned away. In the 2017 Tubbs Fire alone, **25%** of the property tax base in the area was destroyed.