Nick Schrock
Jun 16, 2020, 13 tweets
1/ First community meeting for Dagster users, going through our biggest release yet, 0.8.0. Fantastic meeting with really great questions at the end. This was for existing users so we jumped right into advanced stuff.
2/ Will be posting in detail about the release next week. But there are huge internal changes in addition to user-facing features. A little preview of some of the interesting bits below.
3/ Core architecture changes: Our system processes and tools are now totally separate from user code. They no longer share python deps and can even be on different python versions. You can also organize it so that teams can keep their deps/pipelines separate from each other.
4/ Dagit (frontend) revamp: Totally new organization for our frontend that is more "pipeline-centric". You have a landing page for your pipeline that, in one screen, displays its shape, its schedule, its run history, and the assets it has produced.
5/ We have a new capability called the Asset Manager that allows you to link your computations and the assets that they produce. We believe that this is a novel system of record for metadata in data applications, and we will invest a ton in this area going forward.
6/ New charting/graphing capabilities to view historical execution times and asset properties.
7/ We have also implemented lineage between runs (e.g. when you re-execute from a failure there are two runs linked together) and display those runs together.
8/ We have also improved our pyspark support to abstract away infrastructure concerns. You can write pyspark code and, without changing it, execute it on your laptop, EMR, and now -- with a community-provided integration -- Databricks. Really excited about this direction.
9/ "Dagster-Native" Orchestration Cluster: We have a supported orchestration cluster that allows one to manage compute on k8s, using Celery as a control plane.
10/ Airflow auto-ingestion: You can now take a set of Airflow DAGs and automatically ingest them into Dagster, using Dagster as an execution environment.
11/ You can read full release notes here: github.com/dagster-io/dag…. There's a lot of stuff in there.
12/ We'll be posting a more thorough blog post next week about these features and the direction of the project.
13/ Thanks to the team for all the amazing work and to all the community members out there. The attendance at the meeting exceeded our expectations and the questions were great!
