I just finished giving this talk. Here's the tweet version. #SciPy2020 1/
We've been trying to gauge the size of our community lately. The best proxy we have right now is the number of weekly visitors to the @dask_dev documentation, which currently stands at around 10,000. 2/
Dask also came up in the @jetbrains Python developer survey. We were excited to see that 5% of all the Python developers who filled out the survey said they use Dask, which shows health in the PyData community as well as in Dask.… 3/
We are running our own survey at the moment. If you are a Dask user please take a few minutes to fill it out. We would really appreciate it. 4/
In February we had an in-person Dask Summit where a mixture of OSS maintainers and institutional users met. We had talks and workshops to help figure out our challenges and set our direction… 5/
The Dask community also has a monthly meeting! It is held on the first Thursday of the month at 10:00 US Central Time. If you're a Dask user you are welcome to come to hear updates from maintainers and share what you're working on.… 6/
There are many projects built on Dask. Looking at the preliminary results from the 2020 Dask survey shows some that are especially popular. Let's take a look at each of those. 7/
The first is @xarray_dev! Xarray allows you to work on multi-dimensional datasets that have supporting metadata arrays in a Pandas-like way. 8/
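As a rough illustration of that "Pandas-like" style on labelled, Dask-backed data, here is a minimal sketch. The array contents, the dimension names (`time`, `lat`, `lon`) and the chunking are made up for the example, and it assumes `xarray`, `numpy` and `dask` are installed.

```python
import numpy as np
import xarray as xr

# Hypothetical 3-D dataset: 2 time steps x 3 latitudes x 4 longitudes.
temperature = xr.DataArray(
    np.arange(24.0).reshape(2, 3, 4),
    dims=("time", "lat", "lon"),
    coords={
        "time": [0, 1],
        "lat": [0.0, 10.0, 20.0],
        "lon": [0.0, 90.0, 180.0, 270.0],
    },
    name="temperature",
)

# Back the array with Dask, one chunk per time step (requires dask).
chunked = temperature.chunk({"time": 1})

# Select by coordinate label rather than integer position, then reduce
# over a named dimension. Nothing is computed until .compute() is called.
zonal_mean = chunked.sel(lat=10.0).mean(dim="lon")
result = zonal_mean.compute()
```

Selecting with `sel(lat=10.0)` and reducing with `mean(dim="lon")` use names instead of axis numbers, which is the part that tends to feel familiar to Pandas users.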
Next is @rapidsai. RAPIDS is an open-source suite of GPU-accelerated Python libraries. Using these tools you can execute end-to-end data science and analytics pipelines entirely on GPUs, all using familiar PyData APIs. 9/
Up next we have @blazingsql. BlazingSQL builds on RAPIDS and Dask to provide an open-source, distributed, GPU-accelerated SQL engine. 10/
We also have XGBoost. While XGBoost has been around for a long time, you can now prepare your data on your Dask cluster, then bootstrap your XGBoost cluster on top of Dask and hand the distributed dataframes straight over.… 11/
Next is @PrefectIO. Prefect is a workflow manager which is built on top of Dask's scheduling engine. "Users organize Tasks into Flows, and Prefect takes care of the rest." 12/
Lastly is @scitools_iris. Iris uses the CF data model giving you a format-agnostic interface for working with your data. It excels when working with multi-dimensional Earth Science data, where tabular representations become unwieldy and inefficient.… 13/
These are the tools our community have told us they like so far. But if you use something which didn't make the list then head to and let us know! According to PyPI there are many more out there. 14/
Many user groups use Dask, spanning everything from life sciences, geophysical sciences and beamline facilities to finance, retail and logistics. Check out the great "Who uses Dask?" talk from @mrocklin for more info. 15/
There has been an increase in for-profit companies building tools with Dask, including @CoiledHQ, @PrefectIO and @saturn_cloud. 16/
We've also seen large companies contributing, like @Microsoft's @Azure ML team, who added a cluster manager to Dask Cloudprovider. This helps folks get up and running with Dask on AzureML more quickly and easily. 17/…
Moving on to recent improvements, there has been a lot of work to get @openucx supported as a protocol in Dask. This allows worker-to-worker communication to be accelerated vastly on hardware that supports InfiniBand or NVLink. 18/
There have also been some recent announcements around @nvidia blowing away the TPCx-BB benchmark by outperforming the current leader by 20x. This is a huge success for all the open-source projects that were involved, including Dask. 19/
We've seen increased adoption of Dask Gateway. Many institutions are using it as a way to provide their staff with on-demand Dask clusters. 20/
The update that got the most 👏 feedback from the #SciPy2020 attendees was the Cluster Map Plot (known to maintainers as the "pew pew pew" plot). This plot shows a high-level overview of your Dask cluster scheduler and workers and the communication between them. 21/
To wrap up with what @dask_dev is going to be doing next: we are going to continue working on high-level graph optimisation. 22/
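For readers who have not seen a high-level graph before, here is a minimal sketch of how to inspect one. It assumes `dask` (with its array extras) is installed, and the array shape and chunk sizes are arbitrary. It only shows the layered structure that such optimisation operates on, not the optimisation work itself.

```python
import dask.array as da
from dask.highlevelgraph import HighLevelGraph

# Build a small lazy computation: a 4x4 array of ones in 2x2 chunks.
x = da.ones((4, 4), chunks=(2, 2))
total = (x + 1).sum()

# Dask collections keep their task graph on the .dask attribute.
# For arrays this is a HighLevelGraph, grouped into named layers
# (roughly one per operation) rather than one flat dict of tasks.
graph = total.dask
layer_names = list(graph.layers)

# Each layer records which other layers it depends on.
layer_dependencies = {name: set(deps) for name, deps in graph.dependencies.items()}

# Nothing has run yet; compute() executes the graph.
value = float(total.compute())
```

Printing `layer_names` and `layer_dependencies` gives a compact view of the computation, which is much smaller than the full per-chunk task list.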
Based on feedback from our community, we are also going to be focussing on making the Dask scheduler more performant. There are a few things happening, including a Rust implementation of the scheduler, dynamic task creation and ongoing benchmarking.… 23/
Lastly I'm excited to share that with funding from @ChanZuckerberg, Dask will be hiring a maintainer who will focus on growing usage in the biological sciences field. If that is of interest to you keep an eye on @dask_dev for more announcements.… 24/
Keep Current with Jacob Tomlinson
