🛠️Tooling Tuesday🛠️

Today, we share a @GoogleColab notebook implementing a Transformer with @PyTorch, trained using @PyTorchLightnin.

We show both encoder and decoder, train with teacher forcing, and implement greedy decoding for inference.

colab.research.google.com/drive/1swXWW5s…

👇1/N
2/N Transformers are a game changer.

This architecture has superseded RNNs for NLP tasks, and is likely to do the same to CNNs for vision tasks.

PyTorch has provided Transformer modules since version 1.2, but the docs are lacking:

- No explanation of inference
- The official tutorial is encoder-only
3/N Our notebook shows both. Let's get started with simple data.

Our output will be number sequences like [2, 5, 3].

Our input will be the same as the output, but with each element repeated twice, e.g., [2, 2, 5, 5, 3, 3].

We start each sequence with 0 and end each sequence with 1.
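
To make this concrete, here's a minimal sketch of how one such pair can be generated (names and defaults are ours, not necessarily the notebook's):

```python
import torch

def make_example(max_len=8, num_tokens=10):
    # Tokens 0 and 1 are reserved as start/end markers, so data tokens start at 2.
    length = torch.randint(1, max_len + 1, (1,)).item()
    seq = torch.randint(2, num_tokens, (length,))
    # Output: start, the sequence, end. Input: same, with each element repeated twice.
    tgt = torch.cat([torch.tensor([0]), seq, torch.tensor([1])])
    src = torch.cat([torch.tensor([0]), seq.repeat_interleave(2), torch.tensor([1])])
    return src, tgt  # e.g. src=[0, 2, 2, 5, 5, 3, 3, 1], tgt=[0, 2, 5, 3, 1]
```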
4/N We do the simplest possible thing to wrap this data with a PyTorch DataLoader, which will handle batching, shuffling, and pre-fetching.
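
One simple way to do that (a sketch: we fix the sequence length so the default collate function can stack batches without padding):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class RepeatedSequenceDataset(Dataset):
    """Generates (input, output) pairs on the fly."""

    def __init__(self, num_examples=10_000, seq_len=6, num_tokens=10):
        self.num_examples, self.seq_len, self.num_tokens = num_examples, seq_len, num_tokens

    def __len__(self):
        return self.num_examples

    def __getitem__(self, idx):
        seq = torch.randint(2, self.num_tokens, (self.seq_len,))
        tgt = torch.cat([torch.tensor([0]), seq, torch.tensor([1])])
        src = torch.cat([torch.tensor([0]), seq.repeat_interleave(2), torch.tensor([1])])
        return src, tgt

# Batching, shuffling, and pre-fetching come for free.
train_loader = DataLoader(RepeatedSequenceDataset(), batch_size=32, shuffle=True, num_workers=2)
```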
5/N We now define our Transformer, making use of built-in PyTorch modules.
6/N The forward() method encodes the input, and then decodes the input and the output together, where the output is partially masked to prevent "peeking" forward.
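
Here's roughly what that looks like with `nn.Transformer` (hyperparameters are placeholders; positional encodings are omitted for brevity, though a real model needs them):

```python
import torch
import torch.nn as nn

class Seq2SeqTransformer(nn.Module):
    def __init__(self, num_tokens=10, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(num_tokens, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
        )
        self.out = nn.Linear(d_model, num_tokens)

    def forward(self, src, tgt):
        # nn.Transformer expects (seq_len, batch, d_model) by default.
        src_emb = self.embedding(src).permute(1, 0, 2)
        tgt_emb = self.embedding(tgt).permute(1, 0, 2)
        # Causal mask: each output position may only attend to earlier positions.
        tgt_mask = self.transformer.generate_square_subsequent_mask(
            tgt_emb.size(0)).to(src.device)
        out = self.transformer(src_emb, tgt_emb, tgt_mask=tgt_mask)
        return self.out(out)  # (tgt_len, batch, num_tokens) logits
```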
7/N With some PyTorch Lightning boilerplate, we're ready to train on any number of GPUs/TPUs.

Note the "teacher forcing", where the ground truth is fed into the model shifted by one token.

Training on this toy data finishes quickly with 100% validation accuracy.
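
In code, the teacher-forcing step looks roughly like this (assuming the `Seq2SeqTransformer` sketch above):

```python
import pytorch_lightning as pl
import torch
import torch.nn.functional as F

class LitTransformer(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = Seq2SeqTransformer()

    def training_step(self, batch, batch_idx):
        src, tgt = batch
        # Teacher forcing: feed the ground truth shifted by one position and
        # train the model to predict the next token at every position at once.
        logits = self.model(src, tgt[:, :-1])           # (tgt_len-1, batch, num_tokens)
        loss = F.cross_entropy(logits.permute(1, 2, 0), tgt[:, 1:])
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=3e-4)
```

From there, `pl.Trainer(...).fit(LitTransformer(), train_loader)` handles the loop and device placement.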
8/N To calculate accuracy, we need to implement greedy decoding.

This is where the input is used to generate output tokens one at a time. In our case, we use greedy selection, but beam search can be used instead for a potential accuracy boost.
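
A sketch of greedy decoding with the model above:

```python
import torch

@torch.no_grad()
def greedy_decode(model, src, max_len=20):
    # Every row starts with the start token 0.
    tgt = torch.zeros(src.size(0), 1, dtype=torch.long, device=src.device)
    for _ in range(max_len):
        logits = model(src, tgt)                 # (tgt_len, batch, num_tokens)
        next_token = logits[-1].argmax(dim=-1, keepdim=True)  # greedy pick at the last position
        tgt = torch.cat([tgt, next_token], dim=1)
        # Simplified stopping rule; a robust version tracks finished rows separately.
        if (next_token == 1).all():
            break
    return tgt
```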
9/N And that's all there is to it!

Hope the notebook is useful.

If you want more, check out the official docs, a helpful post from ScaleAI, and a great explanation of the Transformer architecture:

- pytorch.org/docs/stable/ge…
- pgresia.medium.com/making-pytorch…
- peterbloem.nl/blog/transform…
10/N Lastly, our Berkeley course is beginning next Tuesday! Remember to sign up to receive updates as we release lectures (we will do so with a delay): forms.gle/235LpvXmeCN21j…

More from @full_stack_dl

5 Jan
🛠️Tooling Tuesday🛠️

Let's talk about setting up our Python/CUDA environment!

Our goals:

- Easily specify exact Python and CUDA versions
- Humans should not be responsible for finding mutually-compatible package versions
- Production and dev requirements should be separate

1/N
Here's a good way to achieve these goals:

- Use `conda` to install Python/CUDA as specified in `environment.yml`

- Use `pip-tools` to lock in mutually compatible versions from `requirements/prod.in` and `requirements/dev.in` (commands sketched below)

- Simply run `make` to update everything!

2/N
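
The pip-tools half of that flow, roughly (file names from the tweet; presumably what the `make` target runs):

```bash
pip-compile requirements/prod.in   # resolve and pin everything into requirements/prod.txt
pip-compile requirements/dev.in    # same for dev-only tools (linters, formatters, ...)
pip-sync requirements/prod.txt requirements/dev.txt   # install exactly those pins
```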
Here's our `environment.yml` file (sketched below).

It specifies Python 3.8, CUDA 10.2, and cuDNN 7.6.

To create an environment from this, install Miniconda (docs.conda.io/en/latest/mini…) and run `conda env create`.

Activate the environment with `conda activate conda-piptools-sample-project`

3/N
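
The screenshot of the file didn't survive the unroll; based on what the tweet specifies, it looks something like this (a reconstruction, not the exact file):

```yaml
name: conda-piptools-sample-project
channels:
  - defaults
dependencies:
  - python=3.8
  - cudatoolkit=10.2
  - cudnn=7.6
  - pip
```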
29 Dec 20
🛠️Tooling Tuesdays: Thread of Threads🛠️

Every week, we share a useful tool for full stack machine learning. Follow along, and please share your suggestions!

1/N
23 Dec 20
🛠️ Tooling Tuesday 🛠️

This week: @dagsterio (dagster.io)

Dagster describes itself as a "data orchestrator for machine learning, analytics, and ETL"

Let's break that down 👇
2/ When you work with real-world data, your pipelines can get complex.

E.g., to train a language model on Twitter, you might (see the sketch after this list):
- Download data
- Strip out offensive tweets
- Preprocess the data
- Fit models
- Summarize training performance
- Deploy the best model to production
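
That pipeline, sketched with dagster's `@op`/`@job` decorators (the current API; the 2020-era equivalent was `@solid`/`@pipeline`):

```python
from dagster import job, op

@op
def download_data():
    ...  # pull raw tweets

@op
def strip_offensive_tweets(raw_tweets):
    ...  # filter the raw data

@op
def preprocess(clean_tweets):
    ...  # tokenize, split, etc.

@op
def fit_model(dataset):
    ...  # train and return the best model

@job
def train_language_model():
    # The call graph below is what dagster turns into a dependency graph.
    fit_model(preprocess(strip_offensive_tweets(download_data())))
```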
3/ In production settings, pipelines can be even more complicated.

All well and good, but doing those steps manually every time you update your model is painful, resource intensive, and hard to scale.

And what happens if you have hundreds of these pipelines you need to manage?
11 Dec 20
1/ @lishali88 and @spring_stream joined us to talk about building Rosebud.ai.

Rosebud.ai's @tokkingheads turns portraits into animated avatars that read text you provide. It's fun to play around with!

Here are some challenges they faced building it:
2/ A scalable model training platform was key to experimenting quickly enough to build talkingheads.rosebud.ai.

They built theirs on Kubernetes and take advantage of spot instances to keep costs down.

More on their training infra here: blog.rosebud.ai/cost-efficient…
3/ Model quality is key to their product, so Rosebud prioritizes that over performance.

They're looking into model compression techniques to make big models faster (and more cost effective).
9 Dec 20
🛠️FSDL Tooling Tuesday🛠️

@DVCorg is one of the fastest-growing ML experiment management tools.

The main idea of DVC is to *track ML experiments in git*

Everything is versioned -- the code, the data, the model, and the metrics created by your experiment. Pretty powerful!
The magic of DVC is that it supports datasets and models too large to store in GitHub.

And since every part of your experiment is versioned, you can easily roll back to an earlier run and reproduce it.

No more fiddling around to recreate that experiment from two weeks ago!
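
The core flow looks roughly like this (assumes a DVC remote has already been configured with `dvc remote add`):

```bash
dvc add data/train.csv          # writes data/train.csv.dvc, a small pointer file
git add data/train.csv.dvc .gitignore
git commit -m "Track training data with DVC"
dvc push                        # upload the actual data to remote storage
# Later, to reproduce an old run: check out the code, then the matching data.
git checkout <old-commit>
dvc checkout
```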
What are the tradeoffs? (1/2)

*DVC does a lot*

Versioning data, experiment tracking, and running pipelines. You might prefer lighter-weight tools (e.g., replicate.ai) for any one of these.
1 Dec 20
🛠FSDL Tooling Tuesday🛠

@DeepnoteHQ is an epic Jupyter notebook alternative:

- Improved UX
- Real-time collaboration (editing and discussion)
- Direct connections to your data stores, including Postgres, S3, and BigQuery
- Effortless sharing of your running notebook

👇
One major con: Deepnote does not yet support GPU compute.

For data scientists who don't need to train deep learning models, Deepnote is a great tool to check out. It improves your developer experience and allows effortless sharing of your work with your teammates and manager.
While the Deepnote team is working on adding GPU support, there's another Jupyter-like cloud notebook you can use for deep learning: @GoogleColab.

If you use it, we recommend signing up for their $10/month Pro plan for priority access to TPUs, longer runtimes, and more RAM.
