🛠️ Tooling Tuesday 🛠️

This week, a new entrant to the experiment tracking / reproducibility space: keepsake.ai, by @replicateai

Keepsake calls itself "Version control for machine learning"
Like other experiment tracking tools, Keepsake aims to integrate smoothly with however you train models, requiring only a few code changes to get started.
What's different?

Keepsake is open source, and all of the metadata and artifacts from your experiments are stored as tarballs and JSON files in your own AWS or GCP account.

That means there's no cloud service to sign up for, and no additional servers or infrastructure to manage.
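Pointing Keepsake at your own storage is a one-file config. A minimal sketch of `keepsake.yaml` (the bucket name here is a placeholder; GCP buckets use a `gs://` prefix instead):

```yaml
# keepsake.yaml — committed at the root of your repo
repository: "s3://my-experiments-bucket"
```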
Keepsake aims to make experiments reproducible, not just track them, so it includes a CLI and visualization library that can:
* Check out code and weights from a previous experiment
* Compare experiments
* Sort and filter runs
* Visualize training runs in a notebook
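Integration really is a few lines in your training loop. A sketch using Keepsake's Python API (`keepsake.init` / `experiment.checkpoint`, per its docs); the params, metrics, and file name here are illustrative, and `pip install keepsake` is assumed:

```python
import keepsake

def train(num_epochs=2, learning_rate=0.01):
    # Records hyperparameters and snapshots the code directory
    experiment = keepsake.init(path=".", params={"learning_rate": learning_rate})
    for epoch in range(num_epochs):
        loss = ...  # your training step here
        # Saves weights + metrics as a versioned checkpoint in your bucket
        experiment.checkpoint(path="model.pth", metrics={"loss": loss})
    experiment.stop()
```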
Who is this for?

If you're already a heavy user of @weights_biases, @MLflow, @DVCorg, or another experiment management / reproducibility platform, keepsake probably isn't full-featured enough to convince you to switch yet.
But if you haven't set one of these up for your project because you don't want to use a SaaS platform and don't want the complexity of @MLflow, then give it a shot.

If you value simple tools with nice UX that do a single job well, you'll probably enjoy this library.

• • •


More from @full_stack_dl

16 Feb
1/🛠Tooling Tuesday🛠

Let's talk about an open source absolute unit: 🤗

@huggingface provides open-source implementations of most Transformer architectures, in both PyTorch and TF.

Better yet, they have many pre-trained NLP models and datasets.

And recently, an inference API.
2/ The 🤗 transformers library implements many architectures: BERT, GPT, XLNet, etc.

It works with both PyTorch and TensorFlow, and provides three main modules, all nicely interoperable:

- Data configuration and tokenization
- Model architecture
- Training
3/ All of the transformers library code is compatible with pre-trained model weights available on Hugging Face Hub.

This enables rapid progress: no matter what your NLP task/language is, there is probably a pre-trained model for you to start fine-tuning with.
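For a taste of how little code that takes, here's the library's high-level `pipeline` API (a default pre-trained model for the task is downloaded on first call; the exact model and scores depend on the library version):

```python
from transformers import pipeline

# Downloads and caches a default pre-trained model for the task on first use
classifier = pipeline("sentiment-analysis")
result = classifier("Tooling Tuesday threads are great!")
# result is a list of dicts along the lines of [{"label": "POSITIVE", "score": ...}]
```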
9 Feb
🛠️ Tooling Tuesday 🛠️

This week: @ApacheTVM (tvm.apache.org)

TVM describes itself as an "end-to-end machine learning compiler framework for CPUs, GPUs, and accelerators".

Let's talk about what that means and why it might be useful for you:
Under the hood, deep learning frameworks are built on "kernel operator" libraries like cuDNN. These are the primitives that help run your model fast on a GPU or other accelerator.

The problem is: these aren’t open source (not extensible) and only work on specific platforms.
So what happens if the platform you want to deploy to isn't supported, or performance is bad?

That's where TVM comes in. It has importers for all the major frameworks and tutorials to compile optimized versions of your models for common CPUs and GPUs.
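The usual flow is: import a trained model into TVM's IR, then build an optimized module for a target. A sketch using the Relay ONNX importer (API names follow TVM's tutorials; the model file and input shape are placeholders):

```python
import onnx
import tvm
from tvm import relay

# Import a trained model (here: ONNX format) into TVM's Relay IR
model = onnx.load("resnet18.onnx")
mod, params = relay.frontend.from_onnx(model, shape={"input": (1, 3, 224, 224)})

# Compile an optimized module for a target platform, e.g. a generic CPU
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)
```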
26 Jan
🛠️Tooling Tuesday🛠️

Good training data is half the battle in machine learning.

There are many great tools and services for annotating your data. Today, we'll talk about labelstud.io

2/6 Many kinds of data can be annotated with Label Studio: images, audio, text, time series, and any combination of the above.

You can easily define your own annotation interface with a simple HTML-like markup.

Check out many pre-made templates at labelstud.io/playground/
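For example, an image classification interface takes only a few tags. A sketch in Label Studio's labeling-config format (the field names and choice values here are illustrative):

```xml
<View>
  <Image name="image" value="$image"/>
  <Choices name="label" toName="image" choice="single">
    <Choice value="Cat"/>
    <Choice value="Dog"/>
  </Choices>
</View>
```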
3/6 Label Studio is open-source and is under active development.

It has 4K stars on GitHub, the latest major release was just a couple of weeks ago, and pull requests are successfully getting merged :)

You can try it yourself locally with a single Docker command.
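If memory serves, that command is along these lines (the image tag and mounted data path may have changed since; check the project's docs):

```
docker run -it -p 8080:8080 -v $(pwd)/mydata:/label-studio/data heartexlabs/label-studio:latest
```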
20 Jan
🛠️ tooling tuesday 🛠️

In honor of our first lecture at Berkeley this evening, here's our remote teaching stack:
1/ @zoom_us. Duh.

One thing that makes it better is to have a good audio/video setup. Here's a good guide:

Fujifilm cameras work too and avoid the need for the Camlink.
2/ @SlackHQ for question management.

Zoom chat is unthreaded and hard to react to. Asking questions live is chaos. Instead, students post questions in a slack channel.

Instructors can answer them directly in slack, or summarize and answer aloud at a break in the lecture.
13 Jan
🛠️Tooling Tuesday🛠️

Today, we share a @GoogleColab notebook implementing a Transformer with @PyTorch, trained using @PyTorchLightnin.

We show both encoder and decoder, train with teacher forcing, and implement greedy decoding for inference.


2/N Transformers are a game changer.

This architecture has superseded RNNs for NLP tasks, and is likely to do the same to CNNs for vision tasks.

PyTorch has provided Transformer modules since version 1.2, but the docs are lacking:

- No explanation of inference
- Tutorial is encoder-only
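Greedy decoding itself is framework-agnostic: repeatedly run the model on the tokens generated so far and append the highest-scoring next token until end-of-sequence. A minimal sketch, where `step_fn` stands in for a decoder forward pass (the function names and token conventions here are mine, not the notebook's):

```python
def greedy_decode(step_fn, max_len=10, sos=0, eos=1):
    """step_fn(prefix) returns a list of scores over the vocabulary."""
    tokens = [sos]
    for _ in range(max_len):
        scores = step_fn(tokens)
        # Pick the highest-scoring token (greedy: no beam, no sampling)
        next_token = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_token)
        if next_token == eos:
            break
    return tokens
```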
3/N Our notebook shows both. Let's get started with simple data.

Our output will be number sequences like [2, 5, 3].

Our input will be the same as output, but with each element repeated twice, e.g. [2, 2, 5, 5, 3, 3]

We start each sequence with 0 and end each sequence with 1.
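That toy dataset is easy to generate. A sketch (the function name and vocabulary range are mine; the repeat-twice scheme and 0/1 start/end tokens follow the thread):

```python
import random

def make_example(length=3, vocab=range(2, 10)):
    # Target: random tokens, e.g. [2, 5, 3]
    target = [random.choice(list(vocab)) for _ in range(length)]
    # Source: each target token repeated twice, e.g. [2, 2, 5, 5, 3, 3]
    source = [t for t in target for _ in range(2)]
    # 0 = start-of-sequence, 1 = end-of-sequence
    return [0] + source + [1], [0] + target + [1]
```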
5 Jan
🛠️Tooling Tuesday🛠️

Let's talk about setting up our Python/CUDA environment!

Our goals:

- Easily specify exact Python and CUDA versions
- Humans should not be responsible for finding mutually-compatible package versions
- Production and dev requirements should be separate

Here's a good way to achieve these goals:

- Use `conda` to install Python/CUDA as specified in `environment.yml`

- Use `pip-tools` to lock in mutually compatible versions from `requirements/prod.in` and `requirements/dev.in`

- Simply run `make` to update everything!
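The pip-tools half boils down to a couple of commands, which a Makefile target can wrap. A sketch (paths follow the `requirements/` layout above; the target name is illustrative):

```make
# Pin exact, mutually compatible versions from the loose .in specs,
# then install exactly what's pinned
pip-tools:
	pip install pip-tools
	pip-compile requirements/prod.in --output-file requirements/prod.txt
	pip-compile requirements/dev.in --output-file requirements/dev.txt
	pip-sync requirements/prod.txt requirements/dev.txt
```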

Here's our `environment.yml` file.

It specifies Python 3.8, CUDA 10.2, CUDNN 7.6.
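The tweet's screenshot doesn't survive unrolling; here's a sketch consistent with those versions (the env name is taken from the `conda activate` command below; channels and exact pinning are assumptions):

```yaml
name: conda-piptools-sample-project
channels:
  - defaults
dependencies:
  - python=3.8
  - cudatoolkit=10.2
  - cudnn=7.6
  - pip
```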

To create an environment from this, install Miniconda (docs.conda.io/en/latest/mini…) and run `conda env create`.

Activate the environment with `conda activate conda-piptools-sample-project`

