TVM describes itself as an "end to end machine learning compiler framework for CPUs, GPUs, and accelerators".
Let's talk about what that means and why it might be useful for you:
Under the hood, deep learning frameworks are built on "kernel operator" libraries like cuDNN. These are the primitives that help run your model fast on a GPU or other accelerator.
The problem: these libraries aren't open source (so you can't extend them), and each one only works on specific hardware.
So what happens if the platform you want to deploy to isn't supported, or performance is bad?
That's where TVM comes in. It has importers for all the major frameworks and tutorials to compile optimized versions of your models for common CPUs and GPUs.
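To make that concrete, here's a minimal sketch of the import-and-compile flow for a PyTorch model. The model, input name, and shapes are placeholders I made up, and exact API details vary a bit across TVM versions:

```python
import torch
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Placeholder model: any TorchScript-traceable module works
model = torch.nn.Sequential(torch.nn.Linear(8, 4)).eval()
example = torch.randn(1, 8)
scripted = torch.jit.trace(model, example)

# Import into TVM's Relay IR, then compile for a target (here: local CPU)
mod, params = relay.frontend.from_pytorch(scripted, [("input0", (1, 8))])
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)

# Run the compiled module
dev = tvm.cpu(0)
runtime = graph_executor.GraphModule(lib["default"](dev))
runtime.set_input("input0", tvm.nd.array(example.numpy()))
runtime.run()
print(runtime.get_output(0).numpy())
```

Swap `target="llvm"` for `"cuda"`, `"metal"`, etc. to compile the same model for different hardware — that portability is the whole point.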
One cool example: the @Octoml team used TVM to get better BERT performance on M1 Macs than CoreML from @apple.
The main drawback: since TVM is a flexible compiler toolkit, there are a lot of knobs to tune. And if you only care about a single platform, it may or may not be the way to get the highest performance for that platform.
What else is out there?
* MLIR from @clattner_llvm and Google. It's a promising alternative, but still being built out and may not be suited for all models
* TensorRT. If you're using TensorFlow and NVIDIA GPUs, this is the standard, but it only targets NVIDIA hardware.
Anyone have experience using @ApacheTVM? What else should people consider?
Let's talk about setting up our Python/CUDA environment!
Our goals:
- Easily specify exact Python and CUDA versions
- Humans should not be responsible for finding mutually-compatible package versions
- Production and dev requirements should be separate
1/N
Here's a good way to achieve these goals (sketch below):
- Use `conda` to install Python/CUDA as specified in `environment.yml`
- Use `pip-tools` to lock in mutually compatible versions from `requirements/prod.in` and `requirements/dev.in`
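For concreteness, here's a minimal sketch of what that `environment.yml` might look like — the name and versions below are placeholders, not recommendations:

```yaml
# environment.yml — hypothetical example
name: my-project
channels:
  - defaults
dependencies:
  - python=3.10       # exact Python version, pinned here
  - cudatoolkit=11.8  # exact CUDA version, pinned here
  - pip
  - pip:
      - pip-tools
```

From there, running `pip-compile requirements/prod.in` resolves mutually compatible versions of everything in the `.in` file and writes a fully pinned `requirements/prod.txt` (likewise for `dev.in`) — so no human ever has to do the dependency solving by hand.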
Dagster describes itself as a "data orchestrator for machine learning, analytics, and ETL"
Let's break that down 👇
2/ When you work with real-world data, your pipelines can get complex.
E.g., to train a language model on Twitter data, you might (code sketch after this list):
- Download data
- Strip out offensive tweets
- Preprocess the data
- Fit models
- Summarize training performance
- Deploy the best model to production
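Here's a minimal sketch of those steps as a Dagster job. The op bodies are toy placeholders I invented, but the `@op`/`@job` wiring is Dagster's actual API:

```python
from dagster import job, op

@op
def download_data():
    # Placeholder: pull raw tweets from wherever you store them
    return ["a tweet", "an offensive tweet", "another tweet"]

@op
def strip_offensive(tweets):
    # Placeholder filter; a real pipeline might use a classifier or blocklist
    return [t for t in tweets if "offensive" not in t]

@op
def preprocess(tweets):
    # Placeholder preprocessing: lowercase + tokenize
    return [t.lower().split() for t in tweets]

@op
def fit_model(examples):
    # Placeholder "training": just record how much data we saw
    return {"n_examples": len(examples)}

@op
def summarize_training(model):
    print(f"trained on {model['n_examples']} examples")

@job
def train_tweet_lm():
    # Calling ops inside a @job wires up the dependency graph
    summarize_training(fit_model(preprocess(strip_offensive(download_data()))))
```

You can run it locally with `train_tweet_lm.execute_in_process()`, or hand it to the Dagster daemon/UI to schedule and monitor. (The deploy step is omitted here.)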
3/ In production settings, pipelines can be even more complicated.
All well and good, but doing those steps manually every time you update your model is painful, resource-intensive, and hard to scale.
And what happens if you have hundreds of these pipelines you need to manage?