PyTorch · 25 Oct · 10 tweets
ICYMI: PyTorch 1.10 was released last Thursday. Here are some highlights of the release.

Stay tuned for tweet threads in the next couple weeks delving deeper into these cool new features!

1/9
CUDA Graphs are now in beta! They let you capture (and replay!) static CUDA workloads so the whole sequence runs as a single graph launch instead of relaunching each kernel from the CPU, leading to massive overhead reductions. Our integration allows for seamless interop between CUDA graphs and the rest of your model.

2/9
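For a feel of the API, here is a minimal sketch of the capture/replay pattern using torch.cuda.CUDAGraph and the torch.cuda.graph context manager; the tiny model, shapes, and warm-up count are placeholders, not something prescribed in the thread:

```python
import torch

model = torch.nn.Linear(512, 512).cuda()            # placeholder static workload
static_input = torch.randn(64, 512, device="cuda")

# Warm up on a side stream before capture, as the CUDA Graphs docs recommend.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        static_output = model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture one iteration of the workload into a graph.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_output = model(static_input)

# Replay: copy fresh data into the captured input buffer and rerun the whole
# graph with a single launch instead of one CPU launch per kernel.
static_input.copy_(torch.randn(64, 512, device="cuda"))
g.replay()
print(static_output.shape)
```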
FX, an easy-to-use Python platform for writing Python-to-Python transforms of PyTorch programs, is now stable. FX makes it easy to programmatically do things like fusing convolution with batch norm. Stay tuned for some FX examples of cool things users have built!

3/9
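As an illustration (ours, not from the thread), an FX transform usually looks like this: symbolically trace a module into a Graph, rewrite nodes programmatically, then recompile. The toy module and the relu-to-gelu swap below are stand-ins for a real transform such as conv/batch-norm fusion:

```python
import torch
import torch.fx

class Toy(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1.0

traced = torch.fx.symbolic_trace(Toy())   # Python-to-Python capture into a Graph

# Rewrite the graph programmatically: swap relu for gelu as a toy transform.
for node in traced.graph.nodes:
    if node.op == "call_function" and node.target is torch.relu:
        node.target = torch.nn.functional.gelu
traced.graph.lint()
traced.recompile()                         # regenerate the module's forward()

print(traced.code)                         # inspect the generated Python code
print(traced(torch.randn(4)))
```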
nn.Module parametrization (moving from Beta to Stable) allows you to implement reparametrizations in a user-extensible manner. For example, you can apply spectral normalization or enforce that a parameter stays orthogonal!

4/9
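A small sketch of the API, using the usual docs-style toy parametrization (a symmetric weight) plus the built-in helpers mentioned above; none of this is taken verbatim from the thread:

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize
from torch.nn.utils import parametrizations

class Symmetric(nn.Module):
    # Reparametrize a square weight so it is always symmetric.
    def forward(self, W):
        return W.triu() + W.triu(1).transpose(-1, -2)

layer = nn.Linear(4, 4)
parametrize.register_parametrization(layer, "weight", Symmetric())
print(torch.allclose(layer.weight, layer.weight.T))  # True, by construction

# Built-in parametrizations cover the cases mentioned above:
sn_layer = parametrizations.spectral_norm(nn.Linear(4, 4))
ortho_layer = parametrizations.orthogonal(nn.Linear(4, 4))
```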
We have also turned conjugation of complex tensors from an O(n) copy into a constant-time operation (just like transpose)! This lets us fuse conjugation into other PyTorch operators, like matmuls, for up to a 50% speedup and 30% memory savings!

5/9
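A quick illustrative example of the constant-time behaviour (the fusion into matmul happens under the hood; the numbers quoted above are not reproduced here):

```python
import torch

x = torch.randn(3, 3, dtype=torch.complex64)

xc = x.conj()                          # constant-time view, no O(n) copy
print(xc.is_conj())                    # True: only a conjugate bit is set
print(xc.data_ptr() == x.data_ptr())   # True: same storage as x

y = xc @ x                             # conjugation is resolved inside the matmul
```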
A new LLVM-based JIT compiler that can fuse sequences of PyTorch ops is now available for CPUs, improving performance. We've had this fusion ability on GPUs for some time; this release brings it to CPUs, where in certain cases it can deliver massive speedups!
6/9
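The fusion happens automatically inside TorchScript after a couple of profiling runs; below is a rough sketch with an arbitrary chain of pointwise ops chosen for illustration (the graph inspection at the end is just a debugging aid):

```python
import torch

@torch.jit.script
def pointwise_chain(x):
    # A chain of elementwise ops that the CPU fuser can compile into one loop.
    return torch.sigmoid(x) * torch.tanh(x) + 0.5 * x

x = torch.randn(1 << 20)
for _ in range(3):          # profiling runs let the JIT specialize and fuse
    pointwise_chain(x)

# After warm-up the optimized graph should contain a fused kernel on CPU.
print(pointwise_chain.graph_for(x))
```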
NNAPI (which has moved from prototype to beta) allows PyTorch on Android to leverage hardware accelerators such as GPUs and other specialized chips to speed up neural networks. Since the prototype release, we've been hard at work adding more op coverage and other goodies.

7/9
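For context, the conversion workflow from the NNAPI prototype tutorial looks roughly like the sketch below. Treat it as an assumption-laden outline: the helper lives in a private namespace today, the model class is hypothetical, and names may differ in your PyTorch build:

```python
import torch
import torch.backends._nnapi.prepare as nnapi_prepare  # beta, private namespace

model = MyQuantizedMobileModel().eval()   # hypothetical mobile-friendly model

# NNAPI expects NHWC inputs; mark the example tensor accordingly (per the tutorial).
example = torch.zeros(1, 3, 224, 224).contiguous(memory_format=torch.channels_last)
example.nnapi_nhwc = True

with torch.no_grad():
    traced = torch.jit.trace(model, example)

# Convert so that supported ops run through Android's NNAPI on device.
nnapi_model = nnapi_prepare.convert_model_to_nnapi(traced, example)
nnapi_model._save_for_lite_interpreter("model_nnapi.ptl")
```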
One pain point of torch.jit.script is that it has often required type annotations for compilation to succeed. Now we've enabled profile-directed typing for torch.jit.script by leveraging existing tools like MonkeyType, so missing annotations can be inferred from example inputs.

8/9
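A small sketch of what this looks like, assuming the monkeytype package is installed (the toy function and example inputs are ours, not from the thread):

```python
import torch

def scale_and_shift(x, scale, shift):
    # No type annotations: they are inferred by running on the example inputs.
    return x * scale + shift

# Profile-directed typing: torch.jit.script records types via MonkeyType
# from the provided example inputs, then compiles as usual.
scripted = torch.jit.script(
    scale_and_shift,
    example_inputs=[(torch.randn(3), 2.0, 1.0)],
)
print(scripted(torch.randn(3), 0.5, 0.0))
```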
Here's an example of the improved PyTorch performance on an HPC benchmark: github.com/dionhaefner/py…

More from @PyTorch

19 Oct
✨ Low Numerical Precision in PyTorch ✨
Most DL models use single-precision (FP32) floats by default.
Lower numerical precision - while reasonably maintaining accuracy - reduces:

a) model size
b) memory required
c) power consumed

Thread about lower precision DL in PyTorch ->
1/11
Lower precision speeds up:

* compute-bound operations, by letting the hardware do more work per cycle

* memory bandwidth-bound operations, by accessing smaller data

In many deep models, memory access dominates power consumption; reducing memory I/O makes models more energy efficient.

2/11
3 lower-precision datatypes are typically used in PyTorch:

* FP16 or half precision (`torch.float16`)

* BF16 (`torch.bfloat16`)

* INT8 (`torch.quint8` and `torch.qint8`), which stores floats in a quantized format

3/11
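As an illustrative sketch (not from the thread): FP16/BF16 are typically used through autocast, while INT8 goes through the quantization workflow, e.g. dynamic quantization. Shapes and modules below are placeholders:

```python
import torch
import torch.nn as nn

lin = nn.Linear(64, 64).cuda()
x = torch.randn(8, 64, device="cuda")

# FP16 via autocast: matmuls run in half precision under the context manager.
with torch.cuda.amp.autocast():
    y = lin(x)
print(y.dtype)  # torch.float16

# INT8 via dynamic quantization: Linear weights are stored as qint8.
qlin = torch.quantization.quantize_dynamic(
    nn.Linear(64, 64), {nn.Linear}, dtype=torch.qint8
)
print(qlin)
```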
14 Sep
Want to make your inference code in PyTorch run faster? Here’s a quick thread on doing exactly that.

1. Replace torch.no_grad() with the ✨torch.inference_mode()✨ context manager.
2. ⏩ inference_mode() is torch.no_grad() on steroids

While NoGrad excludes operations from being tracked by Autograd, InferenceMode takes that two steps further, potentially speeding up your code (YMMV depending on model complexity and hardware)
3. ⏩ InferenceMode reduces overhead by disabling two Autograd mechanisms - version counting and metadata tracking - on all tensors created inside the context ("inference tensors").

Because these mechanisms are disabled, inference tensors come with some restrictions on how they can be used 👇
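A minimal sketch of the swap (toy model, just for illustration):

```python
import torch

model = torch.nn.Linear(16, 4).eval()
x = torch.randn(2, 16)

with torch.inference_mode():      # drop-in replacement for torch.no_grad()
    y = model(x)

print(y.requires_grad)            # False
print(y.is_inference())           # True: created inside inference mode
# Restriction example: y cannot later take part in autograd-recorded computation.
```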
