PyTorch
19 Oct, 11 tweets, 4 min read
✨ Low Numerical Precision in PyTorch ✨
Most DL models are single-precision floats by default.
Lower numerical precision - while reasonably maintaining accuracy - reduces:

a) model size
b) memory required
c) power consumed

Thread about lower precision DL in PyTorch ->
1/11
Lower precision speeds up:

* compute-bound operations, since hardware can execute more low-precision operations per cycle

* memory bandwidth-bound operations, by accessing smaller data

In many deep models, memory access dominates power consumption; reducing memory I/O makes models more energy efficient.

2/11
3 lower precision datatypes are typically used in PyTorch:

* FP16 or half-precision (`torch.float16`)

* BF16 (`torch.bfloat16`)

* INT8 (`torch.quint8` and `torch.qint8`), which stores floats in a quantized format

3/11
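A minimal sketch of what these dtypes look like in practice (the scale and zero-point below are arbitrary illustration values, not tuned for any real model):

```python
import torch

x = torch.randn(4, 4)                    # FP32 by default

# Plain casts for the floating-point formats
x_fp16 = x.to(torch.float16)             # half precision
x_bf16 = x.to(torch.bfloat16)            # bfloat16: FP32's exponent range, fewer mantissa bits

# INT8 is not a plain cast: floats are stored as int8 values
# plus a scale and zero-point (chosen here purely for illustration)
x_int8 = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.qint8)
print(x_int8.int_repr().dtype)           # torch.int8 storage underneath
```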
FP16 is only supported on CUDA; BF16 is supported on newer CPUs and TPUs.

Calling .half() on your network and tensors explicitly casts them to FP16, but not all ops are safe to run in half-precision.

4/11
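A rough sketch of explicit half-casting, assuming a CUDA device and torchvision are available:

```python
import torch
import torchvision.models as models

model = models.resnet18().cuda().half()                   # cast all params/buffers to FP16
inp = torch.randn(1, 3, 224, 224, device="cuda").half()   # inputs must match the dtype

with torch.no_grad():
    out = model(inp)
print(out.dtype)                                          # torch.float16

# Caveat from the tweet above: numerically sensitive ops (large reductions,
# exp/log, etc.) can overflow or lose precision when forced into FP16 this way.
```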
A better solution is to use Automatic Mixed Precision to let PyTorch choose the right op-specific precision (FP32 vs FP16 / BF16) for your tensors.

5/11
For torch <= 1.9.1, AMP was limited to CUDA tensors using
`torch.cuda.amp.autocast()`.

From v1.10 onwards, PyTorch has a generic API, `torch.autocast()`, that automatically casts

* CUDA tensors to FP16, and
* CPU tensors to BF16.

Docs: pytorch.org/docs/1.10./amp…

6/11
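A minimal sketch of the v1.10 API on both device types (the CUDA half assumes a GPU is available):

```python
import torch

# CUDA: autocast-eligible ops (e.g. matmul, conv) run in FP16
net = torch.nn.Linear(8, 8).cuda()
x = torch.randn(2, 8, device="cuda")
with torch.autocast(device_type="cuda", dtype=torch.float16):
    y = net(x)
print(y.dtype)        # torch.float16

# CPU: the same API autocasts eligible ops to BF16
net_cpu = torch.nn.Linear(8, 8)
x_cpu = torch.randn(2, 8)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y_cpu = net_cpu(x_cpu)
print(y_cpu.dtype)    # torch.bfloat16
```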
Running ResNet-101 on a Tesla T4 GPU shows AMP to be faster than explicit half-casting.

7/11
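A rough sketch of how such a comparison could be timed, assuming a CUDA device and torchvision (batch size and iteration counts here are arbitrary; results will vary by hardware):

```python
import time
import torch
import torchvision.models as models

def bench(fn, iters=50, warmup=5):
    for _ in range(warmup):
        fn()
    torch.cuda.synchronize()                 # time GPU work, not just kernel launches
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

inp = torch.randn(16, 3, 224, 224, device="cuda")
model_half = models.resnet101().cuda().half().eval()
model_amp = models.resnet101().cuda().eval()

def amp_forward():
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        return model_amp(inp)

with torch.no_grad():
    t_half = bench(lambda: model_half(inp.half()))
    t_amp = bench(amp_forward)

print(f"explicit .half(): {t_half * 1e3:.1f} ms/iter, autocast: {t_amp * 1e3:.1f} ms/iter")
```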
Don’t wrap your backward pass in `autocast()`!

Make sure you only wrap the forward pass and the loss computation in the autocast region.

The backward ops will run in the same dtype that the corresponding forward op was autocast to.

8/11
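A sketch of that pattern in a single training step; the `GradScaler` here is the usual companion for FP16 autocast training (not something this thread covers), and the batch is made-up toy data:

```python
import torch

model = torch.nn.Linear(10, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()
scaler = torch.cuda.amp.GradScaler()     # loss scaling for FP16 gradients

x = torch.randn(32, 10, device="cuda")
y = torch.randn(32, 1, device="cuda")

optimizer.zero_grad()

# Only the forward pass and the loss live inside autocast ...
with torch.autocast(device_type="cuda", dtype=torch.float16):
    pred = model(x)
    loss = loss_fn(pred, y)

# ... backward and the optimizer step run outside it; backward ops
# reuse whatever dtype their forward counterparts were autocast to.
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```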
Low-precision gradients save network bandwidth in distributed training too.

You can enable gradient compression to FP16 with DistributedDataParallel: pytorch.org/docs/stable/dd…

9/11
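A sketch of wiring up the FP16 compression hook, assuming a process group has already been initialized (e.g. with `torch.distributed.init_process_group`) and each rank has a GPU:

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks

# Assumes torch.distributed.init_process_group("nccl", ...) has run on each rank
model = torch.nn.Linear(10, 10).cuda()
ddp_model = DDP(model, device_ids=[torch.cuda.current_device()])

# Gradients are compressed to FP16 before the all-reduce and
# decompressed back afterwards, halving gradient traffic on the wire
ddp_model.register_comm_hook(state=None, hook=default_hooks.fp16_compress_hook)
```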
For CPUs without BF16 support and for ARM CPUs, lower precision is currently enabled via quantization.

Quantization converts FP32 to INT8, with a potential 4x reduction in model size.

Only the forward pass is quantizable, so you can use this only for inference, not training.

10/11
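One way to try this is dynamic quantization; the toy model below is just an illustration (the quantization docs linked in the next tweet cover the full static and QAT workflows):

```python
import torch

# Toy FP32 model; dynamic quantization mainly targets Linear/LSTM layers
model_fp32 = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).eval()

# Weights are stored as INT8; activations are quantized on the fly at inference
model_int8 = torch.quantization.quantize_dynamic(
    model_fp32,
    {torch.nn.Linear},        # module types to quantize
    dtype=torch.qint8,
)

out = model_int8(torch.randn(1, 128))   # forward/inference only, no backward
```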
Learn more about half precision on the PyTorch Developer Podcast episode: pytorch-dev-podcast.simplecast.com/episodes/half-…

torch.autocast: pytorch.org/docs/1.10./amp…
AMP Examples:
pytorch.org/docs/stable/no…
Quantization in PyTorch: pytorch.org/docs/stable/qu…

11/11


