https://x.com/reach_vb/status/1772757101804167323?s=20

https://twitter.com/reach_vb/status/1772749776699670772

More from @reach_vb

Vaibhav (VB) Srivastav

@reach_vb

Mar 21

Introducing Distil-Whisper v3 ⚡

> ~50% fewer parameters and 6x faster than Large-v3.
> More accurate than Large-v3 on long-form transcription.

Available with 🦀 WebGPU, Whisper.cpp, Transformers, Faster-Whisper and Transformers.js support!

Drop in; no changes are required! 🔥

Along with this, we announce an alpha release of Ratchet - our optimised WebGPU framework to serve blazingly fast Whisper:

Written in 🦀 Rust!

huggingface.co/spaces/FL33TW0…

You can find all the weights and their corresponding usage below.

Can't wait to see what the community builds with it! 🤗

huggingface.co/collections/di…

Vaibhav (VB) Srivastav

@reach_vb

Mar 19

Introducing Quanto: A PyTorch Quantisation library! ⚡

a.k.a. the gpu poor toolkit ;)

> Supports int2, int4 and int8 weights.
> Works seamlessly on CUDA, MPS and CPU.
> Automagically operates with all PyTorch models.
> Native support for Transformers. 🤗
> Quantize, Calibrate or perform Quantization Aware Training!

Best part: Minimal loss in accuracy/perplexity even with int4 quantisation.

Optimised matmul kernels for int2, int4 and int8 coming soon!
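To make "int2/int4/int8 weights" a bit more concrete, here is a minimal pure-Python sketch of symmetric integer quantisation with a single shared scale. This is an illustration only and not how Quanto is implemented: the helper names `quantize_symmetric` and `dequantize`, the per-tensor scale and the sample weights are all assumptions made for this example.

```python
def quantize_symmetric(weights, bits):
    """Map floats to signed integer codes in [-qmax, qmax] using one shared scale.

    Hypothetical helper for illustration; real libraries often use
    per-channel or per-group scales instead of a single per-tensor scale.
    """
    qmax = 2 ** (bits - 1) - 1            # 127 for int8, 7 for int4, 1 for int2
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


# Made-up sample weights, not taken from any real model.
weights = [0.12, -0.53, 0.98, -1.20, 0.05, 0.77]

for bits in (8, 4, 2):
    codes, scale = quantize_symmetric(weights, bits)
    restored = dequantize(codes, scale)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"int{bits}: codes={codes} max abs error={max_err:.4f}")
```

With round-to-nearest, the reconstruction error of each weight is at most half the scale, so fewer bits means a coarser grid and a larger worst-case error.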

> pip install quanto

github.com/huggingface/qu…

Read this brilliant blog post put together by the team ❤️

huggingface.co/blog/quanto-in…

Vaibhav (VB) Srivastav

@reach_vb

Mar 12

Introducing FACodec! ⚡

> Factorised Neural Speech Codec.
> Powers NaturalSpeech 3.
> Checkpoints and Codebase - Apache 2.0 Licensed.
> Performs zero-shot Voice Conversion.
> Consists of an explicit Timbre Extractor & Prosody, Content and Acoustic detail quantisers.
> Current SoTA among neural speech codecs.
> Checkpoints on Hugging Face Hub. 🤗

Check out the checkpoints here:

huggingface.co/amphion/natura…

And a Space to play around with and observe its reconstruction quality:

huggingface.co/spaces/amphion…

Vaibhav (VB) Srivastav

@reach_vb

Mar 11

Wow! @CohereForAI just released CMD-R 🔥

> Beats GPT 3.5
> 128K context window.
> 35 billion parameters.
> 10 languages.
> Optimised for reasoning, question answering and summarisation.
> Use it directly in transformers 🤗

huggingface.co/CohereForAI/c4…
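For a rough sense of what 35 billion parameters means in practice, here is a back-of-the-envelope sketch of weight storage at different precisions. It counts weights only and ignores activations, the KV cache for long contexts and any quantisation metadata. The helper name `weight_memory_gb` is made up for this example.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Hypothetical helper: weights only, no activations or KV cache.
    """
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 35e9  # parameter count quoted in the post

for name, bits in [("float16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name:>8}: ~{weight_memory_gb(PARAMS, bits):.1f} GB")
```

That works out to roughly 70 GB in float16, 35 GB in int8 and 17.5 GB in int4 for the weights alone.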

All you need to make it work with transformers! ⚡

https://x.com/reach_vb/status/1767285816483532946?s=20

Vaibhav (VB) Srivastav

@reach_vb

Mar 9

Fast Mamba Inference is now in Transformers! 🐍

All you need is 5 lines of code and the latest transformers!

Bonus: You can also fine-tune/RLHF it with TRL & PEFT 🤗

We support all the base checkpoints along with community-tuned checkpoints.

Want to try it, too? :)

import torch
from transformers import AutoTokenizer, MambaForCausalLM

# Use the first GPU when available, otherwise fall back to CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

model_id = "state-spaces/mamba-2.8b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = MambaForCausalLM.from_pretrained(model_id, device_map=device)

# Tokenise the prompt and move it to the same device as the model.
input_ids = tokenizer("The meaning to life is ", return_tensors="pt")["input_ids"].to(device)

# Generate up to 100 new tokens and decode them back to text.
out = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.batch_decode(out))

That's it! 🤗

Massive kudos to @art_zucker for adding it to transformers. Check out the Documentation for more details:

huggingface.co/docs/transform…

@art_zucker PEFT tuning is literally as simple as this 🔥

Vaibhav (VB) Srivastav

@reach_vb

Feb 6

Let's go! MetaVoice 1B 🔉

> 1.2B parameter model.
> Trained on 100K hours of data.
> Supports zero-shot voice cloning.
> Short & long-form synthesis.
> Emotional speech.
> Best part: Apache 2.0 licensed. 🔥

Powered by a simple yet robust architecture:
> Encodec (Multi-Band Diffusion) and GPT + Encoder Transformer LM.
> DeepFilterNet to clear up MBD artefacts.

Synthesised: "Have you heard about this new TTS model called MetaVoice."