Vaibhav (VB) Srivastav
Jan 30, 4 tweets
Whisper powered by Apple Neural Engine! 🔥

The lads at @argmaxinc optimised Whisper to work at blazingly fast speeds on iOS and Mac!

> All code is MIT-licensed.
> Up to 3x faster than the competition.
> Neural Engine as well as Metal runners.
> Open source CoreML models.
> 2 lines of code :)
> Whisper & Whisper-Turbo (even faster variant)

(Look how it utilises ANE so beautifully in the video showing their sample app on Mac!)
@argmaxinc Open Source Swift package for iOS and Mac devices 🍎

github.com/argmaxinc/Whis…
@argmaxinc The Whisper-Turbo variants bring the memory requirements down by up to 1/3rd ⚡

*With only a small drop in performance.
All their CoreML models are on the Hub, open source! ❤️

huggingface.co/argmaxinc/whis…

• • •

More from @reach_vb

Jan 27
Whisper in transformers is now better at Long-form generation! ⚡

We've observed a decrease of up to 2 points in Word Error Rate! ;)

You can now use the same techniques used by OpenAI Whisper but much faster, thanks to Flash Attention 2 and batching! 🔥

With batching, we've observed up to 4.5x improvements compared to the original implementation!

Make sure to upgrade to the latest version of Transformers - `pip install -U transformers`
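For anyone unsure what a Word Error Rate "point" is, here is a minimal sketch of the usual definition: word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenisation are assumptions for illustration; real evaluations normally normalise text (casing, punctuation) first. A 2-point decrease presumably means an absolute drop, e.g. from 12% to 10%.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name), using plain whitespace splitting.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein DP, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One missing word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```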
Here's how you can test it too:

#!/usr/bin/env python3
from transformers import WhisperForConditionalGeneration, AutoProcessor
from datasets import load_dataset, Audio
import torch
import numpy as np

processor = AutoProcessor.from_pretrained("openai/whisper-small.en")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small.en", torch_dtype=torch.float16)
model.to("cuda")

# retrieve 8 long audio sequences
ds = load_dataset("distil-whisper/earnings21", "full")["test"]
ds = ds.cast_column("audio", Audio(sampling_rate=16000))
ds = ds[:8]  # take batch size of 8

raw_audio = [x["array"].astype(np.float32) for x in ds["audio"]]

# process input, make sure to pass `padding='longest'` and `return_attention_mask=True`
inputs = processor(
    raw_audio,
    return_tensors="pt",
    truncation=False,
    padding="longest",
    return_attention_mask=True,
    sampling_rate=16_000,
)

inputs = inputs.to("cuda", torch.float16)

# enable temperature fallback plus the log-prob and compression-ratio (repetition) filters;
# conditioning on previous tokens is switched off in this example
result = model.generate(
    **inputs,
    condition_on_prev_tokens=False,
    temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    logprob_threshold=-1.0,
    compression_ratio_threshold=1.35,
    return_timestamps=True,
)

decoded = processor.batch_decode(result, skip_special_tokens=True)
print(decoded)
All thanks to Patrick for helping add this so brilliantly!

Check out the PR here:
github.com/huggingface/tr…
Jan 20
Introducing DataTrove 🤯

Its processing pipelines are platform-agnostic, running out of the box locally or on a slurm cluster.

Its low memory usage and multi-step design make it ideal for large workloads, such as processing an LLM's training data. ✨

github.com/huggingface/da…
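To make the "multi-step, low memory" idea concrete, here is a rough sketch in plain Python generators. This is NOT DataTrove's API: every name below (`read_lines`, `normalise_whitespace`, `drop_short`, `run_pipeline`) is hypothetical. It only illustrates how chained steps can stream documents one at a time instead of holding the whole dataset in memory.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Document = Dict[str, str]
Step = Callable[[Iterable[Document]], Iterator[Document]]


def read_lines(lines: Iterable[str]) -> Iterator[Document]:
    """Hypothetical reader step: wrap each raw line in a small document dict."""
    for index, line in enumerate(lines):
        yield {"id": str(index), "text": line}


def normalise_whitespace(docs: Iterable[Document]) -> Iterator[Document]:
    """Hypothetical cleaning step: collapse runs of whitespace."""
    for doc in docs:
        yield {**doc, "text": " ".join(doc["text"].split())}


def drop_short(min_words: int) -> Step:
    """Hypothetical filter step factory: keep documents with enough words."""
    def step(docs: Iterable[Document]) -> Iterator[Document]:
        for doc in docs:
            if len(doc["text"].split()) >= min_words:
                yield doc
    return step


def run_pipeline(source: Iterable[Document], steps: List[Step]) -> Iterator[Document]:
    """Chain the steps lazily; nothing runs until the result is iterated."""
    stream: Iterable[Document] = source
    for step in steps:
        stream = step(stream)
    return iter(stream)


if __name__ == "__main__":
    raw = ["  hello   world  ", "this   is a longer   line", "ok"]
    for doc in run_pipeline(read_lines(raw), [normalise_whitespace, drop_short(3)]):
        print(doc)
```

Because each step is a generator, only one document is in flight at a time, which is the property that matters for very large corpora.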
We provide a wide array of quick start examples to get you started! 🚀
Full Pipeline consists of a DataTrove document: [Image]
Jan 14
What are the top open source TTS models out there? 🤔

Here’s my list so far:

XTTS - huggingface.co/coqui/XTTS-v2
YourTTS - github.com/Edresson/YourT…
FastSpeech2 - github.com/DigitalPhoneti…
VITS - huggingface.co/docs/transform…
TorToiSe - github.com/neonbjb/tortoi…
Pheme - github.com/PolyAI-LDN/phe…

Edit:

Some more options from the comments 👇🏻

EmotiVoice - github.com/netease-youdao…
StyleTTS 2 - github.com/yl4579/StyleTT…
pflowtts_pytorch - github.com/p0p4k/pflowtts…
VALL-E - github.com/enhuiz/vall-e

What else is out there?
Ah I somehow managed to fork the tweet with my edits lol.

In case you know of any other models then put them down below please! 🙏
Jan 13
Introducing MLX-LM! ⚡ *sound on*

Run LLMs on-device directly on your Mac with 3 lines of code! ;)

100% local and quite spiffy (even faster with 4-bit)!

I made a quick video covering the package, its capabilities and a bit of quantisation.

The video goes through what MLX is, and some applications and then we explore the mlx-lm package.

All you gotta do is:

`pip install mlx-lm` 🔥
Another stellar job by @awnihannun in making this land so beautifully! - and there's more in store, I'm sure ;)

I uploaded the video to YT in case y'all face any issues watching this on X:
Here's all you need to do to get started:

Step 1: Create a virtual environment and install mlx-lm

python3 -m venv mlx-experiments

Next, activate the virtualenv

source mlx-experiments/bin/activate

Lastly, install mlx-lm

pip install mlx-lm
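Once the install finishes, the "3 lines" boil down to roughly the sketch below. It assumes the `load` and `generate` functions exported by `mlx_lm`; the wrapper name `generate_locally` and the `max_tokens=100` value are my own illustrative choices, and you need to pass the repo id of an MLX-compatible model yourself. It only runs on a Mac with Apple silicon.

```python
def generate_locally(model_id: str, prompt: str, max_tokens: int = 100) -> str:
    """Load a model with mlx-lm and generate a completion for `prompt`.

    Hypothetical convenience wrapper; `model_id` is whichever MLX-compatible
    model repo you choose (no default is assumed here).
    """
    # Imported lazily so this file can still be imported on machines without MLX.
    from mlx_lm import load, generate

    model, tokenizer = load(model_id)  # downloads the weights on first use
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)


# Example call (commented out because it downloads a model):
# print(generate_locally("<your-mlx-model-repo-id>", "Write a haiku about Macs."))
```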
Jan 8
Let's go, over 2x faster Whisper w/ speculative decoding! 🔥

Whisper (baseline) - 73 seconds
Whisper w/ Speculative Decoding - 33 seconds

All with zero drop in performance! ⚡

Pseudocode:
1. Initialise a teacher model, e.g. openai/whisper-large-v2.
2. Load an assistant model, e.g. distil-whisper/distil-large-v2 or openai/whisper-tiny.
3. Pass the assistant model over to the pipeline.
4. Transcribe away!

That's it! 🤗
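Why is there zero drop in performance? Here is a toy sketch of greedy speculative decoding, with simple functions standing in for the real models. `toy_target`, `toy_draft` and the other names are made up for illustration and have nothing to do with the Whisper checkpoints above. The assistant drafts a few tokens, the teacher checks them, and every token that is kept is one the teacher would have picked anyway.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: ask the model for one token at a time."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(model(seq))
    return seq[len(prompt):]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Draft up to k tokens cheaply, then keep only what the target agrees with."""
    seq = list(prompt)
    produced = 0
    while produced < max_new:
        # 1) The cheap assistant proposes a short continuation.
        proposal: List[int] = []
        for _ in range(min(k, max_new - produced)):
            proposal.append(draft(seq + proposal))
        # 2) The target verifies each position. A real model scores all
        #    positions in one forward pass; this toy just loops.
        accepted: List[int] = []
        for token in proposal:
            expected = target(seq + accepted)
            accepted.append(expected)  # always keep the target's own choice
            if expected != token:      # first disagreement: discard the rest
                break
        seq.extend(accepted)
        produced += len(accepted)
    return seq[len(prompt):]


def toy_target(seq: List[int]) -> int:
    """Stand-in for the large teacher model (deterministic toy rule)."""
    return (sum(seq) * 31 + len(seq)) % 50


def toy_draft(seq: List[int]) -> int:
    """Stand-in for the assistant: agrees with the target except every 4th position."""
    guess = toy_target(seq)
    return guess if len(seq) % 4 else (guess + 1) % 50


if __name__ == "__main__":
    prompt = [1, 2, 3]
    same = speculative_decode(toy_target, toy_draft, prompt, 20) == greedy_decode(toy_target, prompt, 20)
    print("identical to plain greedy decoding:", same)
```

The speed-up in practice comes from the teacher verifying several drafted tokens per forward pass, so the better the assistant's guesses, the fewer teacher passes are needed.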
Step 1: Initialise a teacher model. [Image]
Step 2: Load an assistant model. [Image]
Nov 30, 2023
Making audio a first-class citizen in LLMs: Qwen Audio 🔉

Using a multi-task training framework, Qwen Audio combines OpenAI's Whisper large v2 (audio encoder) with the Qwen 7B LM to train jointly on over 30 audio tasks.

Tasks ranging from Speech Recognition to Music Captioning to Language Identification to Sound Event Classification and more! 🔥

It beats the current SoTA across the tasks!

Bonus: Instruction-tuned Qwen-Audio-Chat allows for seamless multi-turn interactions through audio or text inputs.

Let the era of Audio-LLMs begin! 🤯
Play with the model directly here 🤗
huggingface.co/spaces/Qwen/Qw…
Base model 👇🏻

huggingface.co/Qwen/Qwen-Audio