Vaibhav (VB) Srivastav · Sep 11
🚨 New powerful open text-to-speech model: Fish Speech 1.4 - trained on 700K hours of speech, multilingual (8 languages) 🔥

> Instant Voice Cloning
> Ultra low latency
> ~1GB model weights

> Model weights on the Hub 🤗
> Play with the space in the comments

Kudos to the @FishAudio team! They've also released a pretty affordable API 🐐
Model weights here:

huggingface.co/fishaudio/fish…
Play with the space here:

huggingface.co/spaces/fishaud…
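If you'd rather grab the weights than use the space, here's a minimal sketch with huggingface_hub. The full repo id is an assumption on my part (the link above is truncated), and actual inference runs through the Fish Speech toolkit rather than plain transformers:

# Sketch: download the Fish Speech 1.4 weights locally.
# "fishaudio/fish-speech-1.4" is an assumed repo id based on the truncated link above.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("fishaudio/fish-speech-1.4")
print(f"Weights downloaded to: {local_dir}")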

More from @reach_vb

Jul 29
Apple spilled the beans on Apple Intelligence Foundation Models (notes below):

Architecture:
> Dense, decoder-only transformer architecture
> RMSNorm & query/key normalization
> GQA (w/ 8 KV heads)
> SwiGLU activation & RoPE (base_freq=500K for long context)
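For reference, the stated knobs as a config sketch; hidden size, layer count, and query-head count are placeholders I've invented (they aren't in these notes), everything else comes from the bullets above:

from dataclasses import dataclass

@dataclass
class AFMConfig:
    # From the notes above
    num_kv_heads: int = 8               # GQA w/ 8 KV heads
    rope_base_freq: float = 500_000.0   # RoPE base raised for long context
    norm: str = "rmsnorm"               # RMSNorm
    qk_norm: bool = True                # query/key normalization
    activation: str = "swiglu"          # SwiGLU in the MLP
    vocab_size: int = 100_000           # 100K BPE vocab (server); 49K on-device
    # Placeholders - not disclosed in these notes
    hidden_size: int = 4096
    num_layers: int = 32
    num_query_heads: int = 32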

Pre-training & Tokenisation:
> Webpages crawled via Applebot
> Code & math datasets (publicly licensed)
> BPE tokenizer w/ 100K vocab for server & 49K for on-device

Three-step pre-training:

1. Core (consumes most of the compute budget)
> AFM-server: 6.3T tokens @ 4096 seq length
> AFM-on-device: initialised from a pruned 6.4B server model, trained on the full 6.3T tokens with an added distillation loss

2. Continued (down-weight lower-quality data; up-weight code, math, and licensed data)
> 1T tokens @ 8192 seq length
> no distillation loss for AFM-on-device in this phase

3. Context-lengthening with long sequences + synthetic data
> 100B tokens @ 32768 seq length
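The same schedule as a plain data structure (AFM-server numbers, taken directly from the stages above):

# Three-stage pre-training schedule for AFM-server, per the notes above.
stages = [
    {"name": "core", "tokens": 6.3e12, "seq_len": 4096,
     "note": "most of the compute budget"},
    {"name": "continued", "tokens": 1.0e12, "seq_len": 8192,
     "note": "down-weight low-quality data; up-weight code/math/licensed"},
    {"name": "context-lengthening", "tokens": 1.0e11, "seq_len": 32768,
     "note": "long sequences + synthetic data"},
]
for s in stages:
    print(f'{s["name"]}: {s["tokens"]:.1e} tokens @ seq_len {s["seq_len"]}')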

Training Infrastructure:
> Pre-trained on TPU v4 & v5p clusters
> Uses AXLearn (JAX) with a combination of tensor, FSDP, and sequence parallelism
> AFM Server trained on 8192 TPUv4 chips
> AFM On-device trained on 2048 TPUv5p chips
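A hedged sketch of what that parallelism layout implies as a JAX device mesh; the axis sizes here are degenerate so it runs on any machine, and the real AXLearn config (not in these notes) would size them across the full 8192-chip pod:

import numpy as np
import jax
from jax.sharding import Mesh

# One mesh axis per parallelism style named above: FSDP, tensor, sequence.
# reshape(-1, 1, 1) keeps this runnable on a single device; at AFM scale the
# three axes would be factored across 8192 TPUv4 chips instead.
devices = np.array(jax.devices())
mesh = Mesh(devices.reshape(-1, 1, 1), axis_names=("fsdp", "tensor", "seq"))
print(mesh)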

Post Training:
> Hybrid data - synthetic + human annotated
> Synthetic data for Mathematics (problem rephrase & reversion + evolution), Tool use and coding
> RLHF: Iterative Teaching Committee - refresh online human-preference data collection using a diverse set of the best-performing models
> For the above, collect pairwise human preferences on responses sampled from the committee

Deployment:
> Adapters for each task; adapter values represented in 16 bits, loaded on the fly based on the task
> Quantised to under 4 bits per weight (3.7 bpw on average), with accuracy-recovery adapters to regain the lost performance
> Accuracy-recovery adapters trained on 10B tokens, across ranks 8, 16, and 32
> Some (unimportant) layers pushed to 2-bit
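The accuracy-recovery idea maps naturally onto a LoRA-style adapter on top of a frozen quantized base. A minimal PyTorch sketch (shapes and rank illustrative; the actual ~3.7-bpw quantized layer is stubbed here with a frozen nn.Linear):

import torch
import torch.nn as nn

class AccuracyRecoveryLinear(nn.Module):
    """Frozen (nominally quantized) base linear + small 16-bit adapter."""
    def __init__(self, in_f: int, out_f: int, rank: int = 16):
        super().__init__()
        self.base = nn.Linear(in_f, out_f, bias=False)   # stands in for the ~3.7-bpw base
        self.base.weight.requires_grad_(False)           # frozen during adapter training
        self.lora_a = nn.Linear(in_f, rank, bias=False)  # adapter, kept in 16-bit at deployment
        self.lora_b = nn.Linear(rank, out_f, bias=False)
        nn.init.zeros_(self.lora_b.weight)               # adapter starts as a no-op

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.lora_b(self.lora_a(x))

layer = AccuracyRecoveryLinear(4096, 4096, rank=16)      # report trains ranks 8/16/32
y = layer(torch.randn(2, 4096))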

Evaluation:
> On-device: SoTA in IFEval and competitive with Gemma 7B on AlpacaEval 2.0
> Server: SoTA in IFEval, comparable to Mixtral 8x22B in Arena Hard
> Competitive with GPT-4/Gemini 1.5 on tool use/function calling and writing (summarisation, composition) benchmarks
> On-device beats Llama 3 8B on math

The report is packed with detail; I quite enjoyed skimming through it. Thanks, Apple, for being so open about your practices and spilling the beans on what will power the next gen of on-device ML.

More notes coming soon! 🤗
Maxime (@maximelabonne) did a wonderful deep-dive on the Post-training bit, check it out!

Jul 15
AI Math Olympiad Winner - Running on Mac! 100% local 🔥

brew install llama.cpp

llama-cli \
  --hf-repo reach-vb/NuminaMath-7B-TIR-Q8_0-GGUF \
  --hf-file numinamath-7b-tir-q8_0.gguf \
  -p "For how many values of the constant $k$ will the polynomial $x^{2}+kx+36$ have two distinct integer roots?"

That's it! 🤗
Check out the Quantised Q8 checkpoint here:

huggingface.co/reach-vb/Numin…
Quantised via GGUF-my-repo

huggingface.co/spaces/ggml-or…
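For the Python-inclined, the same checkpoint also works through llama-cpp-python (a sketch; the shell one-liner above is the simpler route):

# Sketch: pull and run the Q8_0 GGUF from the Hub via llama-cpp-python.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="reach-vb/NuminaMath-7B-TIR-Q8_0-GGUF",
    filename="numinamath-7b-tir-q8_0.gguf",
)
out = llm(
    "For how many values of the constant $k$ will the polynomial "
    "$x^{2}+kx+36$ have two distinct integer roots?",
    max_tokens=512,
)
print(out["choices"][0]["text"])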
Jul 4
The TTS ecosystem has been booming lately:

1. Chat TTS - English + Chinese TTS model optimised for daily conversations/dialogues + voice cloning

2. MARS5 TTS - English only but gives insane prosodic control paired with voice cloning

3. Parler TTS - Smol but powerful text prompt controlled TTS (we’re scaling it up right now)

4. Toucan - Massively Multilingual TTS in 4000+ languages (works even on CPU)

5. MetaVoice - 1B param model with deep voice cloning control. English only.

We're only halfway through the year; pumped to see what the rest has in store for us!

What else am I missing from this year?
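Of the list above, Parler TTS is the easiest to sketch in a few lines; this mirrors its documented usage at release (repo id and kwargs may have shifted since):

import soundfile as sf
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

repo = "parler-tts/parler_tts_mini_v0.1"   # the small checkpoint available at the time
model = ParlerTTSForConditionalGeneration.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained(repo)

description = "A calm female speaker, close-up recording, very clear audio."
prompt = "The TTS ecosystem has been booming lately!"

input_ids = tokenizer(description, return_tensors="pt").input_ids
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
audio = model.generate(input_ids=input_ids, prompt_input_ids=prompt_ids)
sf.write("out.wav", audio.cpu().numpy().squeeze(), model.config.sampling_rate)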
Jun 4
Up to 6x faster Whisper with torch.compile and HQQ! 🔥

> With negligible drop in performance!

Code + benchmark released ⚡
Benchmarks on short-form datasets indicate very little drop in performance (and look at that speed-up!).

Long-form benchmarks are even better!
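A rough sketch of the recipe (the released code is the source of truth; the checkpoint and HQQ settings below are my assumptions, not from the thread):

import torch
from transformers import HqqConfig, WhisperForConditionalGeneration, WhisperProcessor

model_id = "openai/whisper-large-v3"        # assumed checkpoint
quant = HqqConfig(nbits=4, group_size=64)   # illustrative HQQ settings

processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="cuda",
    quantization_config=quant,
)
# A static KV cache plus compile is where most of the decoding speed-up comes from.
model.generation_config.cache_implementation = "static"
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)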
Apr 11
Kinda wild that you can merge models with SoTA techniques at the click of a button! 🤯

Presenting MergeKit UI - drop in your config and access token, and voilà, you get a merged model back!

Supported merging methods:
1. Model Soups
2. SLERP (sketched below)
3. Task Arithmetic
4. TIES
5. DARE TIES
6. DARE TIES Arithmetic
7. Passthrough
8. Model Stock

We'll take care of the compute so you can work on what matters the most! ✨

Bring it on; let's merge our way to the current SoTA and beyond! 🤗

What would you like to see next? ⚡
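Of the methods above, SLERP is simple enough to sketch: interpolate each pair of weight tensors along the arc between them instead of linearly. A minimal, self-contained version (MergeKit's real implementation adds configs, per-layer gradients, and edge-case handling):

import torch

def slerp(w_a: torch.Tensor, w_b: torch.Tensor, t: float = 0.5) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors."""
    a, b = w_a.flatten().float(), w_b.flatten().float()
    cos_omega = torch.clamp(torch.dot(a, b) / (a.norm() * b.norm()), -1.0, 1.0)
    omega = torch.acos(cos_omega)
    if omega.abs() < 1e-6:                   # near-parallel: fall back to lerp
        return (1 - t) * w_a + t * w_b
    sin_omega = torch.sin(omega)
    out = (torch.sin((1 - t) * omega) * a + torch.sin(t * omega) * b) / sin_omega
    return out.reshape(w_a.shape).to(w_a.dtype)

merged = slerp(torch.randn(16, 16), torch.randn(16, 16), t=0.5)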
Check out the space here:

huggingface.co/spaces/arcee-a…
Apr 9
CodeGemma 2B, 7B & 7B-it 💎

> pretty strong model; beats CodeLlama 13B.
> supports fill-in-the-middle (code completion), code generation, and chat.
> compatible with torch.compile()
> optimised for speed: ~1.5x faster than models in a similar category.
> 2B model supports FIM only.
> 7B supports FIM + code generation.
> 7B-IT supports code generation + chat.

> try it out in transformers directly! 🤗
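A hedged fill-in-the-middle sketch with the 2B checkpoint; the FIM sentinel tokens follow the published CodeGemma format, and gated-repo access plus a recent transformers are assumed:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/codegemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Prefix is the code before the hole, suffix the code after; the model fills the middle.
prompt = (
    "<|fim_prefix|>def fib(n):\n"
    "    if n < 2:\n"
    "        return n\n"
    "    return <|fim_suffix|>\n<|fim_middle|>"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))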
Check out all the models here:

huggingface.co/collections/go…
Try it out in transformers! 🤗

Colab: github.com/Vaibhavs10/not…