@ylecun @boztank And we get open-source forced-alignment tooling as well, to help with the quality of longer transcriptions??
This is one HELL of a flex!
MMS was trained on 45K hours of labeled data (15x LESS than Whisper) and has half the WER (word error rate, lower = better) while also supporting 11x more languages?
This is... how exponential feels, folks. Whisper is not even 1 year old yet! 🤯
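For anyone new to the metric: WER is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the model's output, divided by the number of words in the reference. Here is a minimal Python sketch of that standard definition. The function name `word_error_rate` and the example sentences are my own illustration, not anything from the MMS or Whisper evaluation code, and real evaluations usually normalize casing and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Illustrative helper (hypothetical name); assumes whitespace tokenization
    and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Note that WER can go above 1.0 when the hypothesis has many extra words, so "half the WER" is a ratio of error rates, not of accuracies.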
"Despite @elonmusk being a jerk on Twitter,
or whatever, I'm happy he exists in the world.
But I wish he would do more to look at the
hard work we're doing to get this stuff right."
I loved this one with @lexfridma and @sama geeking out about code editors, where Lex admits he switched to VS Code largely because of Copilot, and Sama is all giddy about VS Code 😂
.@sama thinks the fact that you can chat with GPT-4 about your code is a "really big deal", and that's only 6 days in (before the plugin system was released)
@sama .@sama on the pressure from outrage journalism when it comes to AI
@sama "The reason Steve Jobs insisted on the handles on the old Macs is to give humans the feeling of control, that they would be able to throw the computer out the window if it misbehaves"
There seem to be no handles like these in these models 😆
.@ilyasut explains everything in simple terms:
Multimodality is important for two reasons: 1) It's useful, as vision enhances the practicality and value of neural networks. 2) It allows us to learn more about the world through images, in addition to text. targum.video/v/2023/3/22/a6…
A third of the human cortex is dedicated to vision. We as human beings learn visually way, way before we learn verbally!
There's a whole understanding of the world that lies in vision