Sub-600ms-latency conversational speech AI is completely possible today; I'm surprised I haven't seen anyone do this.
The key is hosting a model (like llama), streaming from whisper, and every few tokens, prefilling more of the KV cache, without evicting it from memory (1/4)
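The streaming-prefill idea above can be sketched as a toy (the class and method names here are hypothetical illustrations, not a real inference library's API): as transcript chunks arrive from whisper, only the new tokens get prefilled, and everything already in the cache stays resident.

```python
# Toy sketch of incremental KV-cache prefill (hypothetical API, not a real
# inference library): as the transcript grows, only the *new* suffix tokens
# are processed; earlier cache entries are never recomputed or evicted.

class StreamingPrefill:
    def __init__(self):
        self.kv_cache = []       # stands in for per-layer key/value tensors
        self.prefill_calls = []  # tokens processed per call, for illustration

    def prefill(self, tokens):
        """Append only the tokens not already covered by the cache."""
        new = tokens[len(self.kv_cache):]
        self.kv_cache.extend(new)
        self.prefill_calls.append(len(new))
        return len(new)

session = StreamingPrefill()
session.prefill(["Hello", ","])                              # partial transcript
session.prefill(["Hello", ",", "how", "are"])                # whisper emits more
session.prefill(["Hello", ",", "how", "are", "you", "?"])    # final transcript
```

By the final call, only the last two tokens need processing, which is why the time to first response depends on the tail of the input rather than the whole utterance.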
Whisper could transcribe the remainder of the text within 100ms of the user finishing speaking.
This means the time to first voice response is roughly the time to process the last several tokens of input, generate the first 20 tokens, and pass those into a text-to-speech model. (2/4)
With a quantized llama 70b using 2-way tensor parallelism, you could process and generate those tokens in ~0.4s on two A100s (3/4)
The last step is getting a text-to-speech model to start talking within 100ms.
Again, this feels quite achievable if you host the model and stream the text into its KV cache.
But I'm less confident about the state of open-source text-to-speech models (4/4)
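Tallying the thread's per-stage estimates (these are the thread's own numbers, not measurements) shows how the pieces add up to roughly the claimed latency:

```python
# End-to-end latency budget from the thread (estimates, not measurements):
transcribe_tail_ms = 100   # whisper finishing the remaining text after speech ends
prefill_generate_ms = 400  # quantized llama 70b, 2-way TP on two A100s:
                           # prefill the last input tokens + generate ~20 tokens
tts_start_ms = 100         # text-to-speech model starting to talk

total_ms = transcribe_tail_ms + prefill_generate_ms + tts_start_ms
print(total_ms)  # 600
```

Overlapping the stages (e.g. starting TTS on the first generated tokens while the rest are still decoding) is what would push the perceived latency under the 600ms sum.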
The size of all code/history on GitHub public repos is 92TB
The size of Google's monorepo in 2015 was 86TB (of much higher quality code)
If Google were willing to deploy code models trained on their own data, they'd have a noticeable advantage over everyone else.
To be fair, they would have to use the diff/history data much more than the raw code.
There is almost certainly more code at GitHub HEAD, but Google may benefit from a richer commit history
And I'd suspect the size of their monorepo has increased substantially since then
Furthermore, these high-quality code tokens aren't just good for code, but incredibly useful for general language modeling performance:
There are times and places for training your own models... With the release of OpenAI's ChatGPT API, coding is looking less like one of them.
The HumanEval pass@1 rate of ChatGPT is as good as the best open-source model's pass@100 rate.
And this is still just GPT 3.5...
Not only that, but 10x better pricing than text-davinci, and far lower latency.
After seeing this news today, I really would not want to be one of OpenAI's competitors
For those unfamiliar with pass@k: if I took the best open-source code model (CodeGen 16B) and sampled 100 generations, the probability that at least 1 of those 100 generations is correct equals the probability that ChatGPT gets it right on the first try.
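Concretely, pass@k is usually reported with the unbiased estimator introduced alongside HumanEval: given n sampled generations of which c pass the tests, it estimates the probability that at least one of k samples is correct. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    generations drawn (without replacement) from n samples is correct,
    given that c of the n samples are correct."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # must include a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# If 5 of 100 generations pass, pass@1 is simply 5/100:
print(pass_at_k(100, 5, 1))    # 0.05
# And pass@100 over those same samples is 1.0:
print(pass_at_k(100, 5, 100))  # 1.0
```

This is why comparing one model's pass@1 to another's pass@100 is such a stark gap: the second model needs 100 tries to match what the first does in one.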