We are getting closer to “Her” where conversation is the new interface.
Siri couldn’t do it, so I built an e-mail summarizing feature using #GPT3 and life-like #AI generated voice on iOS.
(🔈Audio on to be 🤯with voice realism!)
How did I do this? 👇
I used the Gmail API to feed recent unread e-mails into a prompt, then sent it to the @OpenAI #GPT3 Completion API. Prompt tweaks, such as telling it not to “just read them out”, gave good results.
Here are the settings I used. You can see how #GPT3 does a great job of summarizing conversationally. (For the sake of privacy, I made up the e-mails shown in the demo.)
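For anyone curious what the prompt step could look like, here is a rough sketch, not the exact code from the demo. It assumes the unread e-mails have already been fetched (e.g. via the Gmail API) into plain dicts; the function names, the prompt wording, and the `max_tokens` / `temperature` values are placeholders of mine, not the settings from the screenshot.

```python
def build_summary_prompt(emails):
    """Join e-mails (dicts with 'sender', 'subject', 'body') into one prompt."""
    lines = [
        "Below are my recent unread e-mails.",
        "Summarize them for me conversationally, as a friendly assistant would.",
        "Do not just read them out one by one.",
        "",
    ]
    for i, mail in enumerate(emails, start=1):
        lines.append(f"E-mail {i}")
        lines.append(f"From: {mail['sender']}")
        lines.append(f"Subject: {mail['subject']}")
        lines.append(f"Body: {mail['body'].strip()}")
        lines.append("")
    # Ending on a cue nudges the completion model to answer with the summary.
    lines.append("Spoken summary:")
    return "\n".join(lines)


def build_completion_request(prompt):
    """JSON body for a completions-style request (values are illustrative)."""
    return {
        "model": "text-davinci-003",
        "prompt": prompt,
        "max_tokens": 256,   # assumed value
        "temperature": 0.7,  # assumed value
    }
```

The returned dict would be sent as the JSON body of the completion request; the text that comes back is what gets handed to the voice model.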
The audio model was fine-tuned on speech from the movie Her.
I got good results with TorToiSe, but have also experimented with VITS & YourTTS from @coqui_ai and more recently @ElevenLabs.
None are fast enough for a snappy response together with text-davinci-003 completions, so...
I have a script running on a server at home that checks for e-mails, prepares the audio, and serves the latest generated clip on an endpoint. From there it's as simple as setting up a Siri Shortcut in iOS to retrieve and decode the audio.
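A minimal sketch of the serving side, using only the Python standard library. The `/latest` path, the `audio_base64` JSON field, and port 8000 are assumptions for illustration; the thread only says the audio is served on an endpoint and decoded in the Shortcut, so base64 is my guess at the encoding.

```python
import base64
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory slot; the background job that checks e-mail and
# synthesizes speech would overwrite it whenever a new clip is ready.
LATEST = {"audio": b""}


def set_latest_audio(audio_bytes):
    LATEST["audio"] = audio_bytes


def audio_payload():
    """Return a JSON body carrying the latest clip as base64 text."""
    encoded = base64.b64encode(LATEST["audio"]).decode("ascii")
    return json.dumps({"audio_base64": encoded}).encode("utf-8")


class LatestAudioHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/latest":
            self.send_error(404)
            return
        body = audio_payload()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering GET /latest with the newest clip."""
    HTTPServer(("0.0.0.0", port), LatestAudioHandler).serve_forever()
```

Because the clip is generated ahead of time, the Shortcut only has to fetch the URL, base64-decode the field, and play the sound, which is what keeps the response snappy.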
• • •
I used AI to create a (comedic) guided meditation for the New Year!
(audio on, no meditation pose necessary!)
Used ChatGPT for an initial draft, and TorToiSe trained on only 30s of audio of Sam Harris
See 🧵 for implementation details
ChatGPT came up with some creative ideas, but the delivery was still fairly vanilla, so I iterated on it heavily and added a few Sam-isms from my experience with the @wakingup app (Jokes aside - highly recommended)
Diffusion models & autoregressive transformers are coming for audio!