I “jailbroke” a Google Nest Mini so that you can run your own LLMs, agents and voice models.
Here’s a demo using it to manage all my messages (with help from @onbeeper)
🔊 on, and wait for surprise guest!
I thought hard about how best to tackle this and why, see 🧵
After looking into jailbreaking options, I opted to completely replace the PCB.
This lets you use a cheap ($2) but powerful & developer-friendly WiFi chip with a highly capable audio framework.
This allows a paradigm of multiple cheap edge devices for audio & voice detection…
& offloading large models to a more powerful local device (whether your M2 Mac, PC server w/ GPU or even "tinybox"!)
In most cases this device is already trusted with your credentials and data, so you don’t have to hand these off to some cloud, and data need never leave your home
The custom PCB uses @EspressifSystem's ESP32-S3
I went through 2 revisions from a module to a SoC package with extra flash, simplifying to single-sided SMT (< $10 BOM)
All features such as LEDs, capacitive touch, mute switch are working, & even programmable from Arduino (/IDF)
For this demo I used a custom “Maubot” running locally and serving an API, with my @onbeeper credentials (Beeper is a messaging app which securely bridges your messaging clients using the Matrix protocol & e2e encryption)
I’m then using GPT-3.5 (for speed) with function calling to query this API
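The function-calling loop can be sketched roughly like this. The tool name, its parameters, and the local message API are my illustrative guesses, not the actual (not-yet-released) bot code:

```python
import json

# Hypothetical tool schema passed to the chat completion request.
# Function name and parameters are illustrative, not the bot's real ones.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_unread_messages",
        "description": "Fetch unread messages from the local Matrix/maubot API",
        "parameters": {
            "type": "object",
            "properties": {
                "sender": {"type": "string", "description": "Optional contact filter"},
                "limit": {"type": "integer", "description": "Max messages to return"},
            },
            "required": [],
        },
    },
}]

def dispatch(tool_call, message_api):
    """Route one model-issued tool call to the local message API."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    if name == "get_unread_messages":
        return message_api.get_unread(**args)
    raise ValueError(f"unknown tool: {name}")
```

A chat loop would pass `TOOLS` with the completion request, run `dispatch()` on any tool calls in the model's reply, and feed the result back as a tool message before asking for the final spoken answer.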
For the prompt I added details such as family & friends, current date, notification preferences & a list of additional character voices that GPT can respond in.
The response is then parsed and sent to @elevenlabsio
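That parse step might look like the sketch below. The `[Voice]` prefix convention is my own assumption about how a character voice could be tagged in the reply, and the ElevenLabs request is shown for shape only (endpoint and header are from their public API; the voice IDs would be yours):

```python
import re

def parse_reply(text, default_voice="assistant"):
    """Split an optional leading [Voice] tag from the model's reply text.
    The tag convention is an assumption for illustration."""
    m = re.match(r"\s*\[([^\]]+)\]\s*(.*)", text, flags=re.DOTALL)
    if m:
        return m.group(1), m.group(2)
    return default_voice, text.strip()

def speak(text, voice_id, api_key):
    """Send the parsed text to the ElevenLabs text-to-speech endpoint."""
    import requests  # network call, sketched but not exercised here
    r = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        headers={"xi-api-key": api_key},
        json={"text": text},
    )
    r.raise_for_status()
    return r.content  # audio bytes to play on the device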
I've been experimenting with multiple of these, announcing important messages as they come in, morning briefings, noting down ideas and memos, and browsing agents.
I couldn’t resist - here's a playful (unscripted!) video of two talking to each other, prompted to be AIs from "Her"
I’m working on open sourcing the PCB design, build instructions, firmware, bot & server code - expect something in the next week or so.
If you don't want to source Nest Mini's (or shells from AliExpress) it's still a great dev platform for developing an assistant!
Stay tuned!
I wanted to imagine how we’d better use #stablediffusion for video content / AR.
A major obstacle, and why most videos are so flickery, is the lack of temporal & viewing-angle consistency, so I experimented with an approach to fix this
See 🧵 for process & examples
Ideally you want to learn a single representation of an object across time or different viewing directions to perform a *single* #img2img generation on.
This learns an "atlas" to represent an object and its background across the video.
Regularization losses during training help preserve the original shape, with a result that resembles a usable, slightly "unwrapped" version of the object
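The key property, a single edit to the atlas propagating consistently to every frame, can be illustrated with a toy reconstruction step. Real atlas methods learn continuous per-frame UV mappings; here the mapping is just an integer window shift, and everything is made up for illustration:

```python
def reconstruct(atlas, uv):
    """Sample a frame from the shared atlas via per-pixel (y, x) lookups."""
    return [[atlas[y][x] for (y, x) in row] for row in uv]

# Toy 8x8 atlas with one "img2img" edit applied ONCE to a 2x2 region.
atlas = [[0.0] * 8 for _ in range(8)]
for y in range(2, 4):
    for x in range(2, 4):
        atlas[y][x] = 1.0

# Each frame views a 4x4 window of the atlas, shifted by t per frame,
# standing in for the learned per-frame UV mapping.
frames = []
for t in range(3):
    uv = [[(y + t, x + t) for x in range(4)] for y in range(4)]
    frames.append(reconstruct(atlas, uv))
```

Because every frame samples the same edited atlas, the edit stays pixel-consistent across time: no per-frame generation, so no flicker.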
We are getting closer to “Her” where conversation is the new interface.
Siri couldn’t do it, so I built an e-mail summarizing feature using #GPT3 and life-like #AI generated voice on iOS.
(🔈Audio on to be 🤯with voice realism!)
How did I do this? 👇
I used the Gmail API to feed recent unread e-mails into a prompt and send it to the @OpenAI #GPT3 Completion API. Calling out details such as not “just reading them out” and other prompt tweaks gave good results
Here are the settings I used; you can see how #GPT3 does a great job of conversationally summarizing. (For the sake of privacy I made up the e-mails shown in the demo)
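The prompt assembly can be sketched like this. The wording is illustrative (the demo's actual prompt and settings differ), and the Gmail fetch is shown only in outline:

```python
def summarize_prompt(emails):
    """Build a GPT-3 completion prompt from (sender, subject, snippet) tuples.
    Instruction wording is an illustrative stand-in for the real prompt."""
    lines = [
        "Conversationally summarize these unread e-mails.",
        "Do not just read them out; group related threads and highlight what matters.",
        "",
    ]
    for sender, subject, snippet in emails:
        lines.append(f"From: {sender} | Subject: {subject}\n{snippet}")
    lines.append("\nSummary:")
    return "\n".join(lines)

# Fetching unread mail with the Gmail API looks roughly like:
#   results = service.users().messages().list(userId="me", q="is:unread").execute()
# The assembled prompt then goes to the (legacy) Completions endpoint.
```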
I used AI to create a (comedic) guided meditation for the New Year!
(audio on, no meditation pose necessary!)
Used ChatGPT for an initial draft, and TorToiSe trained on only 30s of audio of Sam Harris
See 🧵 for implementation details
ChatGPT came up with some creative ideas, but the delivery was still fairly vanilla, so I iterated on it heavily and added a few Sam-isms from my experience with the @wakingup app (Jokes aside - highly recommended)
Diffusion models & autoregressive transformers are coming for audio!