I “jailbroke” a Google Nest Mini so that you can run your own LLMs, agents and voice models.
Here’s a demo using it to manage all my messages (with help from @onbeeper)
🔊 on, and wait for surprise guest!
I thought hard about how best to tackle this and why, see 🧵
After looking into jailbreaking options, I opted to completely replace the PCB.
This lets you use a cheap ($2) but powerful & developer-friendly WiFi chip with a highly capable audio framework.
This allows a paradigm of multiple cheap edge devices for audio & voice detection…
& offloading large models to a more powerful local device (whether your M2 Mac, PC server w/ GPU or even "tinybox"!)
In most cases this device is already trusted with your credentials and data, so you don’t have to hand these off to some cloud, and data need never leave your home
The custom PCB uses @EspressifSystem's ESP32-S3
I went through 2 revisions from a module to a SoC package with extra flash, simplifying to single-sided SMT (< $10 BOM)
All features such as LEDs, capacitive touch, mute switch are working, & even programmable from Arduino (/IDF)
For this demo I used a custom “Maubot” running locally and serving an API, with my @onbeeper credentials (Beeper is a messaging app which securely bridges your messaging clients using the Matrix protocol & e2e encryption)
I’m then using GPT-3.5 (for speed) with function calling to query this API
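The function-calling loop can be sketched roughly like this. The tool name, its parameters, and the local message API are my illustrative guesses, not the actual (not-yet-released) bot code:

```python
import json

# Hypothetical tool schema passed to the chat completion request.
# Function name and parameters are illustrative, not the bot's real ones.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_unread_messages",
        "description": "Fetch unread messages from the local Matrix/maubot API",
        "parameters": {
            "type": "object",
            "properties": {
                "sender": {"type": "string", "description": "Optional contact filter"},
                "limit": {"type": "integer", "description": "Max messages to return"},
            },
            "required": [],
        },
    },
}]

def dispatch(tool_call, message_api):
    """Route one model-issued tool call to the local message API."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    if name == "get_unread_messages":
        return message_api.get_unread(**args)
    raise ValueError(f"unknown tool: {name}")
```

A chat loop would pass `TOOLS` with the completion request, run `dispatch()` on any tool calls in the model's reply, and feed the result back as a tool message before asking for the final spoken answer.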
For the prompt I added details such as family & friends, current date, notification preferences & a list of additional character voices that GPT can respond in.
The response is then parsed and sent to @elevenlabsio
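That parse step might look like the sketch below. The `[Voice]` prefix convention is my own assumption about how a character voice could be tagged in the reply, and the ElevenLabs request is shown for shape only (endpoint and header are from their public API; the voice IDs would be yours):

```python
import re

def parse_reply(text, default_voice="assistant"):
    """Split an optional leading [Voice] tag from the model's reply text.
    The tag convention is an assumption for illustration."""
    m = re.match(r"\s*\[([^\]]+)\]\s*(.*)", text, flags=re.DOTALL)
    if m:
        return m.group(1), m.group(2)
    return default_voice, text.strip()

def speak(text, voice_id, api_key):
    """Send the parsed text to the ElevenLabs text-to-speech endpoint."""
    import requests  # network call, sketched but not exercised here
    r = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        headers={"xi-api-key": api_key},
        json={"text": text},
    )
    r.raise_for_status()
    return r.content  # audio bytes to play on the device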
I've been experimenting with multiple of these, announcing important messages as they come in, morning briefings, noting down ideas and memos, and browsing agents.
I couldn’t resist - here's a playful (unscripted!) video of two talking to each other, prompted to be AIs from "Her"
I’m working on open sourcing the PCB design, build instructions, firmware, bot & server code - expect something in the next week or so.
If you don't want to source Nest Mini's (or shells from AliExpress) it's still a great dev platform for developing an assistant!
Stay tuned!
I wanted to imagine how we’d better use #stablediffusion for video content / AR.
A major obstacle, and why most videos are so flickery, is the lack of temporal & viewing-angle consistency, so I experimented with an approach to fix this
See 🧵 for process & examples
Ideally you want to learn a single representation of an object across time or different viewing directions to perform a *single* #img2img generation on.
This learns an "atlas" to represent an object and its background across the video.
Regularization losses during training help preserve the original shape, with a result that resembles a usable, slightly "unwrapped" version of the object
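The key property, a single edit to the atlas propagating consistently to every frame, can be illustrated with a toy reconstruction step. Real atlas methods learn continuous per-frame UV mappings; here the mapping is just an integer window shift, and everything is made up for illustration:

```python
def reconstruct(atlas, uv):
    """Sample a frame from the shared atlas via per-pixel (y, x) lookups."""
    return [[atlas[y][x] for (y, x) in row] for row in uv]

# Toy 8x8 atlas with one "img2img" edit applied ONCE to a 2x2 region.
atlas = [[0.0] * 8 for _ in range(8)]
for y in range(2, 4):
    for x in range(2, 4):
        atlas[y][x] = 1.0

# Each frame views a 4x4 window of the atlas, shifted by t per frame,
# standing in for the learned per-frame UV mapping.
frames = []
for t in range(3):
    uv = [[(y + t, x + t) for x in range(4)] for y in range(4)]
    frames.append(reconstruct(atlas, uv))
```

Because every frame samples the same edited atlas, the edit stays pixel-consistent across time: no per-frame generation, so no flicker.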
We are getting closer to “Her” where conversation is the new interface.
Siri couldn’t do it, so I built an e-mail summarizing feature using #GPT3 and life-like #AI generated voice on iOS.
(🔈Audio on to be 🤯with voice realism!)
How did I do this? 👇
I used the Gmail API to feed recent unread e-mails into a prompt and send it to the @OpenAI #GPT3 Completion API. Calling out details such as not “just reading them out” and other prompt tweaks gave good results
Here are the settings I used; you can see how #GPT3 does a great job of conversationally summarizing. (For the sake of privacy I made up the e-mails shown in the demo)
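The prompt assembly can be sketched like this. The wording is illustrative (the demo's actual prompt and settings differ), and the Gmail fetch is shown only in outline:

```python
def summarize_prompt(emails):
    """Build a GPT-3 completion prompt from (sender, subject, snippet) tuples.
    Instruction wording is an illustrative stand-in for the real prompt."""
    lines = [
        "Conversationally summarize these unread e-mails.",
        "Do not just read them out; group related threads and highlight what matters.",
        "",
    ]
    for sender, subject, snippet in emails:
        lines.append(f"From: {sender} | Subject: {subject}\n{snippet}")
    lines.append("\nSummary:")
    return "\n".join(lines)

# Fetching unread mail with the Gmail API looks roughly like:
#   results = service.users().messages().list(userId="me", q="is:unread").execute()
# The assembled prompt then goes to the (legacy) Completions endpoint.
```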
I used AI to create a (comedic) guided meditation for the New Year!
(audio on, no meditation pose necessary!)
Used ChatGPT for an initial draft, and TorToiSe trained on only 30s of audio of Sam Harris
See 🧵 for implementation details
ChatGPT came up with some creative ideas, but the delivery was still fairly vanilla, so I iterated on it heavily and added a few Sam-isms from my experience with the @wakingup app (Jokes aside - highly recommended)
Diffusion models & autoregressive transformers are coming for audio!