kache (@yacineMTB)
May 29, 2023 · 47 tweets · 15 min read
I need a thing that listens to speech and holds some state. This is an old project that I'm resurrecting.
I will keep this thread updated
TIL about sampling rates, channels, etc
stdin works, saves file
setting up my dev loop.
just need to figure out how to get ffmpeg to do the same, so I don't have to record myself speaking
ideally the logic triggers on save with the whole "speech"
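A minimal sketch of the ffmpeg side of that dev loop, assuming the consumer wants raw 16 kHz mono signed 16-bit little-endian PCM on stdout (the same format whisper.cpp's README converts audio to), so any video or podcast file can stand in for the microphone. `pcmArgs` and `streamPcm` are hypothetical helpers, not the project's code.

```typescript
import { spawn, ChildProcess } from "node:child_process";

// Assumed target format for the recognizer: 16 kHz, mono, signed 16-bit little-endian PCM.
const SAMPLE_RATE = 16000;
const CHANNELS = 1;

// Build ffmpeg arguments that decode any input file and write headerless PCM to stdout.
function pcmArgs(input: string): string[] {
  return [
    "-i", input,                // source file (video, mp3, wav, ...)
    "-ar", String(SAMPLE_RATE), // resample to 16 kHz
    "-ac", String(CHANNELS),    // downmix to mono
    "-f", "s16le",              // raw signed 16-bit little-endian samples, no WAV header
    "pipe:1",                   // write to stdout
  ];
}

// Spawn ffmpeg; its stdout can feed the same code path that reads the mic from stdin.
function streamPcm(input: string): ChildProcess {
  return spawn("ffmpeg", ["-loglevel", "error", ...pcmArgs(input)], {
    stdio: ["ignore", "pipe", "inherit"],
  });
}
```

Piping `streamPcm(file).stdout` into whatever currently consumes stdin keeps one code path for both recorded and live audio.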
setting up the dev loop from a different source was a good exercise
subjecting myself to slowmo'd noisy sam hyde rant because I messed up bit depth (TIL)
I think i have a reasonably constrained interface & understand how audio data works
I will now think about eventing
Eventing depends on the decoding system
Naively, you can chunk the audio and spread it across discrete STT inferences
However, there are gotchas
Whisper is trained on 30s chunks
You can have a running inference that keeps up
But maybe there is something faster
I will look at research
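The 30-second constraint can be made concrete with a small rolling buffer: keep only the most recent 30 s of 16 kHz mono samples (480,000 floats) and hand that window to each new inference. This is a hedged sketch; `RollingWindow` is a hypothetical name, and the window length and sample rate are assumptions chosen to match Whisper's input size.

```typescript
// Whisper consumes 30 s windows; at 16 kHz mono that is 480,000 samples.
const WINDOW_SECONDS = 30;
const SAMPLE_RATE_HZ = 16000;

class RollingWindow {
  private buf: Float32Array;
  private filled = 0; // number of valid samples currently held

  constructor(private readonly capacity: number = WINDOW_SECONDS * SAMPLE_RATE_HZ) {
    this.buf = new Float32Array(capacity);
  }

  // Append new samples, discarding the oldest ones once the window is full.
  push(chunk: Float32Array): void {
    if (chunk.length >= this.capacity) {
      // The chunk alone fills the window: keep only its tail.
      this.buf.set(chunk.subarray(chunk.length - this.capacity));
      this.filled = this.capacity;
      return;
    }
    const overflow = Math.max(0, this.filled + chunk.length - this.capacity);
    if (overflow > 0) {
      // Shift the surviving samples to the front of the buffer.
      this.buf.copyWithin(0, overflow, this.filled);
      this.filled -= overflow;
    }
    this.buf.set(chunk, this.filled);
    this.filled += chunk.length;
  }

  // Copy of the current window, oldest sample first, ready to pass to an inference.
  snapshot(): Float32Array {
    return this.buf.slice(0, this.filled);
  }

  // How much audio the window currently holds, in seconds.
  seconds(): number {
    return this.filled / SAMPLE_RATE_HZ;
  }
}
```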
I spent like 3 hours looking at different whisper implementations / ways to run it, just to come to the conclusion:
wait for gregg to port to c++

every single python implementation abstracts way too much, and just ends up running like dogshit
ggml is WAY faster, and I get probs ez
i simply cannot be trusted with node dot js (the best runtime environment for hobby projects) and gpt4 teaching me how to do non standard things

(instead of having a server that I make outbound calls to, I will call a C binding)
2 hours later, I have figured out how node-gyp works
👍
switching to my macbook to see if it works across machines
I figured out how to link my cuda libraries in, so I can load the model on my GPU

this is the first time I've seen node pop up on nvidia-smi. hahahahahaha

we'll see if i can get inference to work
ahahhhaha
the only reasonable thing to do to debug data handoff between node.js & cpp land?
writing the whole buffer to file, checking the md5, and comparing it to native whisper.cpp
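The same check can be done from the node.js side with the built-in `crypto` module: hash the exact bytes handed to the binding, then compare against `md5sum` of the file the native whisper.cpp example reads. A sketch; `md5Hex` and `dumpAndHash` are hypothetical helpers.

```typescript
import { createHash } from "node:crypto";
import { writeFileSync } from "node:fs";

// Hex MD5 digest of a byte buffer.
function md5Hex(bytes: Uint8Array): string {
  return createHash("md5").update(bytes).digest("hex");
}

// Write the samples to disk (for diffing against the native path) and return their digest.
function dumpAndHash(path: string, samples: Float32Array): string {
  // View the float samples as raw bytes without copying them.
  const bytes = new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength);
  writeFileSync(path, bytes);
  return md5Hex(bytes);
}
```

If the digest of what node passes across differs from the native side's, the bug is in the handoff (offsets, lengths, sample type), not in the model.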
pythonistas...
it's over..
..i just got an inference through from a C binding
(TIL how computers work)
WHAT DOES LITTLE ENDIAN MEAN????
LETS GOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
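Little endian means the least significant byte comes first in memory, so the 16-bit sample 0x7FFF is stored as the bytes FF 7F. A minimal decoder for raw s16le PCM into floats in [-1, 1), using the common normalization of dividing by 32768; `s16leToFloat32` is a hypothetical name.

```typescript
// Decode signed 16-bit little-endian PCM bytes into floats in [-1, 1).
function s16leToFloat32(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const n = Math.floor(bytes.byteLength / 2); // ignore a trailing odd byte
  const out = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    // `true` selects little endian (low byte first); reading big endian here garbles the audio.
    out[i] = view.getInt16(i * 2, true) / 32768;
  }
  return out;
}
```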
@teleoflexuous also Alexa sucks
these are the times calling whisper from node.js on a macbook pro max
had to figure out how to compile with Core ML and link it to the node binding
220 ms per transcription 🤩
back to it
got it going from my mic
NEXT WE ARE MAKING IT THINK
LFG!
found some code that has bindings to llama for node
it was written in rust. did not wake up thinking i would be reading rust today

Thank you @hlhr202! https://t.co/fZA5ttpIYg
@hlhr202 token go brrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr 🤩
my node process says hello
okay, break's over
lets see what kind of goofy shit we can accomplish now
I want to have some kind of scheduler that doesn't fry my GPU
okay, I gotta throw the towel in
i tried thinking about scheduler but brain is mush

it is pretty wild that my computer is literally thinking thoughts as I speak to it though. the future is now!
oh ho ho, what do we have here, an unfinished project?

okay, the next step is ripping out the dependency (the llama binding pkg i found) and writing it myself. I do not understand rust nor can i debug it, so I will rawdog the C bindings again

also it will run on macbook by eod
me, looking at the GPT4 generated c plus plus I "wrote" a week ago at a cottage bachelor party 2 beers deep
it might not look like much but i figured out how to get an event emitter from c plus plus without blocking the main thread

i also know how memory management and threading works on computers a little bit better
yeah im cookin
LETS GO!

- Init function from node.js, which instantiates the llama parameters
- Then startAsync, which starts a non-blocking inference off the node.js main thread and calls an event listener on every new token

genuinely can't believe i did it lmao https://t.co/RjvS6DKwev
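On the JavaScript side, that shape (an init call, then a non-blocking start that reports each token) maps naturally onto an `EventEmitter`. A sketch assuming a hypothetical native addon that exposes `init(params)` and `startAsync(prompt, onToken, onDone)`; the real binding's names and signatures may differ. In a real addon, the callbacks fired from the inference thread would have to be marshalled back to the main thread (for example through N-API thread-safe functions).

```typescript
import { EventEmitter } from "node:events";

// Hypothetical shape of the native addon (an assumption, not the project's actual API).
interface LlamaNative {
  init(params: { modelPath: string; nThreads: number }): void;
  startAsync(prompt: string, onToken: (token: string) => void, onDone: () => void): void;
}

class LlamaSession extends EventEmitter {
  constructor(private readonly native: LlamaNative, modelPath: string, nThreads = 4) {
    super();
    // Instantiate the llama parameters once, up front.
    this.native.init({ modelPath, nThreads });
  }

  // Kick off generation; tokens arrive as "token" events and completion as "done".
  start(prompt: string): void {
    this.native.startAsync(
      prompt,
      (token) => this.emit("token", token),
      () => this.emit("done"),
    );
  }
}
```

Usage would look like `session.on("token", (t) => process.stdout.write(t))` followed by `session.start(prompt)`.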
@jpmaney
here it is :)


still needs a bit of cleaning, I'll give it another pass later before I point the diff at the master origin

but it should be runnable now

Progress update: Added locks, because later I want to do goofy things from my node caller https://t.co/syu3ifBGtx (github.com/yacineMTB/llam…)
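A promise-based mutex is one way to get that kind of lock on the node.js caller's side, so two callers never drive the same native context at once. This is a sketch of a generic technique, not necessarily how the linked diff does it; `Mutex` and `runExclusive` are hypothetical names.

```typescript
// Minimal FIFO mutex: callers await acquire() and must call the returned release().
class Mutex {
  private locked = false;
  private waiters: Array<() => void> = [];

  acquire(): Promise<() => void> {
    return new Promise((resolve) => {
      const grant = (): void => {
        this.locked = true;
        let released = false;
        // Hand back a release function that only takes effect once.
        resolve(() => {
          if (released) return;
          released = true;
          this.handOff();
        });
      };
      if (this.locked) this.waiters.push(grant);
      else grant();
    });
  }

  // Run fn while holding the lock; the lock is released even if fn throws.
  async runExclusive<T>(fn: () => Promise<T>): Promise<T> {
    const release = await this.acquire();
    try {
      return await fn();
    } finally {
      release();
    }
  }

  // Pass ownership straight to the next waiter, or unlock if nobody is waiting.
  private handOff(): void {
    const next = this.waiters.shift();
    if (next) next();
    else this.locked = false;
  }

  get isLocked(): boolean {
    return this.locked;
  }

  get pending(): number {
    return this.waiters.length;
  }
}
```

For example, `mutex.runExclusive(() => generate(prompt))`, where `generate` is a hypothetical function that calls into the native context.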
2 hours later i got it working on my linux machine as well

i now know how CMakeLists.txt works :|
never give up
thanks gpt4 for the help
I redid the build system for whisper.cpp to use the same system i'm using for llama.cpp
works
also made it actually async
okay, this is running on my macbook on battery

Basically, my laptop is thinking thoughts while it listens to a yt video

It's still sync, waits for whisper, then does llama inference. But now that i have my grubby hands on it I can do little tricks to make it fast :)

gn
ok
i'm focused
I have three hours
today's the day
it's time to clutch it
Picking up an old design i had
A strategy to minimize latency on responses
also here's the "architecture"
Running the inferences on whisper.cpp & llama.cpp at the same time. This works pretty well!

Using @Teknium1's hermes-13b model

audio warning: tried to time it honestly to give a feel for the latency. Next, I'm going to start testing with me actually talking to it
Diff in

didn't get to the code that chooses when to have the bot begin responding, which I'm calling the "speech reflex" event (right now it's just a hotkey press)

Also created issues
https://t.co/AQlH1nwKj0
https://t.co/YXSgRX8eyn (github.com/yacineMTB/talk…)
github.com/yacineMTB/talk…
github.com/yacineMTB/talk…
okay, I've decided
I will be using synesthesiam's mimicv3 and
thank you for the contribution <3 github.com/rhasspy/piper
I wrote an audio player manager in ts that spawns a script

Eventually I will replace it with a binding, but I don't have time for that right now. I should write the binding, because most of the inference time (80% by my estimation) is spent loading the model weights

video demo of audio gen
I've got a good conversational state tracker.

The speed of TTS + llama means that greedily chunking phrases out of the token stream makes responses extremely fast. See the latency between the keeb press and "well" being voiced

Therefore I will avoid using "precanned thoughts"
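Greedy chunking can be as simple as buffering tokens until a sentence-ish boundary shows up, then handing that phrase to TTS right away instead of waiting for the full completion. A minimal sketch; the boundary set (`. , ! ? ; :`) and the `PhraseChunker` name are assumptions, not the project's code.

```typescript
// Accumulates streamed tokens and emits speakable phrases as soon as a boundary appears.
class PhraseChunker {
  private pending = "";

  constructor(private readonly boundary: RegExp = /[.,!?;:]/) {}

  // Feed one token; returns zero or more complete phrases ready for TTS.
  push(token: string): string[] {
    this.pending += token;
    const phrases: string[] = [];
    let idx: number;
    while ((idx = this.pending.search(this.boundary)) !== -1) {
      const phrase = this.pending.slice(0, idx + 1).trim();
      this.pending = this.pending.slice(idx + 1);
      if (phrase.length > 0) phrases.push(phrase);
    }
    return phrases;
  }

  // Flush whatever is left when generation ends.
  flush(): string | null {
    const rest = this.pending.trim();
    this.pending = "";
    return rest.length > 0 ? rest : null;
  }
}
```

The first phrase ("Well,") can be voiced while the model is still generating the rest. A naive boundary like this splits numbers such as "3.14"; requiring whitespace after the punctuation is one possible refinement.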
Changing strategy: ditched the node bindings on llama.cpp (getting a segfault and don't have time to debug); running the server as a sidecar instead.
It was a good effort! I understand the forward pass better now

Really fast with the new llama.cpp CUDA PR. like really, really, really fast
mulling over architecture
I know I want events
However, something feels wrong about the state that I am carrying forward
I think what felt wrong is that the event generation isn't that clear
This is better: if everything is constant, an event log generated from some buffer should be deterministic
I have been guided in the direction of designing a finite state machine, which maybe is a DAG
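One way to get that determinism is to make state a pure fold over the event log: a transition function takes the current state and one event and returns the next state, so replaying the same log always yields the same conversation state. A sketch under assumed event and state names (the hotkey-driven "speech reflex" is modeled as its own event); these are not the project's actual types.

```typescript
// Hypothetical events; the project's real event set is not shown in the thread.
type ConvEvent =
  | { kind: "transcript"; text: string } // new (partial) transcription of the user's speech
  | { kind: "speechReflex" }             // trigger to start responding (currently a hotkey)
  | { kind: "token"; text: string }      // one generated token from the LLM
  | { kind: "responseDone" };

type Phase = "listening" | "responding";

interface ConvState {
  phase: Phase;
  heard: string;     // latest transcript of the user's turn
  reply: string;     // assistant text generated so far
  history: string[]; // completed turns, alternating user / assistant
}

const initial: ConvState = { phase: "listening", heard: "", reply: "", history: [] };

// Pure transition function: no I/O and no clocks, so replays are deterministic.
function step(s: ConvState, e: ConvEvent): ConvState {
  switch (s.phase) {
    case "listening":
      if (e.kind === "transcript") return { ...s, heard: e.text };
      if (e.kind === "speechReflex" && s.heard !== "") return { ...s, phase: "responding", reply: "" };
      return s; // other events are ignored while listening
    case "responding":
      if (e.kind === "token") return { ...s, reply: s.reply + e.text };
      if (e.kind === "responseDone")
        return { phase: "listening", heard: "", reply: "", history: [...s.history, s.heard, s.reply] };
      return s;
  }
}

// Rebuild the full conversation state from an event log.
const replay = (log: ConvEvent[]): ConvState => log.reduce(step, initial);
```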
btw i finished this and it works much better
Here's a demo! Reworked with the event-based architecture I shared earlier. Some of the events are logging through too!

It's getting uncanny :)


Still have some weird state loop things, I'll give it another edit cycle at some point this week
@cto_junior https://t.co/4ODL9ph2UU (github.com/yacineMTB/talk…)
@ggerganov I've actually learned a ton just walking through the cpp code the past few weeks, looking at LoRA, etc. It's all demystified when it's just code you're staring at. The big secret is that it's really actually kind of simple!
@moonares the reason i'm building this is that I'm trying to get a higher-bitrate interface with my computer
eventually, I want my whole screen captured and provided as context to the godlike QA agent
