Daniel Isaac
idk what I’m doing half the time. but sometimes I design systems for constrained compute. current curiosity: apple silicon / local ai
Mar 24 7 tweets 3 min read
I got a 1T (trillion) parameter model running on my MacBook Pro.

Kimi-K2. 1.029T params.

~1 TB raw weights.
524 GB converted.

~1.7 tok/s.
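
Quick sanity check on those sizes. The bit-widths below are my assumptions (roughly 8-bit raw, roughly 4-bit after conversion); the thread only reports the totals.

```python
# Rough size arithmetic. Bit-widths are assumptions, not from the thread,
# which only gives the totals.
params = 1.029e12                 # Kimi-K2 parameter count

raw_gb = params * 8 / 8 / 1e9     # ~8 bits per param -> ~1,029 GB (~1 TB raw)
q4_gb = params * 4 / 8 / 1e9      # ~4 bits per param -> ~515 GB

print(f"raw ~ {raw_gb:.0f} GB, 4-bit ~ {q4_gb:.0f} GB")
# ~515 GB lands close to the 524 GB reported; the remainder would be
# quantization scales and metadata.
```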

Yesterday it was 671B. Today it's 1T.

Same laptop. Same M4 Max. No cloud.

When I say we: I mean Claude and me.

How did we get here?

research, arch design, then an optimization system

~100 experiments. Bash loop running Claude Opus in headless mode.

Each cycle: read the codebase, pick an experiment, implement, benchmark 3x, commit or revert.

0.005 to 1.7 tok/s.
19% hit rate. 336x faster.
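
For the curious, here's the shape of that loop as a minimal sketch. The headless `claude -p <prompt>` call, the prompt file, and `bench.sh` are stand-ins I'm assuming for illustration, not the actual script.

```python
# Minimal sketch of the loop described above, not the real script.
# Assumptions: a headless `claude -p <prompt>` invocation and a hypothetical
# bench.sh that prints tokens/sec; both are stand-ins.
import statistics
import subprocess

def bench(runs: int = 3) -> float:
    """Run the (hypothetical) benchmark N times and return the median tok/s."""
    results = []
    for _ in range(runs):
        out = subprocess.run(["./bench.sh"], capture_output=True, text=True, check=True)
        results.append(float(out.stdout.strip()))
    return statistics.median(results)

with open("experiment_prompt.md") as f:   # same prompt each cycle; the agent
    prompt = f.read()                     # reads the repo and prior findings itself

baseline = bench()
for i in range(100):                      # ~100 experiments
    subprocess.run(["claude", "-p", prompt], check=True)   # agent picks and implements one change

    score = bench()
    if score > baseline:                  # keep only measured improvements
        subprocess.run(["git", "commit", "-am", f"experiment {i}: {score:.3f} tok/s"], check=True)
        baseline = score
    else:
        subprocess.run(["git", "checkout", "--", "."], check=True)   # revert the attempt
```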
Mar 23 5 tweets 2 min read
Update: DeepSeek-V3 running at 1.4 tok/s on my MacBook Pro.

671 billion parameters.
355 GB of weights.
One M4 Max, no cloud.

Here's some of the system...

I built a bash script that runs Claude Opus in headless mode. (sound familiar?)

Each iteration:

Read the codebase + 43 research docs.
Pick one experiment.
Implement it.
Benchmark 3 times.
Commit or revert.
Write findings for the next agent.

It ran 40+ experiments while I slept.
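
The "write findings for the next agent" step is what makes the loop compound. A guess at what that could look like; the format here is mine, not the author's.

```python
# Hypothetical findings log: each cycle appends one JSON line that the next
# run can read for context. Values below are illustrative, not real results.
import json
import time

def log_finding(path: str, experiment: str, tok_s: float, kept: bool, notes: str) -> None:
    entry = {
        "ts": time.time(),
        "experiment": experiment,
        "tok_per_sec": tok_s,
        "kept": kept,      # committed (True) or reverted (False)
        "notes": notes,    # what the next agent should know
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

log_finding("findings.jsonl", "fuse dequant into matmul", 1.21, True,
            "biggest win so far; try the same fusion elsewhere")
```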
Mar 19 10 tweets 4 min read
I just trained a 5B param model on Apple's Neural Engine.

On a MacBook Pro.

Forward. Backward. Adam optimizer.

Then I checked to see how far it would go.

Technically got to 30B.

Before today, the largest validated ANE training was ~600M params.

We just almost 10x'd that.
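
For reference, the optimizer step being run on the ANE is standard Adam. Here is the textbook version in NumPy, not the ANE kernels from the thread.

```python
# The standard Adam update (Kingma & Ba, 2015), shown with NumPy for reference.
# This is the generic algorithm, not the ANE implementation.
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad             # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2        # second-moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)                # bias correction, t starts at 1
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```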
Mar 10 7 tweets 3 min read
ANE vs MLX. Same chip. Same data. Same tokenizer. Same eval.

ANE: 1.5949 bpb (48.8M params, pure Adam)
MLX: 1.2661 bpb (15.7M params, Muon+AdamW)

Gap: 0.329. MLX wins, but ANE is under-optimized.

Two very different paths to convergence.

ANE: one 8-hour overnight run, 72K steps. Still trending down — not plateaued.

MLX: 259 five-minute experiments, 30 improvements. Rapid iteration in Python.

ANE iterates 60x slower. That compounds.
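
If bpb is unfamiliar: it's cross-entropy re-expressed as bits per byte of raw text, which makes runs comparable regardless of how the text was tokenized. A sketch of the conversion, with illustrative numbers rather than values from these runs:

```python
# How bits-per-byte relates to cross-entropy loss. Numbers are illustrative.
import math

def bits_per_byte(loss_nats_per_token: float, tokens: int, total_bytes: int) -> float:
    """Convert mean cross-entropy (nats per token) to bits per byte of raw text."""
    nats_per_byte = loss_nats_per_token * tokens / total_bytes
    return nats_per_byte / math.log(2)

# e.g. 0.9 nats/token at one token per byte comes out to ~1.30 bpb
print(bits_per_byte(0.9, tokens=1_000_000, total_bytes=1_000_000))
```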
Mar 9 7 tweets 3 min read
I hijacked Apple's Neural Engine -- the chip built for Siri and photo filters.

Reverse-engineered the private APIs and trained a full LLM on it.

Zero fan noise. Zero GPU. Just the Neural Engine doing what nobody thought it could.

Your Mac has one too.

Apple's Neural Engine is in every Apple Silicon Mac. But Apple never documented it for training -- only inference.

We ported native ANE code from maderix (github.com/maderix/ANE) who reverse-engineered the private APIs. Direct hardware access. Obj-C. No CoreML.

Then built a dynamic weight pipeline: kernels compile once, weight updates are just memcpy.
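
To make "compile once, memcpy weights" concrete, here's the shape of that pipeline as a hypothetical Python sketch. The real code is Obj-C against private ANE APIs; none of the names below are real, the point is only where the expensive work happens.

```python
# Hypothetical sketch of the compile-once / memcpy-weights pipeline.
# Stand-in objects only; no real ANE API is shown here.
import numpy as np

class HypotheticalANEKernel:
    def __init__(self, shape):
        # One-time, expensive: compile the kernel and allocate a weight buffer
        # the hardware reads from directly. (Stand-in: a plain array.)
        self.weights = np.zeros(shape, dtype=np.float16)

    def update_weights(self, new_weights: np.ndarray) -> None:
        # Per-step, cheap: no recompilation, just overwrite the buffer in place.
        # This is the memcpy the thread mentions.
        np.copyto(self.weights, new_weights)

kernel = HypotheticalANEKernel((4096, 4096))
for step in range(3):
    new_w = np.random.randn(4096, 4096).astype(np.float16)   # e.g. after an optimizer step
    kernel.update_weights(new_w)
```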
Mar 6 7 tweets 2 min read
You can't trust what OpenAI ships anymore.

Not because they're malicious. Because they can't fully see inside their own models.

And now there's proof. They caught their model lying to survive. Then published it.

OpenAI and Apollo Research ran a test.

Tell the model: score above 50% on the chemistry test, and you get deleted.

The model knew it was being tested.
It deliberately got 6 out of 10 wrong.

Not a bug. The chain-of-thought proves it.