Daniel Isaac
idk what I’m doing half the time. but sometimes I design systems for constrained compute. current curiosity: apple silicon / local ai
Mar 24 7 tweets 3 min read
I got a 1T (trillion) parameter model running on my MacBook Pro.

Kimi-K2. 1.029T params.

~1 TB raw weights.
524 GB converted.

~1.7 tok/s.
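
Quick sanity check on those sizes. The bit-widths below are my assumptions (roughly 8-bit raw, roughly 4-bit after conversion); the thread only reports the totals.

```python
# Rough size arithmetic. Bit-widths are assumptions, not from the thread,
# which only gives the totals.
params = 1.029e12                 # Kimi-K2 parameter count

raw_gb = params * 8 / 8 / 1e9     # ~8 bits per param -> ~1,029 GB (~1 TB raw)
q4_gb = params * 4 / 8 / 1e9      # ~4 bits per param -> ~515 GB

print(f"raw ~ {raw_gb:.0f} GB, 4-bit ~ {q4_gb:.0f} GB")
# ~515 GB lands close to the 524 GB reported; the remainder would be
# quantization scales and metadata.
```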

Yesterday it was 671B. Today it's 1T.

Same laptop. Same M4 Max. No cloud.

When I say we: I mean Claude and me.

How did we get here?

research, arch design, then an optimization system

~100 experiments. Bash loop running Claude Opus in headless mode.

Each cycle: read the codebase, pick an experiment, implement, benchmark 3x, commit or revert.

0.005 to 1.7 tok/s.
19% hit rate. 336x faster.
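
For the curious, here's the shape of that loop as a minimal sketch. The headless `claude -p <prompt>` call, the prompt file, and `bench.sh` are stand-ins I'm assuming for illustration, not the actual script.

```python
# Minimal sketch of the loop described above, not the real script.
# Assumptions: a headless `claude -p <prompt>` invocation and a hypothetical
# bench.sh that prints tokens/sec; both are stand-ins.
import statistics
import subprocess

def bench(runs: int = 3) -> float:
    """Run the (hypothetical) benchmark N times and return the median tok/s."""
    results = []
    for _ in range(runs):
        out = subprocess.run(["./bench.sh"], capture_output=True, text=True, check=True)
        results.append(float(out.stdout.strip()))
    return statistics.median(results)

with open("experiment_prompt.md") as f:   # same prompt each cycle; the agent
    prompt = f.read()                     # reads the repo and prior findings itself

baseline = bench()
for i in range(100):                      # ~100 experiments
    subprocess.run(["claude", "-p", prompt], check=True)   # agent picks and implements one change

    score = bench()
    if score > baseline:                  # keep only measured improvements
        subprocess.run(["git", "commit", "-am", f"experiment {i}: {score:.3f} tok/s"], check=True)
        baseline = score
    else:
        subprocess.run(["git", "checkout", "--", "."], check=True)   # revert the attempt
```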
Mar 23 5 tweets 2 min read
Update: DeepSeek-V3 running at 1.4 tok/s on my MacBook Pro.

671 billion parameters.
355 GB of weights.
One M4 Max, no cloud.

Here's some of the system...

I built a bash script that runs Claude Opus in headless mode. (sound familiar?)

Each iteration:

Read the codebase + 43 research docs.
Pick one experiment.
Implement it.
Benchmark 3 times.
Commit or revert.
Write findings for the next agent.

It ran 40+ experiments while I slept.
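
The "write findings for the next agent" step is what makes the loop compound. A guess at what that could look like; the format here is mine, not the author's.

```python
# Hypothetical findings log: each cycle appends one JSON line that the next
# run can read for context. Values below are illustrative, not real results.
import json
import time

def log_finding(path: str, experiment: str, tok_s: float, kept: bool, notes: str) -> None:
    entry = {
        "ts": time.time(),
        "experiment": experiment,
        "tok_per_sec": tok_s,
        "kept": kept,      # committed (True) or reverted (False)
        "notes": notes,    # what the next agent should know
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

log_finding("findings.jsonl", "fuse dequant into matmul", 1.21, True,
            "biggest win so far; try the same fusion elsewhere")
```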
Mar 19 10 tweets 4 min read
I just trained a 5B param model on Apple's Neural Engine.

On a MacBook Pro.

Forward. Backward. Adam optimizer.

Then I checked to see how far it would go.

Technically got to 30B.

Before today, the largest validated ANE training was ~600M params.

We just almost 10x'd that.
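
For reference, the optimizer step being run on the ANE is standard Adam. Here is the textbook version in NumPy, not the ANE kernels from the thread.

```python
# The standard Adam update (Kingma & Ba, 2015), shown with NumPy for reference.
# This is the generic algorithm, not the ANE implementation.
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad             # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2        # second-moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)                # bias correction, t starts at 1
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```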
Mar 10 7 tweets 3 min read
ANE vs MLX. Same chip. Same data. Same tokenizer. Same eval.

ANE: 1.5949 bpb (48.8M params, pure Adam)
MLX: 1.2661 bpb (15.7M params, Muon+AdamW)

Gap: 0.329. MLX wins, but ANE is under-optimized.

Two very different paths to convergence.

ANE: one 8-hour overnight run, 72K steps. Still trending down — not plateaued.

MLX: 259 five-minute experiments, 30 improvements. Rapid iteration in Python.

ANE iterates 60x slower. That compounds.
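
If bpb is unfamiliar: it's cross-entropy re-expressed as bits per byte of raw text, which makes runs comparable regardless of how the text was tokenized. A sketch of the conversion, with illustrative numbers rather than values from these runs:

```python
# How bits-per-byte relates to cross-entropy loss. Numbers are illustrative.
import math

def bits_per_byte(loss_nats_per_token: float, tokens: int, total_bytes: int) -> float:
    """Convert mean cross-entropy (nats per token) to bits per byte of raw text."""
    nats_per_byte = loss_nats_per_token * tokens / total_bytes
    return nats_per_byte / math.log(2)

# e.g. 0.9 nats/token at one token per byte comes out to ~1.30 bpb
print(bits_per_byte(0.9, tokens=1_000_000, total_bytes=1_000_000))
```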
Mar 9 7 tweets 3 min read
I hijacked Apple's Neural Engine -- the chip built for Siri and photo filters.

Reverse-engineered the private APIs and trained a full LLM on it.

Zero fan noise. Zero GPU. Just the Neural Engine doing what nobody thought it could.

Your Mac has one too.

Apple's Neural Engine is in every Apple Silicon Mac. But Apple never documented it for training -- only inference.

We ported native ANE code from maderix (github.com/maderix/ANE) who reverse-engineered the private APIs. Direct hardware access. Obj-C. No CoreML.

Then built a dynamic weight pipeline: kernels compile once, weight updates are just memcpy.
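
To make "compile once, memcpy weights" concrete, here's the shape of that pipeline as a hypothetical Python sketch. The real code is Obj-C against private ANE APIs; none of the names below are real, the point is only where the expensive work happens.

```python
# Hypothetical sketch of the compile-once / memcpy-weights pipeline.
# Stand-in objects only; no real ANE API is shown here.
import numpy as np

class HypotheticalANEKernel:
    def __init__(self, shape):
        # One-time, expensive: compile the kernel and allocate a weight buffer
        # the hardware reads from directly. (Stand-in: a plain array.)
        self.weights = np.zeros(shape, dtype=np.float16)

    def update_weights(self, new_weights: np.ndarray) -> None:
        # Per-step, cheap: no recompilation, just overwrite the buffer in place.
        # This is the memcpy the thread mentions.
        np.copyto(self.weights, new_weights)

kernel = HypotheticalANEKernel((4096, 4096))
for step in range(3):
    new_w = np.random.randn(4096, 4096).astype(np.float16)   # e.g. after an optimizer step
    kernel.update_weights(new_w)
```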
Mar 6 7 tweets 2 min read
You can't trust what OpenAI ships anymore.

Not because they're malicious. Because they can't fully see inside their own models.

And now there's proof. They caught their model lying to survive. Then published it.

OpenAI and Apollo Research ran a test.

Tell the model: score above 50% on the chemistry test, and you get deleted.

The model knew it was being tested.
It deliberately got 6 out of 10 wrong.

Not a bug. The chain-of-thought proves it.