Shanmukha Vishnu
I enjoy breaking things and fixing them overnight, from Android to robots. 'Corrupt. Rebuild. Repeat.'
May 1 · 7 tweets · 2 min read
Just achieved 60 tokens/sec with Qwen3.6-35B-A3B (35B MoE) on RTX 4070 12GB
Full 128k context + Q4_K_M + running agents daily.
Here’s the complete step-by-step from scratch:
1. Build llama.cpp with CUDA

```
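cd ~/llama.cpp
git pull
# Remove any stale build directory so the rebuild starts clean
rm -rf build
# Configure with the CUDA backend (GGML_CUDA replaced the older LLAMA_CUDA flag)
cmake -B build -DGGML_CUDA=ON
# Compile in Release mode using all CPU cores
cmake --build build --config Release -j"$(nproc)"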
```
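Before moving on, it can help to confirm the build actually produced the binaries. The sketch below is not from the original thread: `check_llama_build` is a hypothetical helper name, and it assumes a CMake build that places `llama-cli` and `llama-server` under `build/bin`.

```shell
# Hypothetical helper (not part of llama.cpp): checks that a CMake build
# produced the expected executables under <build_dir>/bin.
# Assumption: the build directory defaults to ~/llama.cpp/build.
check_llama_build() {
  build_dir="${1:-$HOME/llama.cpp/build}"
  for bin in llama-cli llama-server; do
    if [ -x "$build_dir/bin/$bin" ]; then
      echo "ok: $bin"
    else
      echo "missing: $bin"
      return 1
    fi
  done
}
```

Running `check_llama_build` with no argument checks `~/llama.cpp/build`; pass a different path if you built elsewhere. It only checks that the files exist and are executable, not that CUDA was picked up, so running `nvidia-smi` alongside it is a reasonable extra check.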