How to get URL link on X (Twitter) App
To test this, I wanted a set of 'in the wild' prompts that would reflect real-world usage rather than narrow code/STEM tasks - so I went to WildChat (the classic repo for this), grabbed one of the training parquet files, and chose 1000 random deduped prompts. I then ran these prompts through GPT-4o and Qwen3.5 4B at their recommended sampling settings.
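A minimal sketch of the dedupe-then-sample step, in plain Python. The function name `sample_deduped_prompts`, the whitespace/case normalization, and the toy prompt list are all assumptions for illustration; the real prompts would come from the first user turn of each row in the WildChat parquet file, and the actual dedup rule may have been different.

```python
import random


def sample_deduped_prompts(prompts, k=1000, seed=0):
    """Drop empty and duplicate prompts, then draw k at random.

    Duplicates are detected after collapsing whitespace and lowercasing
    (an assumed rule, not necessarily the one used in the post).
    """
    seen = set()
    unique = []
    for prompt in prompts:
        key = " ".join(prompt.split()).lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(prompt)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(unique, min(k, len(unique)))


# Toy stand-in for prompts loaded from a parquet file.
toy_prompts = ["Write a haiku", "write a  haiku", "Fix my resume", "", "Plan a trip"]
picked = sample_deduped_prompts(toy_prompts, k=2, seed=42)
print(picked)
```

Seeding the generator keeps the 1000-prompt set stable across reruns, so both models see exactly the same inputs.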
The training process is simple, and only takes 12 hours on an M3 Max. A simple LoRA is applied to a quantized version of Qwen3-30B-A3B, and the model is trained to take in slopped stories and return humanlike outputs. I used 1000 training docs for this, for ~2.5M total tokens.
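To give a feel for why a LoRA run like this is cheap, here is a rough back-of-the-envelope sketch. The layer size (2048 x 2048) and rank (16) are hypothetical values chosen for illustration, not the settings used for this run; only the 1000 docs and ~2.5M tokens come from the numbers above.

```python
def lora_extra_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one linear layer.

    The frozen weight W is (d_out x d_in); LoRA learns B @ A with
    A of shape (rank x d_in) and B of shape (d_out x rank).
    """
    return rank * (d_in + d_out)


# Hypothetical layer shape and rank, for illustration only.
d_in = d_out = 2048
rank = 16

full = d_in * d_out
extra = lora_extra_params(d_in, d_out, rank)
print(f"LoRA params per layer: {extra} ({extra / full:.2%} of the full matrix)")

# Dataset size from the post: ~2.5M tokens over 1000 docs.
avg_tokens_per_doc = 2_500_000 / 1000
print(f"Average doc length: ~{avg_tokens_per_doc:.0f} tokens")
```

Under these assumed shapes, the adapter trains well under 2% of the weights in each adapted layer, and the docs average roughly 2,500 tokens each.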
(i am a math noob, so i won't try to explain this in a ton of depth) - but they make some really cool revelations - like showing how sft is just really simple RL:
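One common way to write that idea down, as a sketch and not necessarily the exact derivation they use: compare the policy-gradient estimator with the gradient of the SFT loss. The symbols here ($\pi_\theta$ for the model, $R$ for the reward, $\mathcal{D}$ for the demonstration dataset) are my own notation.

```latex
% Policy gradient: samples come from the model, weighted by a reward.
\nabla_\theta J(\theta)
  = \mathbb{E}_{x,\; y \sim \pi_\theta(\cdot \mid x)}
    \left[ R(x, y)\, \nabla_\theta \log \pi_\theta(y \mid x) \right]

% SFT: samples come from the dataset, and every sample gets the same weight.
-\nabla_\theta \mathcal{L}_{\mathrm{SFT}}(\theta)
  = \mathbb{E}_{(x,\, y^\ast) \sim \mathcal{D}}
    \left[ \nabla_\theta \log \pi_\theta(y^\ast \mid x) \right]
```

Read side by side, the SFT update has the same form as the policy-gradient update with the reward fixed at 1 and the samples drawn from the dataset instead of from the model itself.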