https://x.com/eliebakouch/status/2061880164498428188
some overview about the model shape. it's a 1T model with 35B active, trained on 33.5T tokens (30T pre-training, 3.55T mid-training).
the right part is very impressive, especially since the baseline is V3.2, which already uses DSA and is supposed to be efficient at long context (you still had the O(L^2) cost in the indexer; the new attention likely solves this).
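For a quick sanity check on the model-shape numbers quoted above, here is a minimal sketch of the arithmetic. The variable and function names are made up for illustration, and the "tokens per active parameter" ratio is just a convenient derived figure, not something stated in the post.

```python
# Figures as quoted in the post (T = trillion, B = billion).
TOTAL_PARAMS = 1e12        # "1T model"
ACTIVE_PARAMS = 35e9       # "35B active"
PRETRAIN_TOKENS = 30e12    # "30T pre-training"
MIDTRAIN_TOKENS = 3.55e12  # "3.55T mid-training"


def active_fraction(active: float, total: float) -> float:
    """Share of the parameters that are active per token."""
    return active / total


def total_tokens(*stages: float) -> float:
    """Sum of the token counts across training stages."""
    return sum(stages)


def tokens_per_active_param(tokens: float, active: float) -> float:
    """Derived ratio (hypothetical helper): training tokens per active parameter."""
    return tokens / active


frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
tokens = total_tokens(PRETRAIN_TOKENS, MIDTRAIN_TOKENS)
ratio = tokens_per_active_param(tokens, ACTIVE_PARAMS)

print(f"active fraction: {frac:.1%}")
print(f"total tokens: {tokens / 1e12:.2f}T")
print(f"tokens per active parameter: {ratio:.0f}")
```

So the quoted "33.5T" is the 30T + 3.55T sum (33.55T) rounded down, and only about 3.5% of the parameters are active per token.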
https://twitter.com/eliebakouch/status/2006345217965011009
thanks to @Presidentlin https://x.com/Presidentlin/status/2006347603915792783?s=20
@character_ai blog.character.ai/technical/insi…
@Meituan_LongCat relevant paper:
We plan to release a few more artefacts in the coming days, such as training logs, intermediate checkpoints (such as the mid-trained ckpt), and checkpoints from the ablations we ran during training.
huggingface.co/collections/Qw…



https://x.com/SeunghyunSEO7/status/1872292990863175989