The DeepSeek discourse is simultaneously under-crediting and over-crediting them for what they achieved. So, some quick thoughts:
1. Re the distillation claims: DeepSeek-Coder-V2 [1] was released in June 2024, and they already had RL on verifiable rewards working back then with great success (a rough sketch of what a verifiable reward looks like is below). They were the only team outside of Gemini and OpenAI that I knew of that was RL-pilled.
Six months on from that, I place a very low prior on this team simply distilling from o1, given how much information about o1 was in the OpenAI blog post itself. They may have used o1 CoTs as examples to seed their human CoT data, but again, the RL team there is talented enough to have done it on their own.
[1] arxiv.org/abs/2406.11931
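For context on what "RL on verifiable rewards" means here: the reward comes from a programmatic check rather than a learned reward model. A minimal illustrative sketch in Python, not DeepSeek's actual setup:

```python
import re

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the model's final \\boxed{} answer matches the
    reference exactly, else 0.0. No learned reward model involved."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    predicted = match.group(1).strip() if match else ""
    return 1.0 if predicted == ground_truth.strip() else 0.0

# Hypothetical usage: checks like exact-match math answers, passing unit
# tests, or compiling code are what make the reward "verifiable".
print(verifiable_reward(r"... so the answer is \boxed{42}", "42"))  # 1.0
```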
2. The $5.5M figure is entirely believable for "one final run", and a massive feat. It is, however, only surprising if you compare it with Llama 3's training costs, which, as Wenfeng mentions here, were a couple of generations behind. Obviously, the total R&D costs are much higher.
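For a sense of where the ~$5.5M number comes from, here's the back-of-envelope using the figures in the DeepSeek-V3 technical report (~2.79M H800 GPU-hours, priced at an assumed $2 per GPU-hour rental rate):

```python
# Back-of-envelope for the "one final run" cost, using the GPU-hour count
# reported in the DeepSeek-V3 technical report and its assumed $2/GPU-hour
# rental price. Excludes R&D, ablations, and prior experimental runs.
gpu_hours = 2.788e6        # pre-training + context extension + post-training
price_per_gpu_hour = 2.0   # assumed H800 rental price (USD)
print(f"${gpu_hours * price_per_gpu_hour / 1e6:.2f}M")  # -> $5.58M
```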