Stanislav Fort βœ¨πŸ§ πŸ€–πŸ“ˆβœ¨
LLMs @GoogleDeepMind | Stanford PhD in AI | Cambridge physics | ex-{Anthropic, LLM Lead at Stability, Google Brain} | techno-optimism+progress+growth πŸ‡ΊπŸ‡ΈπŸ‡¨πŸ‡Ώ
Sep 16, 2022 β€’ 6 tweets β€’ 4 min read
I found the Git Re-Basin paper (arxiv.org/abs/2209.04836) by @SamuelAinsworth, J. Hayase & @siddhss5 *really* intriguing. So I put together a replication in Colab, reusing bits of their code, but unfortunately couldn't reproduce the key conclusion 🚨😱

πŸ–₯️Colab github.com/stanislavfort/…

1/5

I tried an MLP on MNIST and a ResNet on CIFAR-10 using the authors' own training & permutation code, as discussed in

In neither case did Network 2 + permutation end up in the same linearly connected, low-loss convex basin as Network 1 (= the key result) 🧩; a sketch of the check follows below.

2/5
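To make the claim concrete, here is a minimal sketch (in JAX, like the original Git Re-Basin code, but not the thread's actual Colab) of the check the replication performs: apply a hidden-unit permutation to Network 2, then evaluate the loss along the straight line between the two weight settings. The names `loss_fn`, `params_a`, `params_b_permuted`, and the 2-layer MLP layout in `permute_mlp_hidden` are illustrative assumptions, not code from the paper or the Colab.

```python
import jax
import jax.numpy as jnp

def permute_mlp_hidden(w1, b1, w2, perm):
    """Reorder the hidden units of a 2-layer MLP (kernels stored as (in, out)).

    Permuting the outputs of layer 1 and the matching inputs of layer 2 leaves
    the network's function unchanged; Git Re-Basin searches for such a `perm`
    that also aligns the weights with another trained network.
    """
    return w1[:, perm], b1[perm], w2[perm, :]

def interpolate(params_a, params_b, alpha):
    """Elementwise (1 - alpha) * a + alpha * b over a whole parameter pytree."""
    return jax.tree_util.tree_map(
        lambda a, b: (1.0 - alpha) * a + alpha * b, params_a, params_b
    )

def loss_barrier(loss_fn, params_a, params_b_permuted, n_points=25):
    """Loss along the line segment between Network 1 and permuted Network 2.

    If the two networks really sit in one linearly connected low-loss basin,
    this curve should stay close to the endpoint losses instead of spiking
    in the middle; the returned barrier is that mid-path excess.
    """
    alphas = jnp.linspace(0.0, 1.0, n_points)
    losses = jnp.stack([
        loss_fn(interpolate(params_a, params_b_permuted, float(a)))
        for a in alphas
    ])
    barrier = jnp.max(losses) - jnp.maximum(losses[0], losses[-1])
    return alphas, losses, barrier
```

In these terms, the thread's negative result is a large barrier along that interpolation path even after applying the authors' permutation matching.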
Oct 30, 2020 β€’ 7 tweets β€’ 6 min read
Excited to share our new #neurips2020 paper /Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel/ (arxiv.org/abs/2010.15110) with @KDziugaite, Mansheej, @SKharaghani, @roydanroy, @SuryaGanguli 1/6

We Taylor-expand Deep Neural Network logits with respect to their weights at different stages of training & study how well the linearized network trains depending on the epoch at which it was expanded. Early expansions train poorly, but even slightly into training they do very well! 2/6
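For intuition, here is a rough sketch of what "Taylor-expanding the logits with respect to the weights" means in code, assuming a JAX-style `apply_fn(params, x)`; the names `apply_fn`, `params_0`, `params`, and `x` are placeholders rather than the paper's actual implementation.

```python
import jax

def linearized_logits(apply_fn, params_0, params, x):
    """First-order Taylor expansion of the logits around the checkpoint params_0:

        f_lin(params) = f(params_0) + J_f(params_0) . (params - params_0)

    computed with a Jacobian-vector product so the full Jacobian is never built.
    """
    delta = jax.tree_util.tree_map(lambda p, p0: p - p0, params, params_0)
    logits_0, jvp_term = jax.jvp(lambda p: apply_fn(p, x), (params_0,), (delta,))
    return logits_0 + jvp_term
```

Training this linearized model (optimizing `params` while keeping the expansion point `params_0` fixed at a given epoch's checkpoint) and comparing it with the fully trained network is the experiment the tweet describes across different expansion epochs.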