Oskar Hallström
AI R&D Engineer @lightonio. Former Indie One Hit Wonder @ Billie Garlic
Dec 19, 2024 (10 tweets)
torch.randperm isn't fully random, and it can bias your trillion-token training run 😱

Today we're releasing ModernBERT, a new SOTA encoder-only model series. In this thread, however, I'll share how torch.randperm (temporarily) threw a wrench in the works. (1/10)

Background: In PyTorch's native data samplers, torch.randperm determines the order in which a dataset's samples are retrieved when shuffling is enabled. torch.randperm(n) generates a random permutation of the integers from 0 to n-1. (2/10)
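A minimal sketch of that mechanism, for readers who haven't looked inside a sampler: it calls torch.randperm directly with a seeded torch.Generator, then draws indices through torch.utils.data.RandomSampler. The dataset size `n = 10` and the seed are made up for illustration. In recent PyTorch versions, RandomSampler without replacement takes its order from torch.randperm internally, but check the source for the version you run.

```python
import torch
from torch.utils.data import RandomSampler

n = 10  # hypothetical dataset size, for illustration only

# torch.randperm(n) returns a random permutation of the integers 0..n-1.
g = torch.Generator()
g.manual_seed(0)  # fixed seed so the order is reproducible across runs
order = torch.randperm(n, generator=g)
print(order.tolist())

# A shuffling sampler (replacement=False by default) yields each index exactly
# once, in random order. RandomSampler only needs len(data_source), so a plain
# range stands in for a real dataset here.
sampler_gen = torch.Generator()
sampler_gen.manual_seed(0)
sampler = RandomSampler(range(n), generator=sampler_gen)
indices = list(sampler)
print(indices)
```

Each index appears exactly once per pass, so every sample is visited exactly once per epoch. The randomness of the whole shuffle therefore depends on how well torch.randperm mixes that order.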