kalomaze Profile picture
ML researcher (@primeintellect), speculator • extremely silly jester
Sep 12, 2025
"people only use these things to program because the tooling sucks" is a pretty lazy and kind of disingenuous thing to say (though on some level there's truth to it). as for "this is like when we invented spreadsheets": yeah, except it's actually nothing at all like that, bc the shape of the thing we are dealing with is
a. INCREDIBLY amorphous by default
b. can be arbitrarily fit towards decision boundaries over *whatever you can correctly specify*
Jan 19, 2025
it turns out

you can actually bias LLM finetuning in the direction you want for a metric, if you can bound it between 0 and 1 and use that as a multiplier for the cross entropy loss.

implicitly, in an emergent way w/o hard penalties!

so, i can make SFT better on "hard tokens". the weighted token loss was used as a multiplier that encourages the CE loss to reduce in a way that also improves the "high entropy" tokens ("weighed_token_loss").
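A minimal pure-Python sketch of the idea described above: per-token cross entropy scaled by a [0, 1] weight, here derived from the normalized entropy of each prediction so "hard" (high-entropy) tokens contribute more. The function names and the entropy-based weighting scheme are my illustration, not the exact recipe from the thread:

```python
import math

def token_entropy(probs):
    # Shannon entropy (in nats) of one token's predicted distribution
    return -sum(p * math.log(p) for p in probs if p > 0)

def weighted_ce_loss(token_probs, target_ids):
    """Mean cross entropy where each token's CE is multiplied by a
    [0, 1] weight: its prediction entropy normalized by the maximum
    possible entropy for the vocab size. High-entropy ("hard") tokens
    are emphasized; confident, easy tokens are down-weighted."""
    losses = []
    for probs, tgt in zip(token_probs, target_ids):
        ce = -math.log(probs[tgt])
        max_h = math.log(len(probs))  # entropy of a uniform distribution
        weight = token_entropy(probs) / max_h if max_h > 0 else 0.0
        losses.append(weight * ce)
    return sum(losses) / len(losses)
```

Because the weight is bounded in [0, 1], the weighted loss never exceeds the plain CE for any token, so the bias toward hard tokens is soft rather than a hard penalty, matching the "implicit, emergent" framing above.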
Dec 20, 2024
one time Claude 3 Opus spit out this youtube link, and if i look it up on Google, i see ~200 results total, as well as some, uh, choice archives that Anthropic definitely scraped. i wonder how far n-gram tracebacks can go toward getting a general idea of what the training data looks like, if i can trace specific esoteric sources back through a rare URL generation like this.
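The n-gram traceback idea can be sketched as a simple membership-count check: pull n-grams out of a model generation and flag the ones that occur only a handful of times in a reference corpus, since those (like a rare URL) point at a small set of candidate sources. Everything here, names included, is a hypothetical illustration rather than an actual tool:

```python
from collections import Counter

def char_ngrams(text, n):
    # all overlapping character n-grams of the given length
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def rare_ngrams(generation, corpus, n=12, max_hits=2):
    """Return n-grams from a model generation that appear in the
    reference corpus, but at most `max_hits` times -- candidates
    for tracing a generation back to a specific rare source."""
    corpus_counts = Counter(char_ngrams(corpus, n))
    return [g for g in set(char_ngrams(generation, n))
            if 0 < corpus_counts[g] <= max_hits]
```

In practice the "corpus" would be a web-scale index queried by exact match (much like searching the URL on Google and counting ~200 results), but the filtering logic is the same: the rarer the n-gram, the tighter the bound on where it could have come from.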