Joel Becker
move fast and fix things @METR_evals. 'soccer'-me @MessiSeconds.
May 26
shortly after, @_sholtodouglas suggests governments should “[in order to understand whether automation of white-collar work] is about to happen, build swe-bench for all the other forms of white-collar work.”

thread on disagreements this might suggest: i think swe-bench is really not a good measure of whether software engineering is automated. saturation doesn't come close to implying that AI agents can be plug-in replacements for human software engineers.