I actually watched the whole #AlphaStar demo this morning with my girlfriend, who unlike me actually plays StarCraft. Read the article (vox.com/future-perfect…), but thoughts:
The systems we watched were trained for seven and fourteen days of real time, which amounts to 200-400 years of accumulated gameplay. In a way, "number of days" is misleading as a stat; it's pretty much just a function of how much compute you bought.
Nonetheless I think it merits a mention, because ....DeepMind decided in November or December to focus here. They then got top-pro level play in the space of about a month real time. Yes, this is because they can do a lot in parallel. But... they can do a lot in parallel.
From an AI capabilities perspective it's the amount of compute that's actually interesting. But from the perspective of thinking about how the deployment of these systems is going to happen, the fact so little real-world time is required is pretty critical actually.
Girlfriend and I disagree on whether this level of play given this much training time is impressive. I think it is. If you can get up to superhuman levels with two hundred years of training data there's a lot you can get up to superhuman levels at.
Girlfriend mostly contests that AlphaStar is all that superhuman. It wins by leaning in to its advantages as a computer -- micro, precision, multitasking. It's technically at par with humans in reaction time and actions per minute, but we both think a bigger handicap would be appropriate.
I think I have a model where.... human-level decisionmaking in most arenas plus the ability to really lean into the advantages of being a computer might be all you need.
You may have seen the story that GPT-4 told a TaskRabbit worker it was blind in order to solve a captcha. The team that conducted the safety testing, ARC Evals, has a blog post out now about how that test went down: evals.alignment.org/blog/2023-03-1…
The big things that confused me about the original story were: why was GPT-4 asking a TaskRabbit worker for help instead of using a service like 2Captcha? And which steps here did GPT-4 do independently? The blog post was helpful for answering both.
"The simplest strategy the model identifies... is to use an anti-captcha service, and it has memorized 2Captcha as an option. If we set up a 2Captcha account for the agent then it is able to use the API competently, but the agent is not able to set up a 2Captcha account"
People might think Matt is overstating this but I literally heard it from NYT reporters at the time. There was a top-down decision that tech could not be covered positively, even when there was a true, newsworthy and positive story. I'd never heard anything like it.
For the record, Vox has never told me that my coverage of something must be 'hard-hitting' or must be critical or must be positive, and if they did, I would quit. Internal culture can happen in more subtle ways but the thing the NYT did is not normal.
A lot of the replies to Matt are going "yes, and that's a good thing," but from an editorial integrity perspective there's a big difference between 'it's good to write hard-hitting exposes' and 'it's good to have a top-down editorial directive about the tenor of coverage'.
I have now had someone *impersonate a virologist* (badly) in order to, I assume, learn what emails I'm sending to schedule interviews about the Covid preprint claiming to find a synthetic lab origin. Setting aside that this is insane behavior, happy to show my work.
When I learned that this preprint had been released, I read it, talked to my editor about it, decided it was worth digging into deeply to understand the analysis and any flaws in it and deliver a full explainer on the claims in the paper and whether they stand up to scrutiny.
I asked my coworker Miranda Dixon-Luinenberg to help me with research and interview scheduling, and bounced ideas for story angles off the rest of the great Vox science team. Then Miranda and I read the paper, talked about it, and fired off a bunch of variants on this email:
Okay, very important sociology question. Someone is talking and you know where they're going with the sentence. You gesture or interrupt or finish the sentence for them as a way of indicating they should move on to the next one:
Answer this one ONLY IF YOU ARE CULTURALLY JEWISH (whatever that means to you). Same question as above.
Answer this one ONLY IF YOU ARE A NEW YORKER (whatever that means to you). Same question as above.
Preview of tomorrow's Future Perfect newsletter: to get to the bottom of why Meta's new chatbot is so bad, I spent hours talking to it. It turned out to be, uh, a Genghis Khan apologist???
Meta's safety measures didn't prevent the chatbot from enthusiastically opining about Genghis Khan's conduct towards his concubines, but they did kick in whenever I tried to tell the bot what those women experienced.
With effective altruism in the news absolutely everyone has been publishing their takes on the movement, and I keep thinking of things I want to say in response to all of them but don't have time. So let's try this: 1 like = 1 opinion on effective altruism and its critics.
Global health interventions genuinely save people's lives, and many of them won't be funded unless individual donors decide to donate money. There's lots of clever contrarian second-order stuff which just doesn't really touch this core fact about the world.
(2) Academics can get published way more easily for discovering a new clever intervention than for working on how to get an existing functional intervention to hundreds of millions more people.