What happened with #AlphaStar? Check out @DeepMindAI’s official blog here: deepmind.com/blog/alphastar… . I’ll be unpacking some things based on information from the live stream and the now-published blog post.
Context: After Deep Blue beat a chess grandmaster, Go - with its considerably larger action space and lack of heuristics - seemed to be the next big challenge. Like AlphaStar, the original AlphaGo used a combination of imitation learning and DRL to learn to play.
AlphaZero replaced imitation learning with self-play, but like its predecessor used tree search for training by rolling out a Go simulator. SC II is far too expensive to use in this fashion, so search during play or even expert iteration in training wouldn't work.
Particularly as SC II is real-time. It also has a more complex action space (one agent has to control hundreds of units, build buildings, etc.), and is partially observed, so the agent realistically needs to scout for information and retain it.
As alluded to, there doesn't seem to be a "best" agent - analogous to rock-paper-scissors. So each agent has to have a good strategy (least exploitable), but can still lose to factors out of its control. It can do well on average, but still lose sometimes.
So firstly, AlphaStar is bootstrapped from imitation learning, as this provides a rich and varied training signal. Self-play from scratch in SC tends to get stuck in local, aggressive optima like rushing the enemy (e.g. a Zerg rush).
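(Not from the paper, just to make the bootstrapping idea concrete: a toy behavioural-cloning step in PyTorch, with made-up observation/action sizes standing in for real replay data. The policy is first trained to predict the human player’s action, and only then handed over to RL.)

```python
import torch
import torch.nn as nn

# Toy stand-ins: a flat observation vector and a small discrete action set.
OBS_DIM, N_ACTIONS = 32, 10
policy = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
optimiser = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Fake "replay" batch standing in for (observation, human action) pairs.
observations = torch.randn(64, OBS_DIM)
human_actions = torch.randint(0, N_ACTIONS, (64,))

logits = policy(observations)          # predicted action distribution
loss = loss_fn(logits, human_actions)  # maximise likelihood of the human's action
optimiser.zero_grad()
loss.backward()
optimiser.step()
# The cloned policy then initialises the self-play / RL phase.
```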
The Protoss race is the opposite of the Zerg race in that it tends to favour longer-term strategy, so interesting choice here. Not sure if we can read much into it though.
Second is an evolutionary approach where a population of agents is maintained and trained against each other, providing a diversity of strategies to a) maintain and b) improve the agents, and making it less likely to fall into local optima.
Interestingly, they also explicitly add another evolutionary element - diversity - by giving agents slightly altered objectives. The “Lamarckian improvement” step is a combination of existing DRL techniques, primarily IMPALA.
But they explicitly keep a set of the top agents rather than just one, knowing that there is no "best" - unlike other big AI vs. human matches before. Portfolio selection to choose between them online might be a next step here.
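(A rough, made-up sketch of that population idea - toy “agents” with hand-rolled skill numbers rather than real networks - just to show the structure: each agent gets a slightly different objective, trains against opponents sampled from current and past members, snapshots are kept for diversity, and a set of top agents is kept at the end rather than a single champion.)

```python
import copy
import random

# Toy sketch of population-based training, not the real AlphaStar league.
class Agent:
    def __init__(self, objective_bias):
        self.objective_bias = objective_bias  # e.g. extra reward for early aggression
        self.skill = 0.0                      # stand-in for learned parameters

    def train_against(self, opponent):
        # Placeholder "learning": improve a little, more if the opponent is stronger.
        self.skill += 0.1 + 0.05 * max(0.0, opponent.skill - self.skill)

population = [Agent(objective_bias=b) for b in (0.0, 0.5, 1.0)]
league = []  # frozen snapshots of past agents, kept to preserve diversity

for step in range(100):
    for agent in population:
        opponent = random.choice(population + league)  # sample an opponent
        agent.train_against(opponent)
    if step % 20 == 0:
        league.extend(copy.deepcopy(a) for a in population)  # snapshot the population

# At match time, keep a *set* of the strongest agents rather than one "best" agent.
top_agents = sorted(population + league, key=lambda a: a.skill, reverse=True)[:5]
```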
This last part builds explicitly on game-theoretic ideas like Nash equilibria, and it's nice to see this leap from just considering an agent vs. a static environment - the "ideal" RL scenario.
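(As a toy illustration of that game-theoretic idea - nothing AlphaStar-specific - here’s fictitious play on a rock-paper-scissors payoff matrix, where the empirical mixture of best responses converges to the Nash equilibrium of playing each strategy 1/3 of the time.)

```python
import numpy as np

# Rock-paper-scissors payoffs for the row player against the column player.
payoff = np.array([[ 0, -1,  1],   # rock vs (rock, paper, scissors)
                   [ 1,  0, -1],   # paper
                   [-1,  1,  0]])  # scissors

counts = np.ones(3)  # how often each pure strategy has been the best response
for _ in range(10000):
    mixture = counts / counts.sum()
    best_response = np.argmax(payoff @ mixture)  # best reply to the empirical mixture
    counts[best_response] += 1

print(counts / counts.sum())  # approaches [1/3, 1/3, 1/3], the Nash equilibrium
```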
In contrast to DotA, I believe SC II has a more complex action space + planning. But although OpenAI Five seemed to have quite a few restrictions, AlphaStar could only play one race vs. itself on one map, and only 1v1! Progress on SC for sure, but maybe not a huge leap overall.
The network architecture uses some fancier parts like a Transformer, but nothing new - still, it’s the most varied architecture of any DRL agent so far. The action space uses autoregression and a pointer head.
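(Again a made-up sketch in PyTorch, not the actual heads: the rough shape of an autoregressive action head, where an action type is chosen first and a pointer head then attends over per-unit embeddings, so the “which unit” output scales with however many units are observed.)

```python
import torch
import torch.nn as nn

D, N_UNITS, N_ACTION_TYPES = 64, 16, 8
core_state = torch.randn(1, D)                 # summary from the agent's core (e.g. LSTM)
unit_embeddings = torch.randn(1, N_UNITS, D)   # one embedding per observed unit

action_type_head = nn.Linear(D, N_ACTION_TYPES)
action_type_logits = action_type_head(core_state)
action_type = action_type_logits.argmax(-1)    # greedy for illustration (would be sampled)

# Condition the next decision on the chosen action type (autoregression).
type_embedding = nn.Embedding(N_ACTION_TYPES, D)(action_type)
query = core_state + type_embedding

# Pointer head: dot-product attention over units gives a distribution over
# which unit to select, so the output space grows with the observation.
pointer_logits = torch.einsum('bd,bnd->bn', query, unit_embeddings)
selected_unit = pointer_logits.softmax(-1).argmax(-1)
print(action_type.item(), selected_unit.item())
```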
The original inputs included the whole map, potentially giving a greater ability to perform inhuman micromanagement. Context switching was apparently comparable to humans, though this seems hard to measure, and may contribute to a lower APM.
A raw camera interface version doesn’t seem to quite match up to the original, though it seems unlikely to be the sole or even main cause of losing the final match. It may make certain policies – like very fine micro – very difficult though.
Finally, how does it play? Like OpenAI Five, somewhat non-human. Overall a lot of common ideas are shared, but some are missing, and some appear that humans don’t seem to play. In particular, bots can rely on much better micro to make up for a lack of long-term planning.
So far, none of the sheer creativity that humans display in RTS games. While humans learn quickly and may adapt, AlphaStar just keeps up the pressure, never letting up. Of course creativity is hard to judge objectively, but I wonder when we’ll ever see another “move 37”.
That said, getting this far with an agent that seems fairly homogeneous, without an expertly configured hierarchy of specialised modules, is to me one of the more impressive aspects of AlphaStar.