Pierre-Yves Oudeyer @pyoudeyer
How many random seeds are needed to compare #DeepRL algorithms?

Our new tutorial addresses this key issue of #reproducibility in #reinforcementlearning

PDF: arxiv.org/pdf/1806.08295…

Code: github.com/flowersteam/rl…

Blog: openlab-flowers.inria.fr/t/how-many-ran…

#machinelearning #neuralnetworks
Algo1 and Algo2 are two famous #DeepRL algorithms, here tested
on the Half-Cheetah #opengym benchmark.

Many papers in the literature compare algorithms using 4-5 random seeds,
as in this graph, which suggests that Algo1 is best.

Is this really the case?
However, more robust statistical tests show no significant difference between the two.

For a very good reason: Algo1 and Algo2 are the same @openAI baseline
implementation of DDPG, with the same parameters!

Concluding that one is better is what statistics calls a "Type I error" (a false positive).
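
One way to check whether a gap between two small sets of seeds is more than luck is a permutation test. The sketch below is a minimal illustration using only the Python standard library. It is not the tutorial's own code: the function name `permutation_test` and the simulated scores are assumptions made for this example, and the tutorial may recommend different tests.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=5000, seed=0):
    """Two-sided Monte Carlo permutation test on the difference of means.

    Returns an estimated p-value: the fraction of random relabelings of the
    pooled scores whose absolute mean gap is at least the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        gap = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        # Small tolerance so that ties are not lost to rounding.
        if gap >= observed - 1e-9:
            hits += 1
    # The +1 terms count the observed labeling itself and keep p above 0.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical final returns: both "algorithms" draw 5 seeds from the SAME
# distribution, mimicking the Algo1 vs Algo2 situation described above.
rng = random.Random(42)
algo1 = [rng.gauss(3000, 800) for _ in range(5)]
algo2 = [rng.gauss(3000, 800) for _ in range(5)]

print("mean gap:", mean(algo1) - mean(algo2))
print("p-value :", permutation_test(algo1, algo2))
```

A visible gap in the means paired with a large p-value is the pattern to watch for: the gap alone is not evidence that one algorithm is better.
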
Sometimes, using few random seeds shows no sign of one algorithm being
better or worse than another.

Here, DDPG with action perturbation vs. DDPG with parameter perturbation, with 5 seeds.
This apparent absence of difference is a "Type II error" (a false negative). With more random seeds and more refined statistical tests, DDPG with parameter perturbation turns out to be robustly better than DDPG with action perturbation.
The tutorial discusses the issue of how many random seeds are needed to compare algorithms, and which statistical method to use to assess the reliability of results.
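
As a rough illustration of how the required number of seeds scales, the sketch below uses the textbook normal-approximation formula for comparing two means: n ≈ 2 * ((z_(1-alpha/2) + z_(power)) * std / effect)^2 seeds per algorithm. This choice is an assumption made for simplicity and is not necessarily the procedure used in the tutorial. The function name `seeds_needed` and its parameters are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def seeds_needed(effect, std, alpha=0.05, power=0.8):
    """Approximate seeds per algorithm for a two-sided test on two means.

    Normal approximation with the same standard deviation `std` in both
    groups; `effect` is the true gap in mean performance to be detected.
    With very few seeds, a t-based test needs somewhat more than this.
    """
    if effect <= 0 or std <= 0:
        raise ValueError("effect and std must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # controls Type I errors
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - Type II rate
    return ceil(2 * ((z_alpha + z_beta) * std / effect) ** 2)


# Effect sizes expressed as multiples of the standard deviation.
for ratio in (0.5, 1.0, 2.0):
    n = seeds_needed(ratio, 1.0)
    print(f"effect = {ratio} x std -> {n} seeds per algorithm")
```

Under these assumptions, detecting a gap of one standard deviation already calls for more than the 4-5 seeds mentioned above, and halving the gap roughly quadruples the requirement.
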
Nothing in this tutorial is new: these statistical methods are widely used in biology and physics. But we hope it will be useful!

What is surprising is how rarely they are used in #machinelearning ... which is about statistical learning!
If you see things to improve or update, all comments welcome!
You can use the blog to post questions/comments:
openlab-flowers.inria.fr/t/how-many-ran…
Last but not least, congrats to Cedric Colas for the outstanding work on this project!