Sergey Levine

11 Dec, 19 tweets, 17 min read

My favorite part of @NeurIPSConf is the workshops, a chance to see new ideas and late-breaking work. Our lab will present a number of papers & talks at workshops:

thread below ->

meanwhile here is a teaser image :)

At robot learning workshop, @katie_kang_ will present the best-paper-winning (congrats!!) “Multi-Robot Deep Reinforcement Learning via Hierarchically Integrated Models”: how to share modules between multiple real robots; recording here: (4:45pm PT 12/11)

At the deep RL workshop, Ben Eysenbach will talk about how MaxEnt RL is provably robust to certain types of perturbations. Contributed talk at 2:00pm PT 12/11.
Paper: drive.google.com/file/d/1fENhHp…
Talk: slideslive.com/38941344/maxen…

Ben will also present C-Learning: a new algorithm for goal-conditioned learning that combines RL with principled training of predictive models. Deep RL poster session, 12:30 pm PT.
Paper: arxiv.org/abs/2011.08909
Website: ben-eysenbach.github.io/c_learning/
Talk: slideslive.com/38941367/clear…

Also at deep RL WS posters, Jensen Gao & @sidgreddy will present “XT2: Training an X-to-Text Typing Interface”: how deep RL can assist users to type via gaze and other interfaces, esp. for persons with disabilities.
Paper: drive.google.com/file/d/12f2P2b…
Talk: slideslive.com/38941310/xt2-t…

Ashvin Nair will present AWAC, offline RL with online finetuning, also at the deep RL WS poster session.
pres: slideslive.com/38941335/accel…
paper: arxiv.org/abs/2006.09359
blog: bair.berkeley.edu/blog/2020/09/1…

Also at deep RL WS posters, @timrudner @vitchyr will present “Outcome-Driven Reinforcement Learning,” describing how goal-conditioned RL can be derived in a principled way via variational inference.
Paper: timrudner.com/papers/Outcome…
Talk: slideslive.com/38941289/outco…

Also at deep RL WS, Aviral Kumar will present “Implicit Under-Parameterization” – our work on how TD learning can result in excessive aliasing due to rank collapse.
Paper: arxiv.org/abs/2010.14498
Video:

Also at deep RL WS, @snasiriany and co-authors will present “DisCo RL”: RL conditioned on distributions, which provides much more expressivity than conditioning on goals.
Paper: snasiriany.me/files/disco_rl…
Presentation: slideslive.com/38941375/distr…
Poster: snasiriany.me/files/disco_rl…

At deep RL WS, @mmmbchang will present “Modularity in Reinforcement Learning: An Algorithmic Causality Perspective on Credit Assignment”: how causal models help us understand transfer in RL!
Poster: bit.ly/2LjSelT
Paper: bit.ly/2KanyTK
Vid: bit.ly/3gxeMLp

At deep RL WS, and as a long oral presentation at offline RL WS, @avisingh599 will present COG: how offline RL can chain skills and acquire a kind of “common sense”
Vid:
Web: sites.google.com/view/cog-rl
Blog: bair.berkeley.edu/blog/2020/12/0…
Offline RL talk 12/12 9:50am

At robot learning WS (8:45am PT poster) and real-world RL WS (12/12 11:20am poster), @avisingh599 will present PARROT: pre-training models that explore for diverse robotic skills.
Arxiv: arxiv.org/abs/2011.10024
Video:
Website: sites.google.com/view/parrot-rl

At meta-learning workshop, Marvin Zhang will present Adaptive Risk Minimization, how models can learn to adapt to distributional shifts at test time via meta-learning.
Paper: arxiv.org/abs/2007.02931
Pres: slideslive.com/38941545/adapt…

Enjoy all the NeurIPS workshops!!

At deep RL WS, Abhishek Gupta & @its_dibya present GCSL (goal-conditioned supervised learning), a simple principled method to use supervised learning for RL!

Room B, B5, 1230-1330 & 18-19 PT
Paper arxiv.org/abs/1912.06088
Pres slideslive.com/38941275/learn…
Blog bair.berkeley.edu/blog/2020/10/1…

Also at deep RL WS, Abhishek Gupta & Kevin Lin will present BayCLR -- normalized maximum likelihood (NML) + meta-learning for setting goals!

Room D, C7, Deep RL Workshop, 12:30-1:30 and 6-7 PST

Paper: drive.google.com/file/d/1sd7nWn…
Slideslive: slideslive.com/38941398/reinf…

At DRL WS and robot learning WS, @_oleh, @chuning_zhu will present collocation-based planning for image-based model-based RL! By relaxing dynamics, robot imagines object "flying" to goal before figuring out how to move it

Video slideslive.com/38943304/latco…
paper drive.google.com/file/d/1zG9NxH…

At ML4Molecules (ml4molecules.github.io), Justin Fu will present Offline Model-Based Optimization via Normalized Maximum Likelihood (NEMO), for optimizing designs from data w/ NML! 8:30am ML4Molecules poster session

Paper: drive.google.com/file/d/1u-SC7O…
Poster: drive.google.com/file/d/133R0Aa…

At WS on challenges of real-world RL, B. Eysenbach will present DARC. Domain adaptation for RL: how to train RL agents in one domain, but have them pretend they are in another

Pres (1520 PT Sat 12/12) drive.google.com/file/d/1YwfWOv…
Paper arxiv.org/abs/2006.13916
Blog blog.ml.cmu.edu/2020/07/31/mai…

More from @svlevine

Sergey Levine

@svlevine

10 Dec

Tonight 12/10 9pm PT, Aviral Kumar will present Model Inversion Networks (MINs) at @NeurIPSConf. Offline model-based optimization (MBO) that uses data to optimize images, controllers and even protein sequences!

paper: tinyurl.com/mins-paper
pres: neurips.cc/virtual/2020/p…

more->

The problem setting: given samples (x,y) where x represents some input (e.g., protein sequence, image of a face, controller parameters) and y is some metric (e.g., how well x does at some task), find a new x* with the best y *without access to the true function*.

Classically, model-based optimization methods would learn some proxy function (acquisition function) fhat(x) = y, and then solve x* = argmax_x fhat(x), but this can result in OOD inputs to fhat(x) when x is very high dimensional.
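
To make the OOD failure concrete, here is a minimal sketch of the classical recipe in one dimension. Everything in it is an illustrative assumption, not from the MINs paper: `true_score` stands in for the unknown function, the proxy is a plain least-squares line, and the candidate grid is arbitrary.

```python
def true_score(x):
    # Hypothetical ground-truth function; the optimizer never gets to call it.
    return -(x - 1.0) ** 2


def fit_linear_proxy(xs, ys):
    # Ordinary least-squares line fhat(x) = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Offline dataset: inputs only cover [-1, 0].
data_x = [i / 10 for i in range(-10, 1)]
data_y = [true_score(x) for x in data_x]

fhat = fit_linear_proxy(data_x, data_y)

# Naive step: x* = argmax_x fhat(x) over a range much wider than the data.
candidates = [i / 10 for i in range(-50, 51)]
x_star = max(candidates, key=fhat)

best_in_data = max(data_x, key=true_score)
```

The proxy rates `x_star` far above anything in the dataset, yet its true score is worse than the best point already collected: the argmax has wandered to where fhat was never trained.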

Sergey Levine

@svlevine

14 Oct

Greg Kahn's deep RL algorithm allows robots to navigate Berkeley's sidewalks! All the robot gets is a camera view, and a supervision signal for when a safety driver told it to stop.

Website: sites.google.com/view/sidewalk-…
Arxiv: arxiv.org/abs/2010.04689
(more below)

The idea is simple: a person follows the robot in a "training" phase (could also watch remotely from the camera), and stops the robot when it does something undesirable -- much like a safety driver might stop an autonomous car.

The robot then tries to take those actions that are least likely to lead to disengagement. The result is a learned policy that can navigate hundreds of meters of Berkeley sidewalks entirely from raw images, without any SLAM, localization, etc., entirely using a learned neural net
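
A heavily simplified sketch of that action-selection rule. The names here are hypothetical: `toy_disengage_prob` is a hand-written stand-in for the learned neural net, and the real system works from raw camera images rather than a dictionary.

```python
def choose_action(observation, candidate_actions, disengage_prob):
    # Pick the action whose predicted chance of disengagement is lowest.
    return min(candidate_actions, key=lambda a: disengage_prob(observation, a))


def toy_disengage_prob(observation, action):
    # Hypothetical stand-in for the learned predictor:
    # steering toward the obstacle is most likely to trigger a stop.
    if action == observation["obstacle_side"]:
        return 0.9
    if action == "straight":
        return 0.4
    return 0.1


observation = {"obstacle_side": "left"}
action = choose_action(observation, ["left", "straight", "right"], toy_disengage_prob)
```

With the obstacle on the left, the rule steers right, since that action has the lowest predicted disengagement probability.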

Sergey Levine

@svlevine

13 Oct

Can we view RL as supervised learning, but where we also "optimize" the data? New blog post by Ben, Aviral, and Abhishek: bair.berkeley.edu/blog/2020/10/1…

The idea: modify (reweight, resample, etc.) the data so that supervised regression onto actions produces better policies. More below:

Standard supervised learning is reliable and simple, but of course if we have random or bad data, supervised learning of policies (i.e., imitation) won't produce good results. However, a number of recently proposed algorithms can allow this procedure to work.

What is needed is to iteratively "modify" the data to make it more optimal than the previous iteration. One way to do this is by conditioning the policy on something about the data, such as a goal or even a total reward value.
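
One simple way to "modify" the data is to filter by return before cloning. The sketch below shows a single round on a made-up one-state dataset; the helper names and the tabular majority-vote "policy" are assumptions for illustration, not any specific algorithm from the blog post.

```python
from collections import Counter, defaultdict


def clone_policy(samples):
    # Supervised learning in its simplest form:
    # for each state, imitate the most frequent action in the data.
    counts = defaultdict(Counter)
    for state, action, _ in samples:
        counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def keep_top_fraction(samples, fraction):
    # "Optimize" the data: keep only the highest-return samples.
    ranked = sorted(samples, key=lambda s: s[2], reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# (state, action, return) tuples; most of the data takes the bad action.
data = [("s0", "left", 0.0)] * 3 + [("s0", "right", 1.0)] * 2

naive_policy = clone_policy(data)
filtered_policy = clone_policy(keep_top_fraction(data, 0.5))
```

Plain imitation copies the majority (bad) action, while imitation on the filtered data picks the high-return one. Repeating the collect, filter, clone loop is what makes the procedure iterative.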

Sergey Levine

@svlevine

25 Jun

Interested in trying out offline RL? Justin Fu's blog post on designing a benchmark for offline RL, D4RL, is now up: bair.berkeley.edu/blog/2020/06/2…

D4RL is quickly becoming the most widely used benchmark for offline RL research! Check it out here: github.com/rail-berkeley/…

An important consideration in D4RL is that datasets for offline RL research should *not* just come from near-optimal policies obtained with other RL algorithms, because this is not representative of how we would use offline RL in the real world. D4RL has a few types of datasets..

"Stitching" data provides trajectories that do not actually accomplish the task, but the dataset contains trajectories that accomplish parts of a task. The offline RL method must stitch these together, attaining much higher reward than the best trial in the dataset.
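
A toy picture of stitching, under loud assumptions: states are letters, the "dataset" is two short trajectories, and a breadth-first search over observed transitions stands in for the dynamic programming an offline RL method would actually do. None of this is D4RL code.

```python
from collections import deque


def stitch_path(trajectories, start, goal):
    # Collect every transition (s -> s_next) that appears anywhere in the data.
    next_states = {}
    for traj in trajectories:
        for s, s_next in zip(traj, traj[1:]):
            next_states.setdefault(s, set()).add(s_next)

    # Breadth-first search that may hop between trajectories at shared states.
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(next_states.get(path[-1], ())):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


# No single trajectory goes from A to E, but together they cover the route.
trajectories = [["A", "B", "C"], ["C", "D", "E"]]
path = stitch_path(trajectories, "A", "E")
```

The returned path joins the two trajectories at the shared state C, which is the behavior a stitching dataset is meant to test for.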
