I've been thinking a bit about the growing practice of fine-tuning generic pretrained models: first in computer vision, now NLP (highly recommend @seb_ruder's great article on this ruder.io/nlp-imagenet/)... Last time I mentioned this, people were skeptical that RL would be next.
E.g., folks pointed out that RL is maybe the wrong level/domain of analysis, and that maybe there isn't enough commonality across the RL tasks people care about for this to make sense, etc... But in any case, I suspect more will happen in this vein beyond vision and NLP.
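For concreteness, here's roughly what that fine-tuning recipe looks like on the vision side. This is just a minimal sketch, assuming torchvision's pretrained ResNet-18 and a made-up 10-class downstream task (both purely illustrative):

```python
# Minimal sketch of the "fine-tune a generic pretrained model" recipe (vision case).
# Assumes torchvision's ResNet-18 weights and a hypothetical 10-class target task.
import torch
import torch.nn as nn
from torchvision import models

# Load a generically pretrained backbone and freeze its features
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False

# Swap in a fresh task-specific head (10 classes is illustrative)
model.fc = nn.Linear(model.fc.in_features, 10)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on dummy data
x, y = torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,))
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```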
The simplest extrapolation would be that, just as vision and NLP gradually matured over many years into being super commercially/creatively/etc. relevant, multimodal stuff combining vision+NLP might eventually reach that point and follow a similar path.
Remarkable how the taxonomy of research areas roughly resembles today's - learning, planning, etc. at a high level, lower-level stuff like exploration/hierarchy, etc.
Makes you think progress would have been vastly faster if those folks had had today's compute to try all this stuff out.