Unsupervised Reinforcement Learning @ ICML 2021

Unsupervised learning (UL) has recently begun to deliver on its promise, with tremendous progress in natural language processing and computer vision, where large-scale unsupervised pre-training has enabled fine-tuning on downstream supervised learning tasks with limited labeled data. This is particularly encouraging and appealing in the context of reinforcement learning (RL), since it is expensive to perform rollouts in the real world with annotations in the form of either reward signals or human demonstrations. We therefore believe that a workshop at the intersection of unsupervised and reinforcement learning is timely, and we hope to bring together researchers with diverse views on how to make further progress in this exciting and open-ended subfield.

Important Dates

Paper Submission Deadline: June 9, 2021 AoE
Decision Notifications: June 30, 2021
Camera Ready Paper Deadline: July 17, 2021 AoE
Workshop: July 24, 2021

Call for Papers

We invite both short (4-page) and long (8-page) anonymized submissions in the ICML LaTeX format that study questions regarding the best ways of combining unsupervised learning with RL. More concretely, we welcome submissions around, but not necessarily limited to, the following broad questions:

  • How can the use of UL advance RL?
  • What are the most effective ways of combining UL with RL?
  • What are the settings in which UL can be most beneficial in RL?
  • How is representation learning for RL different from representation learning for downstream supervised tasks?
  • What theoretical guarantees can be derived for unsupervised exploration and representation learning in RL?
  • How can UL improve RL in terms of sample efficiency, generalization, and exploration?
  • How can UL and Skill Discovery be maximally synergetic?
  • How does the role of UL differ across Model-based RL, Model-free On-policy RL, Model-free Off-policy RL, and Offline RL?
  • What inspiration can we take from cognitive science for the next crop of UL methods for RL?
  • Is there a unified view to combine different UL methods into a single framework?

This workshop will bring together researchers working in unsupervised learning (including those in computer vision or natural language processing), representation learning and reinforcement learning to discuss the benefits, challenges and potential solutions for effectively using unsupervised learning techniques to enhance reinforcement learning agents. Early workshops were crucial to accelerate the use of UL techniques in vision and language, and we hope this workshop will serve as the kindling for UL techniques in RL.

Note that as per ICML guidelines, we don't accept works previously published in other conferences on machine learning, but are open to works that are currently under submission to a conference (such as NeurIPS 2021).

Submissions should be uploaded on OpenReview: URL submission link.

In case of any issues or questions, feel free to email the workshop organizers at: url.icml2021@gmail.com.


Pieter Abbeel
UC Berkeley
Kelsey Allen
Kiante Brantley
Maryland College Park
Chelsea Finn
David Ha
Google Brain
Danijar Hafner
University of Toronto
Yann LeCun


For decades, unsupervised learning (UL) has promised to drastically reduce our reliance on supervision and reinforcement. Now, in the last couple of years, unsupervised learning has been delivering on this promise with substantial advances in computer vision (e.g., CPC [1], SimCLR [2], MoCo [3], BYOL [4]) and natural language processing (e.g., BERT [5], GPT-3 [6], T5 [7], RoBERTa [8]). The general-purpose representations learned by unsupervised methods are useful for a variety of downstream supervised learning tasks, particularly in the low-data regime (BERT [5], GPT-3 [6], T5 [7], CPCv2 [9], SimCLR [2], SimCLRv2 [10]).

However, in the context of reinforcement learning, we haven’t seen the level of impact UL has had in vision and language. This is not for lack of trying. The machine learning community has developed a wide variety of methods that use UL to make a meaningful impact in RL. A few prominent directions are as follows:

  • Learning rich representations of high dimensional observations to aid reinforcement learning (UNREAL [11], DARLA [12], TCN [13], SAC-AE [14], SLAC [15], CURL [16], DrQ [17], RAD [18], ATC [19], Bisimulation [20], Proto-RL [21]).
  • Building world models for planning (Visual MPC [22], SimPLe [23], PlaNet [24], Dreamer [25], MuZero [26], CFM [41]).
  • Learning to explore environments with sparse reward signals (EX2 [27], Curiosity [28], RND [29]).
  • Learning task agnostic, diverse and reusable skills (VIC [30], VALOR [31], DIAYN [32], DADS [33]).
  • Extracting signals for free with goal-conditioned and hindsight models (UVFA [34], HER [35], Asymmetric Self-Play [36], RIG [37], Learning From Play [38]).
  • Unsupervised Learning in the context of Meta/Multi-Task Learning (CARML [39], UML [40]).
  • Sample complexity bounds for unsupervised exploration and representation learning in RL (FLAMBE [42], BMDP [43], MaxEnt exploration [47], DisCO [44], reward-free exploration [45], FRANCIS [46]).


Joelle Pineau
McGill University / Mila / FAIR
Aravind Srinivas
UC Berkeley
Denis Yarats
Amy Zhang
McGill University / Mila / FAIR


  1. Oord et al. "Representation Learning with Contrastive Predictive Coding." ArXiv (2018).
  2. Chen et al. "A Simple Framework for Contrastive Learning of Visual Representations." ICML (2020).
  3. He et al. "Momentum Contrast for Unsupervised Visual Representation Learning." CVPR (2020).
  4. Grill et al. "Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning." NeurIPS (2020).
  5. Devlin et al. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." NAACL (2019).
  6. Brown et al. "Language Models are Few-Shot Learners." ArXiv (2020).
  7. Raffel et al. "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer." ArXiv (2019).
  8. Liu et al. "RoBERTa: A Robustly Optimized BERT Pretraining Approach." ArXiv (2019).
  9. Hénaff et al. "Data-Efficient Image Recognition with Contrastive Predictive Coding." ArXiv (2019).
  10. Chen et al. "Big Self-Supervised Models are Strong Semi-Supervised Learners." NeurIPS (2020).
  11. Jaderberg et al. "Reinforcement Learning with Unsupervised Auxiliary Tasks." ICLR (2017).
  12. Higgins et al. "DARLA: Improving Zero-Shot Transfer in Reinforcement Learning." ICML (2017).
  13. Sermanet et al. "Time-Contrastive Networks: Self-Supervised Learning from Video." ArXiv (2017).
  14. Yarats et al. "Improving Sample Efficiency in Model-Free Reinforcement Learning from Images." AAAI (2021).
  15. Lee et al. "Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model." ArXiv (2019).
  16. Srinivas et al. "Contrastive Unsupervised Representations for Reinforcement Learning." ICML (2020).
  17. Yarats et al. "Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels." ICLR (2021).
  18. Laskin et al. "Reinforcement Learning with Augmented Data." NeurIPS (2020).
  19. Stooke et al. "Decoupling Representation Learning from Reinforcement Learning." ArXiv (2020).
  20. Zhang et al. "Learning Invariant Representations for Reinforcement Learning without Reconstruction." ICLR (2021).
  21. Yarats et al. "Reinforcement Learning with Prototypical Representations." ArXiv (2021).
  22. Hirose et al. "Deep Visual MPC-Policy Learning for Navigation." ArXiv (2019).
  23. Kaiser et al. "Model-Based Reinforcement Learning for Atari." ArXiv (2019).
  24. Hafner et al. "Learning Latent Dynamics for Planning from Pixels." ICML (2019).
  25. Hafner et al. "Dream to Control: Learning Behaviors by Latent Imagination." ICLR (2020).
  26. Schrittwieser et al. "Mastering Atari, Go, chess and shogi by planning with a learned model." Nature (2020).
  27. Fu et al. "EX2: Exploration with Exemplar Models for Deep Reinforcement Learning." ArXiv (2017).
  28. Pathak et al. "Curiosity-driven Exploration by Self-supervised Prediction." ICML (2017).
  29. Burda et al. "Exploration by random network distillation." ICLR (2019).
  30. Gregor et al. "Variational Intrinsic Control." ArXiv (2016).
  31. Achiam et al. "Variational Option Discovery Algorithms." ArXiv (2018).
  32. Eysenbach et al. "Diversity is All You Need: Learning Skills without a Reward Function." ICLR (2019).
  33. Sharma et al. "Dynamics-Aware Unsupervised Discovery of Skills." ICLR (2020).
  34. Schaul et al. "Universal Value Function Approximators." ICML (2015).
  35. Andrychowicz et al. "Hindsight Experience Replay." NeurIPS (2017).
  36. Sukhbaatar et al. "Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play." ICLR (2018).
  37. Nair et al. "Visual Reinforcement Learning with Imagined Goals." NeurIPS (2018).
  38. Lynch et al. "Learning Latent Plans from Play." CoRL (2019).
  39. Jabri et al. "Unsupervised Curricula for Visual Meta-Reinforcement Learning." NeurIPS (2019).
  40. Gupta et al. "Unsupervised Meta-Learning for Reinforcement Learning." ICLR (2019).
  41. Yan et al. "Learning Predictive Representations for Deformable Objects Using Contrastive Estimation." CoRL (2020).
  42. Agarwal et al. "FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs." NeurIPS (2020).
  43. Feng et al. "Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning." NeurIPS (2020).
  44. Tarbouriech et al. "Improved Sample Complexity for Incremental Autonomous Exploration in MDPs." NeurIPS (2020).
  45. Jin et al. "Reward-Free Exploration for Reinforcement Learning." ArXiv (2020).
  46. Zanette et al. "Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration." NeurIPS (2020).
  47. Hazan et al. "Provably Efficient Maximum Entropy Exploration." ArXiv (2018).
Website theme inspired by the VIGIL workshop. Cover art by Matt Dixon.