Unsupervised Reinforcement Learning @ ICML 2021

Friday, July 23, 2021

Unsupervised learning (UL) has begun to deliver on its promise, with tremendous recent progress in natural language processing and computer vision, where large-scale unsupervised pre-training has enabled fine-tuning on downstream supervised learning tasks with limited labeled data. This is particularly encouraging in the context of reinforcement learning (RL), since it is expensive to perform rollouts in the real world with annotations in the form of either reward signals or human demonstrations. We therefore believe that a workshop at the intersection of unsupervised and reinforcement learning is timely, and we hope to bring together researchers with diverse views on how to make further progress in this exciting and open-ended subfield.

Official schedule

All times listed below are in Eastern Time (ET). See the ICML virtual page for information about how to join these sessions.

08:45 - 09:00 AM Opening remarks
09:00 - 09:30 AM Invited Talk by David Ha
09:30 - 10:00 AM Invited Talk by Alessandro Lazaric
10:00 - 10:30 AM Invited Talk by Kelsey Allen
10:30 - 11:30 AM Coffee break and Poster Session
11:30 - 12:00 PM Invited Talk by Danijar Hafner
12:00 - 12:30 PM Invited Talk by Nan Rosemary Ke
12:30 - 1:30 PM Lunch and Poster Session
1:30 - 2:30 PM Oral Presentations
  • Discovering and Achieving Goals with World Models
    (Russell Mendonca, Oleh Rybkin, Kostas Daniilidis, Danijar Hafner, Deepak Pathak)
  • Planning from Pixels in Environments with Combinatorially Hard Search Spaces
    (Marco Bagatella, Miroslav Olšák, Michal Rolinek, Georg Martius)
  • Learning Task Agnostic Skills with Data-driven Guidance
    (Even Klemsdal, Sverre Herland, Abdulmajid Murad)
2:30 - 3:00 PM Invited Talk by Kianté Brantley
3:00 - 4:00 PM Coffee break and Poster Session
4:00 - 4:30 PM Invited Talk by Chelsea Finn
4:30 - 5:00 PM Invited Talk by Pieter Abbeel
5:00 - 5:30 PM Panel Discussion

Important Dates

Paper Submission Deadline June 9, 2021 AoE
Decision Notifications June 30, 2021
Camera Ready Paper Deadline July 17, 2021 AoE
Workshop July 23, 2021

Call for Papers

We invite both short (4 page) and long (8 page) anonymized submissions in the ICML LaTeX format that study questions regarding the best ways of combining unsupervised learning with RL. More concretely, we welcome submissions around, but not necessarily limited to, the following broad questions:

  • How can the use of UL advance RL?
  • What are the most effective ways of combining UL with RL?
  • What are the settings in which UL can be most beneficial in RL?
  • How is Representation Learning for RL different from downstream supervised tasks?
  • What theoretical guarantees can be derived for unsupervised exploration and representation learning in RL?
  • How can UL improve RL in terms of sample efficiency, generalization, and exploration?
  • How can UL and Skill Discovery be maximally synergetic?
  • How does the role of UL differ across Model-based RL, Model-free On-policy RL, Model-free Off-policy RL, Offline RL?
  • What inspiration can we take from cognitive science for the next crop of UL methods for RL?
  • Is there a unified view to combine different UL methods into a single framework?

This workshop will bring together researchers working in unsupervised learning (including those in computer vision or natural language processing), representation learning and reinforcement learning to discuss the benefits, challenges and potential solutions for effectively using unsupervised learning techniques to enhance reinforcement learning agents. Early workshops were crucial to accelerate the use of UL techniques in vision and language, and we hope this workshop will serve as the kindling for UL techniques in RL.

Note that as per ICML guidelines, we don't accept works previously published in other conferences on machine learning, but are open to works that are currently under submission to a conference (such as NeurIPS 2021).

Submissions should be uploaded on OpenReview: URL submission link.

In case of any issues or questions, feel free to email the workshop organizers at: url.icml2021@gmail.com.


Pieter Abbeel
UC Berkeley
Kelsey Allen
Kianté Brantley
Maryland College Park
Chelsea Finn
David Ha
Google Brain
Danijar Hafner
University of Toronto

Accepted papers

Camera-ready versions of all the papers are available on OpenReview.

  1. [Oral] Planning from Pixels in Environments with Combinatorially Hard Search Spaces. Marco Bagatella, Miroslav Olšák, Michal Rolinek, Georg Martius.
  2. [Oral] Learning Task Agnostic Skills with Data-driven Guidance. Even Klemsdal, Sverre Herland, Abdulmajid Murad.
  3. [Oral] Discovering and Achieving Goals with World Models. Russell Mendonca, Oleh Rybkin, Kostas Daniilidis, Danijar Hafner, Deepak Pathak.
  4. [Poster] Did I do that? Blame as a means to identify controlled effects in reinforcement learning. Oriol Corcoll Andreu, Raul Vicente.
  5. [Poster] Visualizing MuZero Models. Joery de Vries, Ken Voskuil, Thomas M. Moerland, Aske Plaat.
  6. [Poster] Pretrained Encoders are All You Need. Mina Khan, Advait Prashant Rane, Srivatsa P, Shriram Chenniappa, Rishabh Anand, Sherjil Ozair, Patricia Maes.
  7. [Poster] Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation. Nicklas Hansen, Hao Su, Xiaolong Wang.
  8. [Poster] The Importance of Non-Markovianity in Maximum State Entropy Exploration. Mirco Mutti, Riccardo De Santi, Marcello Restelli.
  9. [Poster] Reward-Free Policy Space Compression for Reinforcement Learning. Mirco Mutti, Stefano Del Col, Marcello Restelli.
  10. [Poster] Learning to Explore Multiple Environments without Rewards. Mirco Mutti, Mattia Mancassola, Marcello Restelli.
  11. [Poster] Beyond Fine-Tuning: Transferring Behavior in Reinforcement Learning. Víctor Campos, Pablo Sprechmann, Steven Stenberg Hansen, Andre Barreto, Steven Kapturowski, Alex Vitvitskyi, Adria Puigdomenech Badia, Charles Blundell.
  12. [Poster] Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching. Pierre-Alexandre Kamienny, Jean Tarbouriech, Alessandro Lazaric, Ludovic Denoyer.
  13. [Poster] Exploration and preference satisfaction trade-off in reward-free learning. Noor Sajid, Panagiotis Tigas, Alexey Zakharov, Zafeirios Fountas, Karl Friston.
  14. [Poster] Hierarchical Few-Shot Imitation with Skill Transition Models. Kourosh Hakhamaneshi, Ruihan Zhao, Albert Zhan, Pieter Abbeel, Michael Laskin.
  15. [Poster] Explore and Control with Adversarial Surprise. Arnaud Fickinger, Natasha Jaques, Samyak Parajuli, Michael Chang, Nicholas Rhinehart, Glen Berseth, Stuart Russell, Sergey Levine.
  16. [Poster] MASAI: Multi-agent Summative Assessment Improvement for Unsupervised Environment Design. Yiping Wang, Michael Brandon Haworth.
  17. [Poster] Exploration via Empowerment Gain: Combining Novelty, Surprise and Learning Progress. Philip Becker-Ehmck, Maximilian Karl, Jan Peters, Patrick van der Smagt.
  18. [Poster] Learning Task-Relevant Representations with Selective Contrast for Reinforcement Learning in a Real-World Application. Flemming Brieger, Daniel Alexander Braun, Sascha Lange.
  19. [Poster] Unsupervised Skill-Discovery and Skill-Learning in Minecraft. Juan José Nieto, Roger Creus Castanyer, Xavier Giro-i-Nieto.
  20. [Poster] Reward is enough for convex MDPs. Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder Singh.
  21. [Poster] Discovering Diverse Nearly Optimal Policies with Successor Features. Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Sebastian Flennerhag, Volodymyr Mnih, Satinder Singh.
  22. [Poster] Intrinsic Control of Variational Beliefs in Dynamic Partially-Observed Visual Environments. Nicholas Rhinehart, Jenny Wang, Glen Berseth, John D Co-Reyes, Danijar Hafner, Chelsea Finn, Sergey Levine.
  23. [Poster] Data-Efficient Exploration with Self Play for Atari. Michael Laskin, Catherine Cang, Ryan Rudes, Pieter Abbeel.
  24. [Poster] Learning to Represent State with Perceptual Schemata. Wilka Torrico Carvalho, Murray Shanahan.
  25. [Poster] Exploration-Driven Representation Learning in Reinforcement Learning. Akram Erraqabi, Mingde Zhao, Marlos C. Machado, Yoshua Bengio, Sainbayar Sukhbaatar, Ludovic Denoyer, Alessandro Lazaric.
  26. [Poster] Inverse Reinforcement Learning from Suboptimal Demonstrations. Andi Peng, Aviv Netanyahu, Pulkit Agrawal.
  27. [Poster] Reinforcement Learning as One Big Sequence Modeling Problem. Michael Janner, Qiyang Li, Sergey Levine.
  28. [Poster] Episodic Memory for Subjective-Timescale Models. Alexey Zakharov, Matthew Crosby, Zafeirios Fountas.
  29. [Poster] Decision Transformer: Reinforcement Learning via Sequence Modeling. Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
  30. [Poster] Visual Adversarial Imitation Learning using Variational Models. Rafael Rafailov, Tianhe Yu, Aravind Rajeswaran, Chelsea Finn.
  31. [Poster] Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation. Evgenii Nikishin, Romina Abachi, Rishabh Agarwal, Pierre-Luc Bacon.
  32. [Poster] CoBERL: Contrastive BERT for Reinforcement Learning. Andrea Banino, Adria Puigdomenech Badia, Jacob C Walker, Tim Scholtes, Jovana Mitrovic, Charles Blundell.
  33. [Poster] SparseDice: Imitation Learning for Temporally Sparse Data via Regularization. Alberto Camacho, Izzeddin Gur, Marcin Lukasz Moczulski, Ofir Nachum, Aleksandra Faust.
  34. [Poster] Representation Learning for Out-of-distribution Generalization in Reinforcement Learning. Frederik Träuble, Andrea Dittadi, Manuel Wuthrich, Felix Widmaier, Peter Vincent Gehler, Ole Winther, Francesco Locatello, Olivier Bachem, Bernhard Schölkopf, Stefan Bauer.
  35. [Poster] Tangent Space Least Adaptive Clustering. James Buenfil, Samson J Koelle, Marina Meila.
  36. [Poster] Density-Based Bonuses on Learned Representations for Reward-Free Exploration in Deep Reinforcement Learning. Omar Darwiche Domingues, Corentin Tallec, Remi Munos, Michal Valko.
  37. [Poster] Decoupling Exploration and Exploitation in Reinforcement Learning. Lukas Schäfer, Filippos Christianos, Josiah Hanna, Stefano V Albrecht.
  38. [Poster] Disentangled Predictive Representation for Meta-Reinforcement Learning. Sephora Madjiheurem, Laura Toni.


For decades, unsupervised learning (UL) has promised to drastically reduce our reliance on supervision and reinforcement. Now, in the last couple of years, unsupervised learning has been delivering on this promise with substantial advances in computer vision (e.g., CPC [1], SimCLR [2], MoCo [3], BYOL [4]) and natural language processing (e.g., BERT [5], GPT-3 [6], T5 [7], RoBERTa [8]). The general-purpose representations learned by unsupervised methods are useful for a variety of downstream supervised learning tasks, particularly in the low-data regime (BERT [5], GPT-3 [6], T5 [7], CPCv2 [9], SimCLR [2], SimCLRv2 [10]).

However, in the context of reinforcement learning, we haven’t seen the level of impact UL has had in vision and language. This is not for lack of trying: the machine learning community has developed a wide variety of methods that use UL to make a meaningful impact in RL. A few prominent directions are as follows:

  • Learning rich representations of high dimensional observations to aid reinforcement learning (UNREAL [11], DARLA [12], TCN [13], SAC-AE [14], SLAC [15], CURL [16], DrQ [17], RAD [18], ATC [19], Bisimulation [20], Proto-RL [21]).
  • Building world models for planning (Visual MPC [22], SimPLe [23], PlaNet [24], Dreamer [25], MuZero [26], CFM [41]).
  • Learning to explore environments with sparse reward signals (EX2 [27], Curiosity [28], RND [29]).
  • Learning task agnostic, diverse and reusable skills (VIC [30], VALOR [31], DIAYN [32], DADS [33]).
  • Extracting signals for free with goal-conditioned and hindsight models (UVFA [34], HER [35], Asymmetric Self-Play [36], RIG [37], Learning From Play [38]).
  • Unsupervised Learning in the context of Meta/Multi-Task Learning (CARML [39], UML [40]).
  • Sample complexity bounds for unsupervised exploration and representation learning in RL (FLAMBE [42], BMDP [43], MaxEnt exploration [47], DisCO [44], reward free exploration [45], Francis [46]).


Joelle Pineau
McGill University / Mila / FAIR
Aravind Srinivas
UC Berkeley
Denis Yarats
Amy Zhang
McGill University / Mila / FAIR


  1. Oord et al. "Representation Learning with Contrastive Predictive Coding." ArXiv (2018).
  2. Chen et al. "A Simple Framework for Contrastive Learning of Visual Representations." ICML (2020).
  3. He et al. "Momentum Contrast for Unsupervised Visual Representation Learning." CVPR (2020).
  4. Grill et al. "Bootstrap your own latent: A new approach to self-supervised Learning." NeurIPS (2020).
  5. Devlin et al. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." NAACL (2019).
  6. Brown et al. "Language Models are Few-Shot Learners." ArXiv (2020).
  7. Raffel et al. "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer." ArXiv (2019).
  8. Liu et al. "RoBERTa: A Robustly Optimized BERT Pretraining Approach." ArXiv (2019).
  9. Hénaff et al. "Data-Efficient Image Recognition with Contrastive Predictive Coding." ArXiv (2019).
  10. Chen et al. "Big Self-Supervised Models are Strong Semi-Supervised Learners." NeurIPS (2020).
  11. Jaderberg et al. "Reinforcement Learning with Unsupervised Auxiliary Tasks." ICLR (2017).
  12. Higgins et al. "DARLA: Improving Zero-Shot Transfer in Reinforcement Learning." ICML (2017).
  13. Sermanet et al. "Time-Contrastive Networks: Self-Supervised Learning from Video." ArXiv (2017).
  14. Yarats et al. "Improving Sample Efficiency in Model-Free Reinforcement Learning from Images." AAAI (2021).
  15. Lee et al. "Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model." ArXiv (2019).
  16. Srinivas et al. "Contrastive Unsupervised Representations for Reinforcement Learning." ICML (2020).
  17. Yarats et al. "Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels." ICLR (2021).
  18. Laskin et al. "Reinforcement Learning with Augmented Data." NeurIPS (2020).
  19. Stooke et al. "Decoupling Representation Learning from Reinforcement Learning." ArXiv (2020).
  20. Zhang et al. "Learning Invariant Representations for Reinforcement Learning without Reconstruction." ICLR (2021).
  21. Yarats et al. "Reinforcement Learning with Prototypical Representations." ArXiv (2021).
  22. Hirose et al. "Deep Visual MPC-Policy Learning for Navigation." ArXiv (2019).
  23. Kaiser et al. "Model-Based Reinforcement Learning for Atari." ArXiv (2019).
  24. Hafner et al. "Learning Latent Dynamics for Planning from Pixels." ICML (2019).
  25. Hafner et al. "Dream to Control: Learning Behaviors by Latent Imagination." ICLR (2020).
  26. Schrittwieser et al. "Mastering Atari, Go, chess and shogi by planning with a learned model." Nature (2020).
  27. Fu et al. "EX2: Exploration with Exemplar Models for Deep Reinforcement Learning." ArXiv (2017).
  28. Pathak et al. "Curiosity-driven Exploration by Self-supervised Prediction." ICML (2017).
  29. Burda et al. "Exploration by random network distillation." ICLR (2019).
  30. Gregor et al. "Variational Intrinsic Control." ArXiv (2016).
  31. Achiam et al. "Variational Option Discovery Algorithms." ArXiv (2018).
  32. Eysenbach et al. "Diversity is All You Need: Learning Skills without a Reward Function." ICLR (2019).
  33. Sharma et al. "Dynamics-Aware Unsupervised Discovery of Skills." ICLR (2020).
  34. Schaul et al. "Universal Value Function Approximators." ICML (2015).
  35. Andrychowicz et al. "Hindsight Experience Replay." NeurIPS (2017).
  36. Sukhbaatar et al. "Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play." ICLR (2018).
  37. Nair et al. "Visual Reinforcement Learning with Imagined Goals." NeurIPS (2018).
  38. Lynch et al. "Learning Latent Plans from Play." CoRL (2019).
  39. Jabri et al. "Unsupervised Curricula for Visual Meta-Reinforcement Learning." NeurIPS (2019).
  40. Gupta et al. "Unsupervised Meta-Learning for Reinforcement Learning." ICLR (2019).
  41. Yan et al. "Learning Predictive Representations for Deformable Objects Using Contrastive Estimation." CoRL (2020).
  42. Agarwal et al. "FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs." NeurIPS (2020).
  43. Feng et al. "Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning." NeurIPS (2020).
  44. Tarbouriech et al. "Improved Sample Complexity for Incremental Autonomous Exploration in MDPs." NeurIPS (2020).
  45. Jin et al. "Reward-Free Exploration for Reinforcement Learning." ArXiv (2020).
  46. Zanette et al. "Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration." NeurIPS (2020).
  47. Hazan et al. "Provably Efficient Maximum Entropy Exploration." ArXiv (2020).