First return, then explore | ✓ Link | 43791 | Go-Explore | 2020-04-27 |
Go-Explore: a New Approach for Hard-Exploration Problems | ✓ Link | 43763 | Go-Explore | 2019-01-30 |
Self-supervised network distillation: an effective approach to exploration in sparse reward environments | ✓ Link | 21565 | SND-V | 2023-02-22 |
Agent57: Outperforming the Atari Human Benchmark | ✓ Link | 9352.01 | Agent57 | 2020-03-30 |
Exploration by Random Network Distillation | ✓ Link | 8152 | RND | 2018-10-30 |
Self-supervised network distillation: an effective approach to exploration in sparse reward environments | ✓ Link | 7838 | SND-VIC | 2023-02-22 |
Self-supervised network distillation: an effective approach to exploration in sparse reward environments | ✓ Link | 7212 | SND-STD | 2023-02-22 |
Contingency-Aware Exploration in Reinforcement Learning | | 6635 | A2C+CoEX | 2018-11-05 |
Count-Based Exploration with Neural Density Models | ✓ Link | 3705.5 | DQN-PixelCNN | 2017-03-03 |
Unifying Count-Based Exploration and Intrinsic Motivation | ✓ Link | 3459 | DDQN-PC | 2016-06-06 |
GDI: Rethinking What Makes Reinforcement Learning Different From Supervised Learning | | 3000 | GDI-I3 | 2021-06-11 |
Generalized Data Distribution Iteration | | 3000 | GDI-I3 | 2022-06-07 |
Count-Based Exploration in Feature Space for Reinforcement Learning | ✓ Link | 2745.4 | Sarsa-φ-EB | 2017-06-25 |
Large-Scale Study of Curiosity-Driven Learning | ✓ Link | 2504.6 | Intrinsic Reward Agent | 2018-08-13 |
Distributed Prioritized Experience Replay | ✓ Link | 2500.0 | Ape-X | 2018-03-02 |
Online and Offline Reinforcement Learning by Planning with a Learned Model | ✓ Link | 2500 | MuZero (Res2 Adam) | 2021-04-13 |
Generalized Data Distribution Iteration | | 2500 | GDI-H3 | 2022-06-07 |
Recurrent Experience Replay in Distributed Reinforcement Learning | ✓ Link | 2061.3 | R2D2 | 2019-05-01 |
Count-Based Exploration with the Successor Representation | ✓ Link | 1778.8 | DQN+SR | 2018-07-31 |
Count-Based Exploration with the Successor Representation | ✓ Link | 1778.6 | DQNMMCe+SR | 2018-07-31 |
Self-Imitation Learning | ✓ Link | 1100 | A2C + SIL | 2018-06-14 |
Count-Based Exploration in Feature Space for Reinforcement Learning | ✓ Link | 399.5 | Sarsa-ε | 2017-06-25 |
Unifying Count-Based Exploration and Intrinsic Motivation | ✓ Link | 273.7 | A3C-CTS | 2016-06-06 |
(no paper listed) | | 259 | SARSA | |
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models | ✓ Link | 142 | MP-EB | 2015-07-03 |
Deep Exploration via Bootstrapped DQN | ✓ Link | 100 | Bootstrapped DQN | 2016-02-15 |
Massively Parallel Methods for Deep Reinforcement Learning | ✓ Link | 84 | Gorila | 2015-07-15 |
Mastering Atari with Discrete World Models | ✓ Link | 81 | DreamerV2 | 2020-10-05 |
#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning | ✓ Link | 75 | TRPO-hash | 2016-11-15 |
Asynchronous Methods for Deep Reinforcement Learning | ✓ Link | 67 | A3C FF hs | 2016-02-04 |
Noisy Networks for Exploration | ✓ Link | 57 | NoisyNet-Dueling | 2017-06-30 |
Asynchronous Methods for Deep Reinforcement Learning | ✓ Link | 53 | A3C FF (1 day) hs | 2016-02-04 |
Prioritized Experience Replay | ✓ Link | 51 | Prior hs | 2015-11-18 |
Deep Reinforcement Learning with Double Q-learning | ✓ Link | 47.0 | DQN hs | 2015-09-22 |
Deep Reinforcement Learning with Double Q-learning | ✓ Link | 42.0 | DDQN (tuned) hs | 2015-09-22 |
Asynchronous Methods for Deep Reinforcement Learning | ✓ Link | 41 | A3C LSTM hs | 2016-02-04 |
Deep Reinforcement Learning with Double Q-learning | ✓ Link | 24.0 | Prior+Duel hs | 2015-09-22 |
Dueling Network Architectures for Deep Reinforcement Learning | ✓ Link | 22.0 | Duel hs | 2015-11-20 |
The Arcade Learning Environment: An Evaluation Platform for General Agents | ✓ Link | 10.7 | Best Learner | 2012-07-19 |
Increasing the Action Gap: New Operators for Reinforcement Learning | ✓ Link | 1.72 | Persistent AL | 2015-12-15 |
Increasing the Action Gap: New Operators for Reinforcement Learning | ✓ Link | 0.42 | Advantage Learning | 2015-12-15 |
Implicit Quantile Networks for Distributional Reinforcement Learning | ✓ Link | 0 | IQN | 2018-06-14 |
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | ✓ Link | 0.00 | MuZero | 2019-11-19 |
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures | ✓ Link | 0.00 | IMPALA (deep) | 2018-02-05 |
Evolving simple programs for playing Atari games | ✓ Link | 0 | CGP | 2018-06-14 |
Policy Optimization With Penalized Point Probability Distance: An Alternative To Proximal Policy Optimization | ✓ Link | 0 | POP3D | 2018-07-02 |
Distributional Reinforcement Learning with Quantile Regression | ✓ Link | 0 | QR-DQN-1 | 2017-10-27 |
DNA: Proximal Policy Optimization with a Dual Network Architecture | ✓ Link | 0 | DNA | 2022-06-20 |
Train a Real-world Local Path Planner in One Hour via Partially Decoupled Reinforcement Learning and Vectorized Diversity | ✓ Link | 0 | ASL DDQN | 2023-05-07 |
Human level control through deep reinforcement learning | ✓ Link | 0 | Nature DQN | 2015-02-25 |