| Paper | Code | Score | Method | Date |
|---|---|---|---|---|
| First return, then explore | ✓ Link | 197376 | Go-Explore | 2020-04-27 |
| Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | ✓ Link | 85932.60 | MuZero | 2019-11-19 |
| Agent57: Outperforming the Atari Human Benchmark | ✓ Link | 61507.83 | Agent57 | 2020-03-30 |
| Distributed Prioritized Experience Replay | ✓ Link | 57196.7 | Ape-X | 2018-03-02 |
| Recurrent Experience Replay in Distributed Reinforcement Learning | ✓ Link | 53318.7 | R2D2 | 2019-05-01 |
| DNA: Proximal Policy Optimization with a Dual Network Architecture | ✓ Link | 19789 | DNA | 2022-06-20 |
| Generalized Data Distribution Iteration | | 14649 | GDI-H3 | 2022-06-07 |
| Fully Parameterized Quantile Function for Distributional Reinforcement Learning | ✓ Link | 12422.2 | FQF | 2019-11-05 |
| GDI: Rethinking What Makes Reinforcement Learning Different From Supervised Learning | | 7607 | GDI-I3 | 2021-06-11 |
| Generalized Data Distribution Iteration | | 7607 | GDI-I3 | 2022-06-07 |
| Dueling Network Architectures for Deep Reinforcement Learning | ✓ Link | 3409.0 | Prior+Duel noop | 2015-11-20 |
| Distributional Reinforcement Learning with Quantile Regression | ✓ Link | 3117 | QR-DQN-1 | 2017-10-27 |
| Online and Offline Reinforcement Learning by Planning with a Learned Model | ✓ Link | 2705.82 | MuZero (Res2 Adam) | 2021-04-13 |
| Train a Real-world Local Path Planner in One Hour via Partially Decoupled Reinforcement Learning and Vectorized Diversity | ✓ Link | 2597.2 | ASL DDQN | 2023-05-07 |
| The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning | | 2303.1 | Reactor 500M | 2017-04-15 |
| Deep Reinforcement Learning with Double Q-learning | ✓ Link | 2178.6 | Prior+Duel hs | 2015-09-22 |
| Noisy Networks for Exploration | ✓ Link | 1896 | NoisyNet-Dueling | 2017-06-30 |
| IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures | ✓ Link | 1852.70 | IMPALA (deep) | 2018-02-05 |
| A Distributional Perspective on Reinforcement Learning | ✓ Link | 1645.0 | C51 noop | 2017-07-21 |
| Dueling Network Architectures for Deep Reinforcement Learning | ✓ Link | 1472.6 | Duel noop | 2015-11-20 |
| Asynchronous Methods for Deep Reinforcement Learning | ✓ Link | 1433.4 | A3C FF (1 day) hs | 2016-02-04 |
| Increasing the Action Gap: New Operators for Reinforcement Learning | ✓ Link | 1328.25 | Persistent AL | 2015-12-15 |
| Prioritized Experience Replay | ✓ Link | 1305.6 | Prior noop | 2015-11-18 |
| Dueling Network Architectures for Deep Reinforcement Learning | ✓ Link | 1225.4 | DDQN (tuned) noop | 2015-11-20 |
| Learning values across many orders of magnitude | | 1199.6 | DDQN+Pop-Art noop | 2016-02-24 |
| Evolving simple programs for playing Atari games | ✓ Link | 1138 | CGP | 2018-06-14 |
| Implicit Quantile Networks for Distributional Reinforcement Learning | ✓ Link | 1053 | IQN | 2018-06-14 |
| Deep Reinforcement Learning with Double Q-learning | ✓ Link | 1011.1 | DDQN (tuned) hs | 2015-09-22 |
| Dueling Network Architectures for Deep Reinforcement Learning | ✓ Link | 910.6 | Duel hs | 2015-11-20 |
| Prioritized Experience Replay | ✓ Link | 865.9 | Prior hs | 2015-11-18 |
| Asynchronous Methods for Deep Reinforcement Learning | ✓ Link | 862.2 | A3C LSTM hs | 2016-02-04 |
| Asynchronous Methods for Deep Reinforcement Learning | ✓ Link | 817.9 | A3C FF hs | 2016-02-04 |
| Mastering Atari with Discrete World Models | ✓ Link | 810 | DreamerV2 | 2020-10-05 |
| Increasing the Action Gap: New Operators for Reinforcement Learning | ✓ Link | 747.26 | Advantage Learning | 2015-12-15 |
| Evolution Strategies as a Scalable Alternative to Reinforcement Learning | ✓ Link | 686.0 | ES FF (1 hour) noop | 2017-03-10 |
| The Arcade Learning Environment: An Evaluation Platform for General Agents | ✓ Link | 670 | Best Baseline | 2012-07-19 |
| Deep Reinforcement Learning with Double Q-learning | ✓ Link | 585.6 | DQN noop | 2015-09-22 |
| The Arcade Learning Environment: An Evaluation Platform for General Agents | ✓ Link | 501.3 | Best Learner | 2012-07-19 |
| Deep Reinforcement Learning with Double Q-learning | ✓ Link | 493.4 | DQN hs | 2015-09-22 |