Paper | Code | Score | | Model | Date |
--- | --- | --- | --- | --- | --- |
Agent57: Outperforming the Atari Human Benchmark | ✓ Link | 114736.26 | | Agent57 | 2020-03-30 |
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | ✓ Link | 49244.11 | | MuZero | 2019-11-19 |
Recurrent Experience Replay in Distributed Reinforcement Learning | ✓ Link | 39537.1 | | R2D2 | 2019-05-01 |
A Distributional Perspective on Reinforcement Learning | ✓ Link | 38874 | | C51 noop | 2017-07-21 |
GDI: Rethinking What Makes Reinforcement Learning Different From Supervised Learning | | 38330 | | GDI-I3 | 2021-06-11 |
Generalized Data Distribution Iteration | | 38330 | | GDI-I3 | 2022-06-07 |
Generalized Data Distribution Iteration | | 38225 | | GDI-H3 | 2022-06-07 |
Online and Offline Reinforcement Learning by Planning with a Learned Model | ✓ Link | 37234.31 | | MuZero (Res2 Adam) | 2021-04-13 |
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures | ✓ Link | 33730.55 | | IMPALA (deep) | 2018-02-05 |
Self-Imitation Learning | ✓ Link | 33156.7 | | A2C + SIL | 2018-06-14 |
Asynchronous Methods for Deep Reinforcement Learning | ✓ Link | 32464.1 | | A3C FF hs | 2016-02-04 |
Distributed Prioritized Experience Replay | ✓ Link | 31655.9 | | Ape-X | 2018-03-02 |
Noisy Networks for Exploration | ✓ Link | 31533 | | NoisyNet-Dueling | 2017-06-30 |
Fully Parameterized Quantile Function for Distributional Reinforcement Learning | ✓ Link | 30926.2 | | FQF | 2019-11-05 |
Asynchronous Methods for Deep Reinforcement Learning | ✓ Link | 28889.5 | | A3C LSTM hs | 2016-02-04 |
Asynchronous Methods for Deep Reinforcement Learning | ✓ Link | 28765.8 | | A3C FF (1 day) hs | 2016-02-04 |
Implicit Quantile Networks for Distributional Reinforcement Learning | ✓ Link | 28386 | | IQN | 2018-06-14 |
Train a Real-world Local Path Planner in One Hour via Partially Decoupled Reinforcement Learning and Vectorized Diversity | ✓ Link | 26578.5 | | ASL DDQN | 2023-05-07 |
DNA: Proximal Policy Optimization with a Dual Network Architecture | ✓ Link | 24904 | | DNA | 2022-06-20 |
Increasing the Action Gap: New Operators for Reinforcement Learning | ✓ Link | 24788.86 | | Advantage Learning | 2015-12-15 |
Increasing the Action Gap: New Operators for Reinforcement Learning | ✓ Link | 24175.79 | | Persistent AL | 2015-12-15 |
Prioritized Experience Replay | ✓ Link | 23037.7 | | Prior noop | 2015-11-18 |
Mastering Atari with Discrete World Models | ✓ Link | 21868 | | DreamerV2 | 2020-10-05 |
Distributional Reinforcement Learning with Quantile Regression | ✓ Link | 21395 | | QR-DQN-1 | 2017-10-27 |
Dueling Network Architectures for Deep Reinforcement Learning | ✓ Link | 21036.5 | | Prior+Duel noop | 2015-11-20 |
Deep Exploration via Bootstrapped DQN | ✓ Link | 21021.3 | | Bootstrapped DQN | 2016-02-15 |
Prioritized Experience Replay | ✓ Link | 20889.9 | | Prior hs | 2015-11-18 |
Dueling Network Architectures for Deep Reinforcement Learning | ✓ Link | 20818.2 | | Duel noop | 2015-11-20 |
Deep Reinforcement Learning with Double Q-learning | ✓ Link | 20437.8 | | DQN noop | 2015-09-22 |
Dueling Network Architectures for Deep Reinforcement Learning | ✓ Link | 20130.2 | | DDQN (tuned) noop | 2015-11-20 |
Human-level control through deep reinforcement learning | ✓ Link | 19950 | | Nature DQN | 2015-02-25 |
Deep Reinforcement Learning with Double Q-learning | ✓ Link | 15459.2 | | Prior+Duel hs | 2015-09-22 |
Dueling Network Architectures for Deep Reinforcement Learning | ✓ Link | 15207.9 | | Duel hs | 2015-11-20 |
Deep Reinforcement Learning with Double Q-learning | ✓ Link | 14992.9 | | DQN hs | 2015-09-22 |
Deep Reinforcement Learning with Double Q-learning | ✓ Link | 14892.5 | | DDQN (tuned) hs | 2015-09-22 |
Learning values across many orders of magnitude | | 14225.2 | | DDQN+Pop-Art noop | 2016-02-24 |
The Arcade Learning Environment: An Evaluation Platform for General Agents | ✓ Link | 12859.5 | | UCT | 2012-07-19 |
Model-Free Episodic Control with State Aggregation | | 11732 | 13190 | MFEC | 2020-08-21 |
Massively Parallel Methods for Deep Reinforcement Learning | ✓ Link | 8963.4 | | Gorila | 2015-07-15 |
n/a | | 7295 | | SARSA | |
The Arcade Learning Environment: An Evaluation Platform for General Agents | ✓ Link | 6459 | | Best linear | 2012-07-19 |
The Arcade Learning Environment: An Evaluation Platform for General Agents | ✓ Link | 6458.8 | | Best Learner | 2012-07-19 |
CURL: Contrastive Unsupervised Representations for Reinforcement Learning | ✓ Link | 6235.1 | | CURL | 2020-04-08 |
Evolving simple programs for playing Atari games | ✓ Link | 2974 | | CGP | 2018-06-14 |