Recurrent Experience Replay in Distributed Reinforcement Learning | ✓ Link | 999383.2 | R2D2 | 2019-05-01 |
Agent57: Outperforming the Atari Human Benchmark | ✓ Link | 992340.74 | Agent57 | 2020-03-30 |
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | ✓ Link | 981791.88 | MuZero | 2019-11-19 |
Generalized Data Distribution Iteration | | 978190 | GDI-H3 | 2022-06-07 |
A Distributional Perspective on Reinforcement Learning | ✓ Link | 949604.0 | C51 noop | 2017-07-21 |
Generalized Data Distribution Iteration | | 925830 | GDI-I3 | 2022-06-07 |
Noisy Networks for Exploration | ✓ Link | 870954 | NoisyNet-Dueling | 2017-06-30 |
Online and Offline Reinforcement Learning by Planning with a Learned Model | ✓ Link | 865543.44 | MuZero (Res2 Adam) | 2021-04-13 |
Deep Exploration via Bootstrapped DQN | ✓ Link | 811610 | Bootstrapped DQN | 2016-02-15 |
Distributional Reinforcement Learning with Quantile Regression | ✓ Link | 705662 | QR-DQN-1 | 2017-10-27 |
Implicit Quantile Networks for Distributional Reinforcement Learning | ✓ Link | 698045 | IQN | 2018-06-14 |
Train a Real-world Local Path Planner in One Hour via Partially Decoupled Reinforcement Learning and Vectorized Diversity | ✓ Link | 626794 | ASL DDQN | 2023-05-07 |
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures | ✓ Link | 572898.27 | IMPALA (deep) | 2018-02-05 |
Distributed Prioritized Experience Replay | ✓ Link | 565163.2 | Ape-X | 2018-03-02 |
Increasing the Action Gap: New Operators for Reinforcement Learning | ✓ Link | 543504 | Advantage Learning | 2015-12-15 |
DNA: Proximal Policy Optimization with a Dual Network Architecture | ✓ Link | 505392 | DNA | 2022-06-20 |
Dueling Network Architectures for Deep Reinforcement Learning | ✓ Link | 479197.0 | Prior+Duel noop | 2015-11-20 |
Asynchronous Methods for Deep Reinforcement Learning | ✓ Link | 470310.5 | A3C LSTM hs | 2016-02-04 |
Self-Imitation Learning | ✓ Link | 461522.4 | A2C + SIL | 2018-06-14 |
Deep Reinforcement Learning with Double Q-learning | ✓ Link | 447408.6 | Prior+Duel hs | 2015-09-22 |
Deep Reinforcement Learning with Double Q-learning | ✓ Link | 367823.7 | DDQN (tuned) hs | 2015-09-22 |
Asynchronous Methods for Deep Reinforcement Learning | ✓ Link | 331628.1 | A3C FF hs | 2016-02-04 |
Dueling Network Architectures for Deep Reinforcement Learning | ✓ Link | 309941.9 | DDQN (tuned) noop | 2015-11-20 |
Prioritized Experience Replay | ✓ Link | 295972.8 | Prior hs | 2015-11-18 |
Prioritized Experience Replay | ✓ Link | 282007.3 | Prior noop | 2015-11-18 |
The Arcade Learning Environment: An Evaluation Platform for General Agents | ✓ Link | 254748 | UCT | 2012-07-19 |
Deep Reinforcement Learning with Double Q-learning | ✓ Link | 196760.4 | DQN noop | 2015-09-22 |
Asynchronous Methods for Deep Reinforcement Learning | ✓ Link | 185852.6 | A3C FF (1 day) hs | 2016-02-04 |
Deep Reinforcement Learning with Double Q-learning | ✓ Link | 154414.1 | DQN hs | 2015-09-22 |
Adaptive Rational Activations to Boost Deep Reinforcement Learning | ✓ Link | 149712 | Rational DQN Average | 2021-02-18 |
Massively Parallel Methods for Deep Reinforcement Learning | ✓ Link | 112093.4 | Gorila | 2015-07-15 |
Dueling Network Architectures for Deep Reinforcement Learning | ✓ Link | 110976.2 | Duel hs | 2015-11-20 |
Dueling Network Architectures for Deep Reinforcement Learning | ✓ Link | 98209.5 | Duel noop | 2015-11-20 |
Adaptive Rational Activations to Boost Deep Reinforcement Learning | ✓ Link | 86942 | Recurrent Rational DQN Average | 2021-02-18 |
Learning values across many orders of magnitude | | 56287.0 | DDQN+Pop-Art noop | 2016-02-24 |
Human-level control through deep reinforcement learning | ✓ Link | 42684.0 | Nature DQN | 2015-02-25 |
Mastering Atari with Discrete World Models | ✓ Link | 41860 | DreamerV2 | 2020-10-05 |
Policy Optimization With Penalized Point Probability Distance: An Alternative To Proximal Policy Optimization | ✓ Link | 37780.7 | POP3D | 2018-07-02 |
Evolving simple programs for playing Atari games | ✓ Link | 33752.4 | CGP | 2018-06-14 |
Evolution Strategies as a Scalable Alternative to Reinforcement Learning | ✓ Link | 22834.8 | ES FF (1 hour) noop | 2017-03-10 |
(no paper listed) | | 19761.0 | SARSA | |
The Arcade Learning Environment: An Evaluation Platform for General Agents | ✓ Link | 16871.3 | Best Learner | 2012-07-19 |