Agent57: Outperforming the Atari Human Benchmark | ✓ Link | 580328.14 | | | Agent57 | 2020-03-30 |
Distributional Reinforcement Learning with Quantile Regression | ✓ Link | 572510 | | | QR-DQN-1 | 2017-10-27 |
Recurrent Experience Replay in Distributed Reinforcement Learning | ✓ Link | 408850.0 | | | R2D2 | 2019-05-01 |
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures | ✓ Link | 351200.12 | | | IMPALA (deep) | 2018-02-05 |
Distributed Prioritized Experience Replay | ✓ Link | 302391.3 | | | Ape-X | 2018-03-02 |
Self-Imitation Learning | ✓ Link | 104975.6 | | | A2C + SIL | 2018-06-14 |
Online and Offline Reinforcement Learning by Planning with a Learned Model | ✓ Link | 94906.25 | | | MuZero (Res2 Adam) | 2021-04-13 |
Mastering Atari with Discrete World Models | ✓ Link | 94688 | | | DreamerV2 | 2020-10-05 |
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | ✓ Link | 72276.00 | | | MuZero | 2019-11-19 |
DNA: Proximal Policy Optimization with a Dual Network Architecture | ✓ Link | 52398 | | | DNA | 2022-06-20 |
Generalized Data Distribution Iteration | | 28657 | | | GDI-H3(200M frames) | 2022-06-07 |
Generalized Data Distribution Iteration | | 28657 | | | GDI-H3 | 2022-06-07 |
Generalized Data Distribution Iteration | | 27800 | | | GDI-I3 | 2022-06-07 |
GDI: Rethinking What Makes Reinforcement Learning Different From Supervised Learning | | 27800 | | | GDI-I3 | 2021-06-11 |
Noisy Networks for Exploration | ✓ Link | 27121 | | | NoisyNet-Dueling | 2017-06-30 |
Implicit Quantile Networks for Distributional Reinforcement Learning | ✓ Link | 25750 | | | IQN | 2018-06-14 |
Train a Real-world Local Path Planner in One Hour via Partially Decoupled Reinforcement Learning and Vectorized Diversity | ✓ Link | 24548.8 | | | ASL DDQN | 2023-05-07 |
A Distributional Perspective on Reinforcement Learning | ✓ Link | 23784 | | | C51 noop | 2017-07-21 |
Asynchronous Methods for Deep Reinforcement Learning | ✓ Link | 21307.5 | | | A3C LSTM hs | 2016-02-04 |
Dueling Network Architectures for Deep Reinforcement Learning | ✓ Link | 19220.3 | | | Duel noop | 2015-11-20 |
Dueling Network Architectures for Deep Reinforcement Learning | ✓ Link | 18760.3 | | | Prior+Duel noop | 2015-11-20 |
The Arcade Learning Environment: An Evaluation Platform for General Agents | ✓ Link | 17343.4 | | | UCT | 2012-07-19 |
Prioritized Experience Replay | ✓ Link | 16256.5 | | | Prior noop | 2015-11-18 |
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models | ✓ Link | 15805 | | | MP-EB | 2015-07-03 |
Policy Optimization With Penalized Point Probability Distance: An Alternative To Proximal Policy Optimization | ✓ Link | 15396.67 | | | POP3D | 2018-07-02 |
Asynchronous Methods for Deep Reinforcement Learning | ✓ Link | 15148.8 | | | A3C FF hs | 2016-02-04 |
Deep Exploration via Bootstrapped DQN | ✓ Link | 15092.7 | | | Bootstrapped DQN | 2016-02-15 |
Dueling Network Architectures for Deep Reinforcement Learning | ✓ Link | 15088.5 | | | DDQN (tuned) noop | 2015-11-20 |
Value Prediction Network | ✓ Link | 14517 | | | VPN | 2017-07-11 |
Adaptive Rational Activations to Boost Deep Reinforcement Learning | ✓ Link | 14436 | | | Rational DQN Average | 2021-02-18 |
Increasing the Action Gap: New Operators for Reinforcement Learning | ✓ Link | 14368.03 | | | Advantage Learning | 2015-12-15 |
Dueling Network Architectures for Deep Reinforcement Learning | ✓ Link | 14175.8 | | | Duel hs | 2015-11-20 |
Model-Free Episodic Control with State Aggregation | | 14135 | 19750 | | MFEC | 2020-08-21 |
Adaptive Rational Activations to Boost Deep Reinforcement Learning | ✓ Link | 14080 | | | Recurrent Rational DQN Average | 2021-02-18 |
Deep Reinforcement Learning with Double Q-learning | ✓ Link | 14063.0 | | | Prior+Duel hs | 2015-09-22 |
Asynchronous Methods for Deep Reinforcement Learning | ✓ Link | 13752.3 | | | A3C FF (1 day) hs | 2016-02-04 |
Deep Reinforcement Learning with Double Q-learning | ✓ Link | 13117.3 | | | DQN noop | 2015-09-22 |
Deep Reinforcement Learning with Double Q-learning | ✓ Link | 11020.8 | | | DDQN (tuned) hs | 2015-09-22 |
Human level control through deep reinforcement learning | ✓ Link | 10596 | | | Nature DQN | 2015-02-25 |
Prioritized Experience Replay | ✓ Link | 9944 | | | Prior hs | 2015-11-18 |
Deep Reinforcement Learning with Double Q-learning | ✓ Link | 9271.5 | | | DQN hs | 2015-09-22 |
Massively Parallel Methods for Deep Reinforcement Learning | ✓ Link | 7089.8 | | | Gorila | 2015-07-15 |
Learning values across many orders of magnitude | | 5236.8 | | | DDQN+Pop-Art noop | 2016-02-24 |
Playing Atari with Deep Reinforcement Learning | ✓ Link | 4500 | | | DQN Best | 2013-12-19 |
Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings | ✓ Link | 4123.5 | | | Qbert Rainbow+SEER | 2021-03-04 |
Count-Based Exploration in Feature Space for Reinforcement Learning | ✓ Link | 4111.8 | | | Sarsa-φ-EB | 2017-06-25 |
Count-Based Exploration in Feature Space for Reinforcement Learning | ✓ Link | 3895.3 | | | Sarsa-ε | 2017-06-25 |
Playing Atari with Six Neurons | ✓ Link | 1250 | | | IDVQ + DRSC + XNES | 2018-06-04 |
CURL: Contrastive Unsupervised Representations for Reinforcement Learning | ✓ Link | 1225.6 | | | CURL | 2020-04-08 |
 | | 960.3 | | | SARSA | |
Evolving simple programs for playing Atari games | ✓ Link | 770 | | | CGP | 2018-06-14 |
The Arcade Learning Environment: An Evaluation Platform for General Agents | ✓ Link | 613.5 | | | Best Learner | 2012-07-19 |
Soft Actor-Critic for Discrete Action Settings | ✓ Link | 280.5 | | | SAC | 2019-10-16 |
Mean Actor Critic | ✓ Link | 243.4 | | | MAC | 2017-09-01 |
Evolution Strategies as a Scalable Alternative to Reinforcement Learning | ✓ Link | 147.5 | | | ES FF (1 hour) noop | 2017-03-10 |
Decision Transformer: Reinforcement Learning via Sequence Modeling | ✓ Link | 25.1 | | | DT | 2021-06-02 |
IQ-Learn: Inverse soft-Q Learning for Imitation | ✓ Link | | | 12940 | IQ-Learn | 2021-06-23 |