| Paper | Code | Bit per Character (BPC) | Parameters | Model | Release Date |
| --- | --- | --- | --- | --- | --- |
| Dynamic Evaluation of Transformer Language Models | ✓ Link | 0.94 | 277M | Transformer-XL + RMS dynamic eval | 2019-04-17 |
| Compressive Transformers for Long-Range Sequence Modelling | ✓ Link | 0.97 | | Compressive Transformer | 2019-11-13 |
| Mogrifier LSTM | ✓ Link | 0.988 | 96M | Mogrifier LSTM + dynamic eval | 2019-09-04 |
| Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | ✓ Link | 0.99 | 277M | 24-layer Transformer-XL | 2019-01-09 |
| Longformer: The Long-Document Transformer | ✓ Link | 0.99 | 102M | Longformer Large | 2020-04-10 |
| Longformer: The Long-Document Transformer | ✓ Link | 1.00 | 41M | Longformer Small | 2020-04-10 |
| Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | ✓ Link | 1.03 | 88M | 18-layer Transformer-XL | 2019-01-09 |
| Character-Level Language Modeling with Deeper Self-Attention | ✓ Link | 1.06 | 235M | 64-layer Character Transformer Model | 2018-08-09 |
| Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | ✓ Link | 1.06 | 41M | 12-layer Transformer-XL | 2019-01-09 |
| Dynamic Evaluation of Neural Sequence Models | ✓ Link | 1.08 | 46M | mLSTM + dynamic eval | 2017-09-21 |
| Character-Level Language Modeling with Deeper Self-Attention | ✓ Link | 1.11 | 44M | 12-layer Character Transformer Model | 2018-08-09 |
| Mogrifier LSTM | ✓ Link | 1.122 | 96M | Mogrifier LSTM | 2019-09-04 |
| An Analysis of Neural Language Modeling at Multiple Scales | ✓ Link | 1.232 | 47M | 3-layer AWD-LSTM | 2018-03-22 |
| Multiplicative LSTM for sequence modelling | ✓ Link | 1.24 | 46M | Large mLSTM +emb +WN +VD | 2016-09-26 |
| Fast-Slow Recurrent Neural Networks | ✓ Link | 1.245 | 47M | Large FS-LSTM-4 | 2017-05-24 |
| Recurrent Highway Networks | ✓ Link | 1.27 | 46M | Large RHN | 2016-07-12 |
| Fast-Slow Recurrent Neural Networks | ✓ Link | 1.277 | 27M | FS-LSTM-4 | 2017-05-24 |
| Recurrent Highway Networks | ✓ Link | 1.31 | | RHN - depth 5 | 2016-07-12 |
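
The Bit per Character (BPC) column is the model's average per-character cross-entropy expressed in base 2, so lower is better. As a minimal sketch of the conversion from a loss reported in nats (the example loss value below is illustrative, not taken from any of the papers above):

```python
import math

def bits_per_character(loss_nats: float) -> float:
    """Convert an average per-character cross-entropy (in nats) to bits per character."""
    return loss_nats / math.log(2)

# An average cross-entropy of roughly 0.686 nats per character corresponds to ~0.99 BPC,
# the figure listed above for the 24-layer Transformer-XL.
print(round(bits_per_character(0.686), 2))  # 0.99
```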