OpenCodePapers

language-modelling-on-text8

Language Modelling
Leaderboard
| Paper | Code | Bits per Character (BPC) | Number of params | Model Name | Release Date |
|---|---|---|---|---|---|
| Language Models are Unsupervised Multitask Learners | ✓ | 0.98 | 1542M | GPT-2 | 2019-02-14 |
| Focus Your Attention (with Adaptive IIR Filters) | | 0.98 | 22M | Focus | 2023-05-24 |
| Dynamic Evaluation of Transformer Language Models | ✓ | 1.038 | 277M | Transformer-XL + RMS dynamic eval + decay | 2019-04-17 |
| Adaptive Attention Span in Transformers | ✓ | 1.07 | 209M | 24L Transformer + 8K adaptive span | 2019-05-19 |
| Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | ✓ | 1.08 | 277M | Transformer-XL - 24 layers | 2019-01-09 |
| Augmenting Self-attention with Persistent Memory | ✓ | 1.08 | 114M | All-attention network - 36 layers | 2019-07-02 |
| Long-Short Transformer: Efficient Transformers for Language and Vision | ✓ | 1.09 | | Transformer-LS (small) | 2021-07-05 |
| Adaptive Attention Span in Transformers | ✓ | 1.11 | 38M | 12L Transformer + 8K adaptive span | 2019-05-19 |
| Augmenting Self-attention with Persistent Memory | ✓ | 1.11 | 38M | All-attention network - 18 layers | 2019-07-02 |
| BP-Transformer: Modelling Long-Range Context via Binary Partitioning | ✓ | 1.11 | | BP-Transformer - 12 Layers | 2019-11-11 |
| Character-Level Language Modeling with Deeper Self-Attention | ✓ | 1.13 | 235M | 64-layer Character Transformer Model | 2018-08-09 |
| Recurrent Highway Networks with Grouped Auxiliary Memory | ✓ | 1.157 | 44.7M | GAM-RHN-10 | 2019-12-13 |
| Character-Level Language Modeling with Deeper Self-Attention | ✓ | 1.18 | 44M | 12-layer Character Transformer Model | 2018-08-09 |
| Pay Attention when Required | ✓ | 1.18 | | PAR Transformer 24B | 2020-09-09 |
| Dynamic Evaluation of Neural Sequence Models | ✓ | 1.19 | 45M | mLSTM + dynamic eval | 2017-09-21 |
| Discrete Flows: Invertible Generative Models of Discrete Data | ✓ | 1.23 | | Bipartite flows (8 flows) | 2019-05-24 |
| Recurrent Highway Networks | ✓ | 1.27 | 46M | Large RHN | 2016-07-12 |
| Multiplicative LSTM for sequence modelling | ✓ | 1.27 | 45M | Large mLSTM +emb +WN +VD | 2016-09-26 |
| Hierarchical Multiscale Recurrent Neural Networks | ✓ | 1.29 | 35M | LayerNorm HM-LSTM | 2016-09-06 |
| Recurrent Batch Normalization | ✓ | 1.36 | 16M | BN LSTM | 2016-03-30 |
| Multiplicative LSTM for sequence modelling | ✓ | 1.40 | 45M | Unregularised mLSTM | 2016-09-26 |
| Bayesian Flow Networks | ✓ | 1.41 | | BFN | 2023-08-14 |
| Architectural Complexity Measures of Recurrent Neural Networks | | 1.49 | | td-LSTM-large | 2016-02-26 |
| Architectural Complexity Measures of Recurrent Neural Networks | | 1.63 | | td-LSTM (Zhang et al., 2016) | 2016-02-26 |
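
The leaderboard metric, bits per character (BPC), is the average negative base-2 log-likelihood a model assigns to each character of the test text, so lower is better. As a minimal sketch, assuming a model reports its summed negative log-likelihood in nats (the usual unit of a cross-entropy loss computed with the natural logarithm), the conversion could look like the following. The function name `bits_per_character` and the toy probabilities are hypothetical and do not come from any listed paper.

```python
import math


def bits_per_character(total_nll_nats: float, num_chars: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per character."""
    if num_chars <= 0:
        raise ValueError("num_chars must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by num_chars averages.
    return total_nll_nats / (num_chars * math.log(2))


# Hypothetical toy case: the model assigns probability 0.5 to each of 4 characters.
toy_probs = [0.5, 0.5, 0.5, 0.5]
total_nll = -sum(math.log(p) for p in toy_probs)
bpc = bits_per_character(total_nll, len(toy_probs))
print(f"BPC: {bpc:.3f}")
```

In this toy case each character costs about one bit, so the result is approximately 1.0. The scores in the table come from the cited papers' own evaluations, whose exact protocols are not given here.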