| Paper | Code | Bit per Character (BPC) | Number of params | Model | Date |
|---|---|---|---|---|---|
| Language Models are Unsupervised Multitask Learners | ✓ Link | 0.98 | 1542M | GPT-2 | 2019-02-14 |
| Focus Your Attention (with Adaptive IIR Filters) | | 0.98 | 22M | Focus | 2023-05-24 |
| Dynamic Evaluation of Transformer Language Models | ✓ Link | 1.038 | 277M | Transformer-XL + RMS dynamic eval + decay | 2019-04-17 |
| Adaptive Attention Span in Transformers | ✓ Link | 1.07 | 209M | 24L Transformer + 8K adaptive span | 2019-05-19 |
| Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | ✓ Link | 1.08 | 277M | Transformer-XL - 24 layers | 2019-01-09 |
| Augmenting Self-attention with Persistent Memory | ✓ Link | 1.08 | 114M | All-attention network - 36 layers | 2019-07-02 |
| Long-Short Transformer: Efficient Transformers for Language and Vision | ✓ Link | 1.09 | | Transformer-LS (small) | 2021-07-05 |
| Adaptive Attention Span in Transformers | ✓ Link | 1.11 | 38M | 12L Transformer + 8K adaptive span | 2019-05-19 |
| Augmenting Self-attention with Persistent Memory | ✓ Link | 1.11 | 38M | All-attention network - 18 layers | 2019-07-02 |
| BP-Transformer: Modelling Long-Range Context via Binary Partitioning | ✓ Link | 1.11 | | BP-Transformer - 12 Layers | 2019-11-11 |
| Character-Level Language Modeling with Deeper Self-Attention | ✓ Link | 1.13 | 235M | 64-layer Character Transformer Model | 2018-08-09 |
| Recurrent Highway Networks with Grouped Auxiliary Memory | ✓ Link | 1.157 | 44.7M | GAM-RHN-10 | 2019-12-13 |
| Character-Level Language Modeling with Deeper Self-Attention | ✓ Link | 1.18 | 44M | 12-layer Character Transformer Model | 2018-08-09 |
| Pay Attention when Required | ✓ Link | 1.18 | | PAR Transformer 24B | 2020-09-09 |
| Dynamic Evaluation of Neural Sequence Models | ✓ Link | 1.19 | 45M | mLSTM + dynamic eval | 2017-09-21 |
| Discrete Flows: Invertible Generative Models of Discrete Data | ✓ Link | 1.23 | | Bipartite flows (8 flows) | 2019-05-24 |
| Recurrent Highway Networks | ✓ Link | 1.27 | 46M | Large RHN | 2016-07-12 |
| Multiplicative LSTM for sequence modelling | ✓ Link | 1.27 | 45M | Large mLSTM +emb +WN +VD | 2016-09-26 |
| Hierarchical Multiscale Recurrent Neural Networks | ✓ Link | 1.29 | 35M | LayerNorm HM-LSTM | 2016-09-06 |
| Recurrent Batch Normalization | ✓ Link | 1.36 | 16M | BN LSTM | 2016-03-30 |
| Multiplicative LSTM for sequence modelling | ✓ Link | 1.40 | 45M | Unregularised mLSTM | 2016-09-26 |
| Bayesian Flow Networks | ✓ Link | 1.41 | | BFN | 2023-08-14 |
| Architectural Complexity Measures of Recurrent Neural Networks | | 1.49 | | td-LSTM-large | 2016-02-26 |
| Architectural Complexity Measures of Recurrent Neural Networks | | 1.63 | | td-LSTM (Zhang et al., 2016) | 2016-02-26 |
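The metric column is bits per character (BPC), where lower is better. Since most frameworks report mean cross-entropy in nats, a minimal sketch of the standard conversion (the function name `nats_to_bpc` is ours, for illustration):

```python
import math

def nats_to_bpc(mean_nll_nats: float) -> float:
    """Convert mean per-character cross-entropy (in nats) to bits per character."""
    return mean_nll_nats / math.log(2)

# Example: a per-character loss of 0.75 nats is ~1.08 BPC,
# matching the Transformer-XL (24-layer) entry above.
print(f"{nats_to_bpc(0.75):.2f}")  # -> 1.08
```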