Paper | Code | Test perplexity (PPL) | Number of params | Validation perplexity | Model | Release date
--- | --- | --- | --- | --- | --- | ---
Simple and Effective Masked Diffusion Language Models | ✓ Link | 20.09 | 110M | | MDLM (AR baseline) | 2024-06-11
OmniNet: Omnidirectional Representations from Transformers | ✓ Link | 21.5 | 100M | | OmniNetT (Large) | 2021-03-01 |
OmniNet: Omnidirectional Representations from Transformers | ✓ Link | 21.6 | 100M | | OmniNetP (Large) | 2021-03-01 |
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | ✓ Link | 21.8 | 0.8B | | Transformer-XL Large | 2019-01-09 |
OmniNet: Omnidirectional Representations from Transformers | ✓ Link | 22.0 | | | OmniNetB (Large) | 2021-03-01
Simple and Effective Masked Diffusion Language Models | ✓ Link | 23.00 | 110M | | MDLM | 2024-06-11 |
Adaptive Input Representations for Neural Language Modeling | ✓ Link | 23.02 | 1.0B | 22.92 | Adaptive Input Very Large | 2018-09-28 |
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | ✓ Link | 23.5 | 0.46B | | Transformer-XL Base | 2019-01-09 |
When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | ✓ Link | 23.5 | 465M | | SRU++ Large | 2021-02-24 |
Exploring the Limits of Language Modeling | ✓ Link | 23.7 | 43B | | 10 LSTM+CNN inputs + SNM10-SKIP (ensemble) | 2016-02-07 |
Adaptive Input Representations for Neural Language Modeling | ✓ Link | 23.91 | 0.46B | 23.83 | Adaptive Input Large | 2018-09-28 |
Mesh-TensorFlow: Deep Learning for Supercomputers | ✓ Link | 24.0 | 4.9B | | Mesh Tensorflow | 2018-11-05 |
| | | 25.06 | | | Cohere Large | |
When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | ✓ Link | 25.1 | 328M | | SRU++ | 2021-02-24 |
Pay Less Attention with Lightweight and Dynamic Convolutions | ✓ Link | 26.67 | 0.34B | | DynamicConv | 2019-01-29 |
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | ✓ Link | 28.0 | 5B | | High-Budget MoE | 2017-01-23 |
The Evolved Transformer | ✓ Link | 28.6 | | | Evolved Transformer Big | 2019-01-30 |
Exploring the Limits of Language Modeling | ✓ Link | 30.0 | 1.04B | | LSTM-8192-1024 + CNN Input | 2016-02-07 |
Exploring the Limits of Language Modeling | ✓ Link | 30.6 | 1.8B | | LSTM-8192-1024 | 2016-02-07 |
Language Modeling with Gated Convolutional Networks | ✓ Link | 31.9 | | | GCNN-14 bottleneck | 2016-12-23 |
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | ✓ Link | 34.1 | 5B | | Low-Budget MoE | 2017-01-23 |
Factorization tricks for LSTM networks | ✓ Link | 36.0 | | | BIG G-LSTM-2 | 2017-03-31 |
Language Models are Unsupervised Multitask Learners | ✓ Link | 42.16 | 1.54B | | GPT-2 | 2019-02-14 |
One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling | ✓ Link | 51.3 | 20B | | RNN-1024 + 9 Gram | 2013-12-11 |
Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation | | 52.9 | 33B | | Sparse Non-Negative | 2014-12-03 |
H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences | ✓ Link | | 53M | 23.95 | H-Transformer-1D Nr=16 (Base) | 2021-07-25 |
H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences | ✓ Link | | 144M | 20.25 | H-Transformer-1D Nr=16 (Large) | 2021-07-25 |
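The test and validation figures above are corpus-level perplexities. As a rough illustration (not taken from any of the listed papers), the sketch below shows how such a number is typically derived from summed per-token negative log-likelihoods; the `corpus_perplexity` helper and the numbers in the example are hypothetical.

```python
# Illustrative sketch only: corpus-level perplexity, as reported in the table,
# computed as exp of the average per-token negative log-likelihood (in nats)
# over the evaluation set. All batch values below are made up.
import math
from typing import Iterable, Tuple

def corpus_perplexity(batches: Iterable[Tuple[float, int]]) -> float:
    """batches yields (sum of token NLLs in nats, number of scored tokens)."""
    total_nll = 0.0
    total_tokens = 0
    for nll_sum, n_tokens in batches:
        total_nll += nll_sum
        total_tokens += n_tokens
    return math.exp(total_nll / total_tokens)

# Hypothetical example: two evaluation batches of 1,000 scored tokens each.
print(round(corpus_perplexity([(3104.5, 1000), (2998.2, 1000)]), 2))  # ~21.14
```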