| Paper | Code | Test perplexity | Validation perplexity | Params | Model | Date |
|---|---|---|---|---|---|---|
| Language Models are Few-Shot Learners | ✓ Link | 20.5 | | 175,000M | GPT-3 (Zero-Shot) | 2020-05-28 |
| Language Models with Transformers | ✓ Link | 31.3 | 36.1 | 395M | BERT-Large-CAS | 2019-04-20 |
| Language Models are Unsupervised Multitask Learners | ✓ Link | 35.76 | | 1542M | GPT-2 | 2019-02-14 |
| Mogrifier LSTM | ✓ Link | 44.9 | 44.8 | 24M | Mogrifier LSTM + dynamic eval | 2019-09-04 |
| Improving Neural Language Modeling via Adversarial Training | ✓ Link | 46.01 | 46.63 | 22M | adversarial + AWD-LSTM-MoS + dynamic eval | 2019-06-10 |
| Gradual Learning of Recurrent Neural Networks | ✓ Link | 46.34 | 46.64 | 26M | GL-LWGC + AWD-MoS-LSTM + dynamic eval | 2017-08-29 |
| FRAGE: Frequency-Agnostic Word Representation | ✓ Link | 46.54 | 47.38 | 22M | FRAGE + AWD-LSTM-MoS + dynamic eval | 2018-09-18 |
| Direct Output Connection for a High-Rank Language Model | ✓ Link | 47.17 | 48.63 | 185M | AWD-LSTM-DOC x5 | 2018-08-30 |
| Improved Language Modeling by Decoding the Past | | 47.3 | 48.0 | 22M | Past Decode Regularization + AWD-LSTM-MoS + dynamic eval | 2018-08-14 |
| Advancing State of the Art in Language Modeling | ✓ Link | 47.31 | 48.92 | | Ensemble of All | 2023-11-28 |
| Breaking the Softmax Bottleneck: A High-Rank RNN Language Model | ✓ Link | 47.69 | 48.33 | 22M | AWD-LSTM-MoS + dynamic eval | 2017-11-10 |
| Deep Residual Output Layers for Neural Language Generation | ✓ Link | 49.4 | 49.5 | 24M | AWD-LSTM-DRILL + dynamic eval | 2019-05-14 |
| Deep Independently Recurrent Neural Network (IndRNN) | ✓ Link | 50.97 | | | Dense IndRNN+dynamic eval | 2019-10-11 |
| Dynamic Evaluation of Neural Sequence Models | ✓ Link | 51.1 | 51.6 | 24M | AWD-LSTM + dynamic eval | 2017-09-21 |
| Partially Shuffling the Training Data to Improve Language Models | ✓ Link | 52.0 | 53.79 | 23M | AWD-LSTM-DOC + Partial Shuffle | 2019-03-11 |
| Direct Output Connection for a High-Rank Language Model | ✓ Link | 52.38 | 54.12 | 23M | AWD-LSTM-DOC | 2018-08-30 |
| Regularizing and Optimizing LSTM Language Models | ✓ Link | 52.8 | 53.9 | 24M | AWD-LSTM + continuous cache pointer | 2017-08-07 |
| Partially Shuffling the Training Data to Improve Language Models | ✓ Link | 53.92 | 55.89 | 22M | AWD-LSTM-MoS + Partial Shuffle | 2019-03-11 |
| Trellis Networks for Sequence Modeling | ✓ Link | 54.19 | | | Trellis Network | 2018-10-15 |
| Breaking the Softmax Bottleneck: A High-Rank RNN Language Model | ✓ Link | 54.44 | 56.54 | 22M | AWD-LSTM-MoS | 2017-11-10 |
| Learning Associative Inference Using Fast Weight Memory | ✓ Link | 54.48 | 56.76 | 24M | AWD-FWM (Schlag et al., 2020) | 2020-11-16 |
| Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | ✓ Link | 54.55 | 56.72 | 24M | Transformer-XL | 2019-01-09 |
| AutoDropout: Learning Dropout Patterns to Regularize Deep Networks | ✓ Link | 54.9 | 58.1 | | Transformer-XL + AutoDropout | 2021-01-05 |
| Pushing the bounds of dropout | ✓ Link | 55.3 | 57.1 | 24M | 2-layer skip-LSTM + dropout tuning | 2018-05-23 |
| Deep Residual Output Layers for Neural Language Generation | ✓ Link | 55.7 | 58.2 | 24M | AWD-LSTM-DRILL | 2019-05-14 |
| DARTS: Differentiable Architecture Search | ✓ Link | 56.1 | 58.3 | 23M | Differentiable NAS | 2018-06-24 |
| Deep Independently Recurrent Neural Network (IndRNN) | ✓ Link | 56.37 | | | Dense IndRNN | 2019-10-11 |
| Fraternal Dropout | ✓ Link | 56.8 | 58.9 | 24M | AWD-LSTM 3-layer with Fraternal dropout | 2017-10-31 |
| Deep Equilibrium Models | ✓ Link | 57.1 | | 24M | DEQ-TrellisNet | 2019-09-03 |
| Regularizing and Optimizing LSTM Language Models | ✓ Link | 57.3 | 60.0 | 24M | AWD-LSTM | 2017-08-07 |
| Efficient Neural Architecture Search via Parameter Sharing | ✓ Link | 58.6 | 60.8 | 24M | Efficient NAS | 2018-02-09 |
| Neural Architecture Search with Reinforcement Learning | ✓ Link | 64.0 | | 25M | NAS-RL | 2016-11-05 |
| Recurrent Highway Networks | ✓ Link | 65.4 | 67.9 | 23M | Recurrent highway networks | 2016-07-12 |
| Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling | ✓ Link | 66.0 | 68.1 | | Inan et al. (2016) - Variational RHN | 2016-11-04 |
| A Theoretically Grounded Application of Dropout in Recurrent Neural Networks | ✓ Link | 75.2 | 77.9 | | Gal & Ghahramani (2016) - Variational LSTM (large) | 2015-12-16 |
| Recurrent Neural Network Regularization | ✓ Link | 78.4 | 82.2 | | Zaremba et al. (2014) - LSTM (large) | 2014-09-08 |
| An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling | ✓ Link | 78.93 | | | LSTM (Bai et al., 2018) | 2018-03-04 |
| A Theoretically Grounded Application of Dropout in Recurrent Neural Networks | ✓ Link | 79.7 | 81.9 | | Gal & Ghahramani (2016) - Variational LSTM (medium) | 2015-12-16 |
| Recurrent Neural Network Regularization | ✓ Link | 82.7 | 86.2 | | Zaremba et al. (2014) - LSTM (medium) | 2014-09-08 |
| R-Transformer: Recurrent Neural Network Enhanced Transformer | ✓ Link | 84.38 | | | R-Transformer | 2019-07-12 |
| An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling | ✓ Link | 92.48 | | | GRU (Bai et al., 2018) | 2018-03-04 |
| Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling | ✓ Link | 107.95 | | 14.9M | Seq-U-Net | 2019-11-14 |
| Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling | ✓ Link | 108.47 | | 14.7M | TCN | 2019-11-14 |
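For reference, the test and validation perplexity columns are the exponential of the average per-word negative log-likelihood on the respective PTB split. A minimal Python sketch of the conversion follows; the loss value and token count below are illustrative assumptions, not figures taken from any row (PTB's test set is roughly 82k words):

```python
import math

def perplexity(total_nll_nats: float, num_tokens: int) -> float:
    """Perplexity = exp(average negative log-likelihood per token, in nats)."""
    return math.exp(total_nll_nats / num_tokens)

# Illustrative only: an average cross-entropy of ~3.967 nats/word
# corresponds to perplexity ~52.8, in the range of the
# "AWD-LSTM + continuous cache pointer" row above.
avg_loss = 3.967
num_tokens = 82_000  # approximate PTB test-set size (assumption)
print(perplexity(avg_loss * num_tokens, num_tokens))  # ~52.8
```

Note that models in this table use the same word-level vocabulary (10k types), so their perplexities are directly comparable; perplexities computed over different vocabularies or tokenizations (e.g., subword models) are not.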