OpenCodePapers

Language Modelling on WikiText-2

Language Modelling
Results over time
Leaderboard
| Paper | Code | Test perplexity | Validation perplexity | Number of params | Model name | Release date |
|---|---|---|---|---|---|---|
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 8.21 | | | SparseGPT (175B, 50% Sparsity) | 2023-01-02 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 8.34 | | | OPT-175B | 2023-01-02 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 8.45 | | | SparseGPT (175B, 4:8 Sparsity) | 2023-01-02 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 8.73 | | | SparseGPT (175B, 2:4 Sparsity) | 2023-01-02 |
| Hydra: A System for Large Multi-Model Deep Learning | ✓ Link | 15.17 | 15.69 | 1542M | GPT-2 (fine-tuned) | 2021-10-16 |
| Language Models are Unsupervised Multitask Learners | ✓ Link | 18.34 | | 1542M | GPT-2 | 2019-02-14 |
| Language Models are Unsupervised Multitask Learners | ✓ Link | 19.93 | | 762M | GPT-2 (large) | 2019-02-14 |
| Language Models are Unsupervised Multitask Learners | ✓ Link | 22.76 | | 345M | GPT-2 (medium) | 2019-02-14 |
| Language Models are Unsupervised Multitask Learners | ✓ Link | 29.41 | | 117M | GPT-2 (small) | 2019-02-14 |
| Language Models with Transformers | ✓ Link | 34.1 | 37.7 | 395M | BERT-Large-CAS | 2019-04-20 |
| Mogrifier LSTM | ✓ Link | 38.6 | 40.2 | 35M | Mogrifier LSTM + dynamic eval | 2019-09-04 |
| Improving Neural Language Modeling via Adversarial Training | ✓ Link | 38.65 | 40.27 | 35M | adversarial + AWD-LSTM-MoS + dynamic eval | 2019-06-10 |
| FRAGE: Frequency-Agnostic Word Representation | ✓ Link | 39.14 | 40.85 | 35M | FRAGE + AWD-LSTM-MoS + dynamic eval | 2018-09-18 |
| Improved Language Modeling by Decoding the Past | | 40.3 | 42.0 | 35M | Past Decode Reg. + AWD-LSTM-MoS + dyn. eval. | 2018-08-14 |
| Gradual Learning of Recurrent Neural Networks | ✓ Link | 40.46 | 42.19 | 38M | GL-LWGC + AWD-MoS-LSTM + dynamic eval | 2017-08-29 |
| Breaking the Softmax Bottleneck: A High-Rank RNN Language Model | ✓ Link | 40.68 | 42.41 | 35M | AWD-LSTM-MoS + dynamic eval | 2017-11-10 |
| Deep Residual Output Layers for Neural Language Generation | ✓ Link | 42.0 | 43.9 | 34M | AWD-LSTM-DRILL + dynamic eval | 2019-05-14 |
| Dynamic Evaluation of Neural Sequence Models | ✓ Link | 44.3 | 46.4 | 33M | AWD-LSTM + dynamic eval | 2017-09-21 |
| Regularizing and Optimizing LSTM Language Models | ✓ Link | 52.0 | 53.8 | 33M | AWD-LSTM + continuous cache pointer | 2017-08-07 |
| Direct Output Connection for a High-Rank Language Model | ✓ Link | 53.09 | 54.19 | 185M | AWD-LSTM-DOC x5 | 2018-08-30 |
| Advancing State of the Art in Language Modeling | ✓ Link | 53.73 | 55.4 | | Ensemble of All | 2023-11-28 |
| Mogrifier LSTM | ✓ Link | 55.1 | 57.3 | 35M | Mogrifier LSTM | 2019-09-04 |
| Partially Shuffling the Training Data to Improve Language Models | ✓ Link | 57.85 | 60.16 | 37M | AWD-LSTM-DOC + Partial Shuffle | 2019-03-11 |
| Direct Output Connection for a High-Rank Language Model | ✓ Link | 58.03 | 60.29 | 37M | AWD-LSTM-DOC | 2018-08-30 |
| Partially Shuffling the Training Data to Improve Language Models | ✓ Link | 59.98 | 62.38 | 35M | AWD-LSTM-MoS + Partial Shuffle | 2019-03-11 |
| Breaking the Softmax Bottleneck: A High-Rank RNN Language Model | ✓ Link | 61.45 | 63.88 | 35M | AWD-LSTM-MoS | 2017-11-10 |
| Learning Associative Inference Using Fast Weight Memory | ✓ Link | 61.65 | 54.48 | 37M | AWD-FWM Schlag et al. (2020) | 2020-11-16 |
| Deep Residual Output Layers for Neural Language Generation | ✓ Link | 61.9 | 64.9 | 34M | AWD-LSTM-DRILL | 2019-05-14 |
| Fraternal Dropout | ✓ Link | 64.1 | 66.8 | 34M | AWD-LSTM 3-layer with Fraternal dropout | 2017-10-31 |
| Alleviating Sequence Information Loss with Data Overlapping and Prime Batch Sizes | ✓ Link | 64.73 | 67.47 | 33M | AWD-LSTM + ATOI | 2019-09-18 |
| Regularizing and Optimizing LSTM Language Models | ✓ Link | 65.8 | 68.6 | 33M | AWD-LSTM | 2017-08-07 |
| On the State of the Art of Evaluation in Neural Language Models | ✓ Link | 65.9 | 69.3 | 24M | Melis et al. (2017) - 1-layer LSTM (tied) | 2017-07-18 |
| Improving Neural Language Models with a Continuous Cache | ✓ Link | 68.9 | | | Grave et al. (2016) - LSTM + continuous cache pointer | 2016-12-13 |
| Efficient recurrent architectures through activity sparsity and sparse back-propagation through time | ✓ Link | 68.9 | | | EGRU | 2022-06-13 |
| Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling | ✓ Link | 87.0 | 91.5 | | Inan et al. (2016) - Variational LSTM (tied) (h=650) + augmented loss | 2016-11-04 |
| Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling | ✓ Link | 87.7 | 92.3 | | Inan et al. (2016) - Variational LSTM (tied) (h=650) | 2016-11-04 |
| Improving Neural Language Models with a Continuous Cache | ✓ Link | 99.3 | | | Grave et al. (2016) - LSTM | 2016-12-13 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 234.77 | | | OPT-175B (50% Sparsity) | 2023-01-02 |
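
For reference, the perplexity metric reported above is the exponential of the average negative log-likelihood per token; note that the listed papers differ in evaluation setup (e.g. word-level vocabularies for the AWD-LSTM family versus subword tokenization and zero-shot evaluation for GPT-2 and OPT), so values are not always directly comparable across rows. A minimal sketch of the computation, using made-up log-probabilities rather than output from any listed model:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# Hypothetical per-token probabilities (natural log), for illustration only.
log_probs = [math.log(p) for p in (0.25, 0.10, 0.50)]
print(round(perplexity(log_probs), 2))  # ~4.31
```

Lower is better: a perplexity of k roughly means the model is as uncertain as a uniform choice among k tokens at each step.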