OpenCodePapers

Language Modelling on Penn Treebank (Word Level)

Task: Language Modelling
Results over time
Leaderboard
| Paper | Code | Test perplexity | Validation perplexity | Params | Model | Release date |
|---|---|---|---|---|---|---|
| Language Models are Few-Shot Learners | ✓ Link | 20.5 | — | 175000M | GPT-3 (Zero-Shot) | 2020-05-28 |
| Language Models with Transformers | ✓ Link | 31.3 | 36.1 | 395M | BERT-Large-CAS | 2019-04-20 |
| Language Models are Unsupervised Multitask Learners | ✓ Link | 35.76 | — | 1542M | GPT-2 | 2019-02-14 |
| Mogrifier LSTM | ✓ Link | 44.9 | 44.8 | 24M | Mogrifier LSTM + dynamic eval | 2019-09-04 |
| Improving Neural Language Modeling via Adversarial Training | ✓ Link | 46.01 | 46.63 | 22M | adversarial + AWD-LSTM-MoS + dynamic eval | 2019-06-10 |
| Gradual Learning of Recurrent Neural Networks | ✓ Link | 46.34 | 46.64 | 26M | GL-LWGC + AWD-MoS-LSTM + dynamic eval | 2017-08-29 |
| FRAGE: Frequency-Agnostic Word Representation | ✓ Link | 46.54 | 47.38 | 22M | FRAGE + AWD-LSTM-MoS + dynamic eval | 2018-09-18 |
| Direct Output Connection for a High-Rank Language Model | ✓ Link | 47.17 | 48.63 | 185M | AWD-LSTM-DOC x5 | 2018-08-30 |
| Improved Language Modeling by Decoding the Past | — | 47.3 | 48.0 | 22M | Past Decode Reg. + AWD-LSTM-MoS + dyn. eval. | 2018-08-14 |
| Advancing State of the Art in Language Modeling | ✓ Link | 47.31 | 48.92 | — | Ensemble of All | 2023-11-28 |
| Breaking the Softmax Bottleneck: A High-Rank RNN Language Model | ✓ Link | 47.69 | 48.33 | 22M | AWD-LSTM-MoS + dynamic eval | 2017-11-10 |
| Deep Residual Output Layers for Neural Language Generation | ✓ Link | 49.4 | 49.5 | 24M | AWD-LSTM-DRILL + dynamic eval | 2019-05-14 |
| Deep Independently Recurrent Neural Network (IndRNN) | ✓ Link | 50.97 | — | — | Dense IndRNN + dynamic eval | 2019-10-11 |
| Dynamic Evaluation of Neural Sequence Models | ✓ Link | 51.1 | 51.6 | 24M | AWD-LSTM + dynamic eval | 2017-09-21 |
| Partially Shuffling the Training Data to Improve Language Models | ✓ Link | 52.0 | 53.79 | 23M | AWD-LSTM-DOC + Partial Shuffle | 2019-03-11 |
| Direct Output Connection for a High-Rank Language Model | ✓ Link | 52.38 | 54.12 | 23M | AWD-LSTM-DOC | 2018-08-30 |
| Regularizing and Optimizing LSTM Language Models | ✓ Link | 52.8 | 53.9 | 24M | AWD-LSTM + continuous cache pointer | 2017-08-07 |
| Partially Shuffling the Training Data to Improve Language Models | ✓ Link | 53.92 | 55.89 | 22M | AWD-LSTM-MoS + Partial Shuffle | 2019-03-11 |
| Trellis Networks for Sequence Modeling | ✓ Link | 54.19 | — | — | Trellis Network | 2018-10-15 |
| Breaking the Softmax Bottleneck: A High-Rank RNN Language Model | ✓ Link | 54.44 | 56.54 | 22M | AWD-LSTM-MoS | 2017-11-10 |
| Learning Associative Inference Using Fast Weight Memory | ✓ Link | 54.48 | 56.76 | 24M | AWD-FWM (Schlag et al., 2020) | 2020-11-16 |
| Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | ✓ Link | 54.55 | 56.72 | 24M | Transformer-XL | 2019-01-09 |
| AutoDropout: Learning Dropout Patterns to Regularize Deep Networks | ✓ Link | 54.9 | 58.1 | — | Transformer-XL + AutoDropout | 2021-01-05 |
| Pushing the bounds of dropout | ✓ Link | 55.3 | 57.1 | 24M | 2-layer skip-LSTM + dropout tuning | 2018-05-23 |
| Deep Residual Output Layers for Neural Language Generation | ✓ Link | 55.7 | 58.2 | 24M | AWD-LSTM-DRILL | 2019-05-14 |
| DARTS: Differentiable Architecture Search | ✓ Link | 56.1 | 58.3 | 23M | Differentiable NAS | 2018-06-24 |
| Deep Independently Recurrent Neural Network (IndRNN) | ✓ Link | 56.37 | — | — | Dense IndRNN | 2019-10-11 |
| Fraternal Dropout | ✓ Link | 56.8 | 58.9 | 24M | AWD-LSTM 3-layer with fraternal dropout | 2017-10-31 |
| Deep Equilibrium Models | ✓ Link | 57.1 | — | 24M | DEQ-TrellisNet | 2019-09-03 |
| Regularizing and Optimizing LSTM Language Models | ✓ Link | 57.3 | 60.0 | 24M | AWD-LSTM | 2017-08-07 |
| Efficient Neural Architecture Search via Parameter Sharing | ✓ Link | 58.6 | 60.8 | 24M | Efficient NAS | 2018-02-09 |
| Neural Architecture Search with Reinforcement Learning | ✓ Link | 64.0 | — | 25M | NAS-RL | 2016-11-05 |
| Recurrent Highway Networks | ✓ Link | 65.4 | 67.9 | 23M | Recurrent highway networks | 2016-07-12 |
| Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling | ✓ Link | 66.0 | 68.1 | — | Inan et al. (2016), Variational RHN | 2016-11-04 |
| A Theoretically Grounded Application of Dropout in Recurrent Neural Networks | ✓ Link | 75.2 | 77.9 | — | Gal & Ghahramani (2016), Variational LSTM (large) | 2015-12-16 |
| Recurrent Neural Network Regularization | ✓ Link | 78.4 | 82.2 | — | Zaremba et al. (2014), LSTM (large) | 2014-09-08 |
| An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling | ✓ Link | 78.93 | — | — | LSTM (Bai et al., 2018) | 2018-03-04 |
| A Theoretically Grounded Application of Dropout in Recurrent Neural Networks | ✓ Link | 79.7 | 81.9 | — | Gal & Ghahramani (2016), Variational LSTM (medium) | 2015-12-16 |
| Recurrent Neural Network Regularization | ✓ Link | 82.7 | 86.2 | — | Zaremba et al. (2014), LSTM (medium) | 2014-09-08 |
| R-Transformer: Recurrent Neural Network Enhanced Transformer | ✓ Link | 84.38 | — | — | R-Transformer | 2019-07-12 |
| An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling | ✓ Link | 92.48 | — | — | GRU (Bai et al., 2018) | 2018-03-04 |
| Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling | ✓ Link | 107.95 | — | 14.9M | Seq-U-Net | 2019-11-14 |
| Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling | ✓ Link | 108.47 | — | 14.7M | TCN | 2019-11-14 |
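Both perplexity columns report the standard word-level metric: the exponential of the mean per-word negative log-likelihood on the held-out split. A minimal sketch (the `perplexity` helper and the uniform-model example are illustrative, not code from any listed paper):

```python
import math

def perplexity(neg_log_likelihoods):
    """Word-level perplexity: exp of the mean per-word negative
    log-likelihood (natural log), as reported on Penn Treebank."""
    return math.exp(sum(neg_log_likelihoods) / len(neg_log_likelihoods))

# Sanity check: a uniform model over the PTB vocabulary of 10,000 words
# assigns -ln(1/10000) nats to every word, so its perplexity is 10000.
uniform_nll = [math.log(10_000)] * 5
print(round(perplexity(uniform_nll), 2))  # → 10000.0
```

Lower is better: a test perplexity of 20.5 means the model is, on average, as uncertain as a uniform choice among about 20.5 words.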