OpenCodePapers

Machine Translation on WMT2014 English-German

Machine Translation
Leaderboard
| Paper | Code | BLEU score | SacreBLEU | Number of Params | Hardware Burden | Operations per network pass | Model Name | Release Date |
|---|---|---|---|---|---|---|---|---|
| Lessons on Parameter Sharing across Layers in Transformers | ✓ | 35.14 | 33.54 | | | | Transformer Cycle (Rev) | 2021-04-13 |
| Understanding Back-Translation at Scale | ✓ | 35.0 | 33.8 | | 146G | | Noisy back-translation | 2018-08-28 |
| Rethinking Perturbations in Encoder-Decoders for Fast Training | ✓ | 33.89 | 32.35 | | | | Transformer+Rep(Uni) | 2021-04-05 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ | 32.1 | | 11110M | | | T5-11B | 2019-10-23 |
| BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation | ✓ | 31.26 | | | | | BiBERT | 2021-09-09 |
| R-Drop: Regularized Dropout for Neural Networks | ✓ | 30.91 | | | 49G | | Transformer + R-Drop | 2021-06-28 |
| Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation | ✓ | 30.78 | | | | | Bi-SimCut | 2022-06-06 |
| Incorporating BERT into Neural Machine Translation | ✓ | 30.75 | | | | | BERT-fused NMT | 2020-02-17 |
| Data Diversification: A Simple Strategy For Neural Machine Translation | ✓ | 30.7 | | | | | Data Diversification - Transformer | 2019-11-05 |
| Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation | ✓ | 30.56 | | | | | SimCut | 2022-06-06 |
| Mask Attention Networks: Rethinking and Strengthen Transformer | ✓ | 30.4 | | 215M | | | Mask Attention Network (big) | 2021-03-25 |
| Very Deep Transformers for Neural Machine Translation | ✓ | 30.1 | 29.5 | 256M | | | Transformer (ADMIN init) | 2020-08-18 |
| PowerNorm: Rethinking Batch Normalization in Transformers | ✓ | 30.1 | | | | | PowerNorm (Transformer) | 2020-03-17 |
| Depth Growing for Neural Machine Translation | ✓ | 30.07 | | | 24G | | Depth Growing | 2019-07-03 |
| MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning | ✓ | 29.9 | | | | | MUSE (Parallel Multi-scale Attention) | 2019-11-17 |
| The Evolved Transformer | ✓ | 29.8 | 29.2 | 218M | | | Evolved Transformer Big | 2019-01-30 |
| OmniNet: Omnidirectional Representations from Transformers | ✓ | 29.8 | | | | | OmniNetP | 2021-03-01 |
| Pay Less Attention with Lightweight and Dynamic Convolutions | ✓ | 29.7 | | 213M | | | DynamicConv | 2019-01-29 |
| Joint Source-Target Self Attention with Locality Constraints | ✓ | 29.7 | | | | | Local Joint Self-attention | 2019-05-16 |
| Time-aware Large Kernel Convolutions | ✓ | 29.6 | | 209M | | | TaLK Convolutions | 2020-02-08 |
| Fast and Simple Mixture of Softmaxes with BPE and Hybrid-LightRNN for Language Generation | ✓ | 29.6 | | | | | Transformer Big + MoS | 2018-09-25 |
| AdvAug: Robust Adversarial Augmentation for Neural Machine Translation | | 29.57 | | | | | AdvAug (aut+adv) | 2020-06-21 |
| PartialFormer: Modeling Part Instead of Whole for Machine Translation | ✓ | 29.56 | | 68M | | | PartialFormer | 2023-10-23 |
| Improving Neural Language Modeling via Adversarial Training | ✓ | 29.52 | | | | | Transformer Big + adversarial MLE | 2019-06-10 |
| Scaling Neural Machine Translation | ✓ | 29.3 | | 210M | 9G | | Transformer Big | 2018-06-01 |
| Subformer: A Parameter Reduced Transformer | | 29.3 | | | | | Subformer-xlarge | 2021-01-01 |
| Synchronous Bidirectional Neural Machine Translation | ✓ | 29.21 | | | | | SB-NMT | 2019-05-13 |
| Self-Attention with Relative Position Representations | ✓ | 29.2 | | | | | Transformer (big) + Relative Position Representations | 2018-03-06 |
| Learning to Encode Position for Transformer with Continuous Dynamical Model | ✓ | 29.2 | | | | | FLOATER-large | 2020-03-13 |
| Modeling Localness for Self-Attention Networks | | 29.2 | | | | | Local Transformer | 2018-10-24 |
| FRAGE: Frequency-Agnostic Word Representation | ✓ | 29.11 | | | | | Transformer Big with FRAGE | 2018-09-18 |
| Mask Attention Networks: Rethinking and Strengthen Transformer | ✓ | 29.1 | | 63M | | | Mask Attention Network (base) | 2021-03-25 |
| Mega: Moving Average Equipped Gated Attention | ✓ | 29.01 | 27.96 | 67M | | | Mega | 2022-09-21 |
| Neural Machine Translation with Adequacy-Oriented Learning | | 28.99 | | | | | adequacy-oriented NMT | 2018-11-21 |
| Pay Less Attention with Lightweight and Dynamic Convolutions | ✓ | 28.9 | | 202M | | | LightConv | 2019-01-29 |
| Weighted Transformer Network for Machine Translation | ✓ | 28.9 | | | | | Weighted Transformer (large) | 2017-11-06 |
| Universal Transformers | ✓ | 28.9 | | | | | Universal Transformer base | 2018-07-10 |
| KERMIT: Generative Insertion-Based Modeling for Sequences | | 28.7 | | | | | KERMIT | 2019-06-04 |
| Finetuning Pretrained Transformers into RNNs | ✓ | 28.7 | | | | | T2R + Pretrain | 2021-03-24 |
| AdvAug: Robust Adversarial Augmentation for Neural Machine Translation | | 28.58 | | | | | AdvAug (aut) | 2020-06-21 |
| The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation | ✓ | 28.5 | | | 44G | 2.81G | RNMT+ | 2018-04-26 |
| Synthesizer: Rethinking Self-Attention in Transformer Models | ✓ | 28.47 | | | | | Synthesizer (Random + Vanilla) | 2020-05-02 |
| HAT: Hardware-Aware Transformers for Efficient Natural Language Processing | ✓ | 28.4 | | 48M | | | Hardware Aware Transformer | 2020-05-28 |
| Attention Is All You Need | ✓ | 28.4 | | | 871G | 2300000000.0G | Transformer Big | 2017-06-12 |
| Simple Recurrent Units for Highly Parallelizable Recurrence | ✓ | 28.4 | | | 34G | | Transformer + SRU | 2017-09-08 |
| The Evolved Transformer | ✓ | 28.4 | | | 2488G | | Evolved Transformer Base | 2019-01-30 |
| Random Feature Attention | | 28.2 | | | | | Rfa-Gate-arccos | 2021-03-03 |
| Deep Residual Output Layers for Neural Language Generation | ✓ | 28.1 | | | | | Transformer-DRILL Base | 2019-05-14 |
| AdvAug: Robust Adversarial Augmentation for Neural Machine Translation | | 28.08 | | | | | AdvAug (mixup) | 2020-06-21 |
| Incorporating a Local Translation Mechanism into Non-autoregressive Translation | ✓ | 27.35 | | | | | CMLM+LAT+4 iterations | 2020-11-12 |
| Attention Is All You Need | ✓ | 27.3 | | | | 330000000.0G | Transformer Base | 2017-06-12 |
| Levenshtein Transformer | ✓ | 27.27 | | | | | Levenshtein Transformer (distillation) | 2019-05-27 |
| Non-autoregressive Translation with Disentangled Context Transformer | ✓ | 27.06 | | | | | DisCo + Mask-Predict (non-autoregressive) | |
| Adaptively Sparse Transformers | ✓ | 26.93 | | | | | Adaptively Sparse Transformer (alpha-entmax) | 2019-08-30 |
| ResMLP: Feedforward networks for image classification with data-efficient training | ✓ | 26.8 | | | | | ResMLP-12 | 2021-05-07 |
| Non-Autoregressive Translation by Learning Target Categorical Codes | ✓ | 26.6 | | | | | CNAT | 2021-03-21 |
| Lite Transformer with Long-Short Range Attention | ✓ | 26.5 | | 17.3M | | | Lite Transformer | 2020-04-24 |
| Convolutional Sequence to Sequence Learning | ✓ | 26.4 | | | 54G | | ConvS2S (ensemble) | 2017-05-08 |
| ResMLP: Feedforward networks for image classification with data-efficient training | ✓ | 26.4 | | | | | ResMLP-6 | 2021-05-07 |
| Accelerating Neural Transformer via an Average Attention Network | ✓ | 26.31 | | | | | Average Attention Network | 2018-05-02 |
| Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation | ✓ | 26.3 | | | | | GNMT+RL | 2016-09-26 |
| Depthwise Separable Convolutions for Neural Machine Translation | ✓ | 26.1 | | | | | SliceNet | 2017-06-09 |
| Accelerating Neural Transformer via an Average Attention Network | ✓ | 26.05 | | | | | Average Attention Network (w/o FFN) | 2018-05-02 |
| Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | ✓ | 26.03 | | | 24G | | MoE | 2017-01-23 |
| Accelerating Neural Transformer via an Average Attention Network | ✓ | 25.91 | | | | | Average Attention Network (w/o gate) | 2018-05-02 |
| Adaptively Sparse Transformers | ✓ | 25.89 | | | | | Adaptively Sparse Transformer (1.5-entmax) | 2019-08-30 |
| Dense Information Flow for Neural Machine Translation | ✓ | 25.52 | | | | | DenseNMT | 2018-06-03 |
| Glancing Transformer for Non-Autoregressive Neural Machine Translation | ✓ | 25.21 | | | | | GLAT | 2020-08-18 |
| Incorporating a Local Translation Mechanism into Non-autoregressive Translation | ✓ | 25.20 | | | | | CMLM+LAT+1 iterations | 2020-11-12 |
| Convolutional Sequence to Sequence Learning | ✓ | 25.16 | | | 72G | | ConvS2S | 2017-05-08 |
| Neural Machine Translation in Linear Time | ✓ | 23.75 | | | | | ByteNet | 2016-10-31 |
| FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow | ✓ | 23.64 | | | | | FlowSeq-large (NPD n = 30) | 2019-09-05 |
| FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow | ✓ | 23.14 | | | | | FlowSeq-large (NPD n = 15) | 2019-09-05 |
| FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow | ✓ | 22.94 | | | | | FlowSeq-large (IWD n = 15) | 2019-09-05 |
| Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement | ✓ | 21.54 | | | | | Denoising autoencoders (non-autoregressive) | 2018-02-19 |
| Effective Approaches to Attention-based Neural Machine Translation | ✓ | 20.9 | | | | | RNN Enc-Dec Att | 2015-08-17 |
| FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow | ✓ | 20.85 | | | | | FlowSeq-large | 2019-09-05 |
| | | 20.7 | | | | | PBMT | |
| Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation | ✓ | 20.7 | | | 119G | | Deep-Att | 2016-06-14 |
| Edinburgh's Syntax-Based Systems at WMT 2015 | | 20.7 | | | | | Phrase Based MT | 2015-09-01 |
| Phrase-Based & Neural Unsupervised Machine Translation | ✓ | 20.23 | | | | | PBSMT + NMT | 2018-04-20 |
| Non-Autoregressive Neural Machine Translation | ✓ | 19.17 | | | | | NAT + FT + NPD | 2017-11-07 |
| FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow | ✓ | 18.55 | | | | | FlowSeq-base | 2019-09-05 |
| Sequence-Level Knowledge Distillation | ✓ | 18.5 | | | | | Seq-KD + Seq-Inter + Word-KD | 2016-06-25 |
| Phrase-Based & Neural Unsupervised Machine Translation | ✓ | 17.94 | | | | | Unsupervised PBSMT | 2018-04-20 |
| Neural Semantic Encoders | ✓ | 17.9 | | | | | NSE-NSE | 2016-07-14 |
| Phrase-Based & Neural Unsupervised Machine Translation | ✓ | 17.16 | | | | | Unsupervised NMT + Transformer | 2018-04-20 |
| Unsupervised Statistical Machine Translation | ✓ | 14.08 | | | | | SMT + iterative backtranslation (unsupervised) | 2018-09-04 |
| Effective Approaches to Attention-based Neural Machine Translation | ✓ | 14.0 | | | | | Reverse RNN Enc-Dec | 2015-08-17 |
| Effective Approaches to Attention-based Neural Machine Translation | ✓ | 11.3 | | | | | RNN Enc-Dec | 2015-08-17 |
| Multi-branch Attentive Transformer | ✓ | 29.9 | | | | | MAT | 2020-06-18 |
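
Note that the two metric columns follow different conventions: the "BLEU score" column typically carries the tokenized BLEU reported in each paper, while "SacreBLEU" is the standardized, detokenized score of Post (2018), so the two numbers for the same system usually differ. As a minimal sketch of how the SacreBLEU column is typically computed (not taken from any listed paper; the file paths below are hypothetical placeholders), newstest2014 output can be scored with the `sacrebleu` Python package:

```python
# Minimal sketch: corpus-level SacreBLEU for newstest2014 system output using
# the `sacrebleu` package (https://github.com/mjpost/sacrebleu).
# The file names are hypothetical placeholders.
import sacrebleu

with open("newstest2014.hyp.de", encoding="utf-8") as f:
    hypotheses = [line.rstrip("\n") for line in f]   # one detokenized hypothesis per line
with open("newstest2014.ref.de", encoding="utf-8") as f:
    references = [line.rstrip("\n") for line in f]   # aligned references, one per line

# corpus_bleu takes the hypothesis list and a list of reference streams.
result = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"SacreBLEU = {result.score:.2f}")  # comparable to the SacreBLEU column
print(result)                             # full string with n-gram precisions and brevity penalty
```

Tokenized BLEU figures in the first column, by contrast, depend on each paper's own preprocessing (tokenizer, compound splitting, etc.), which is why entries that report both values show a gap between them.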