Paper | Code | BLEU score | SacreBLEU | Params | Hardware Burden | Operations per network pass | Model | Date |
--- | --- | --- | --- | --- | --- | --- | --- | --- |
Lessons on Parameter Sharing across Layers in Transformers | ✓ Link | 35.14 | 33.54 | | | | Transformer Cycle (Rev) | 2021-04-13 |
Understanding Back-Translation at Scale | ✓ Link | 35.0 | 33.8 | | 146G | | Noisy back-translation | 2018-08-28 |
Rethinking Perturbations in Encoder-Decoders for Fast Training | ✓ Link | 33.89 | 32.35 | | | | Transformer+Rep(Uni) | 2021-04-05 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 32.1 | | 11110M | | | T5-11B | 2019-10-23 |
BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation | ✓ Link | 31.26 | | | | | BiBERT | 2021-09-09 |
R-Drop: Regularized Dropout for Neural Networks | ✓ Link | 30.91 | | | 49G | | Transformer + R-Drop | 2021-06-28 |
Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation | ✓ Link | 30.78 | | | | | Bi-SimCut | 2022-06-06 |
Incorporating BERT into Neural Machine Translation | ✓ Link | 30.75 | | | | | BERT-fused NMT | 2020-02-17 |
Data Diversification: A Simple Strategy For Neural Machine Translation | ✓ Link | 30.7 | | | | | Data Diversification - Transformer | 2019-11-05 |
Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation | ✓ Link | 30.56 | | | | | SimCut | 2022-06-06 |
Mask Attention Networks: Rethinking and Strengthen Transformer | ✓ Link | 30.4 | | 215M | | | Mask Attention Network (big) | 2021-03-25 |
Very Deep Transformers for Neural Machine Translation | ✓ Link | 30.1 | 29.5 | 256M | | | Transformer (ADMIN init) | 2020-08-18 |
PowerNorm: Rethinking Batch Normalization in Transformers | ✓ Link | 30.1 | | | | | PowerNorm (Transformer) | 2020-03-17 |
Depth Growing for Neural Machine Translation | ✓ Link | 30.07 | | | 24G | | Depth Growing | 2019-07-03 |
MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning | ✓ Link | 29.9 | | | | | MUSE(Parallel Multi-scale Attention) | 2019-11-17 |
The Evolved Transformer | ✓ Link | 29.8 | 29.2 | 218M | | | Evolved Transformer Big | 2019-01-30 |
OmniNet: Omnidirectional Representations from Transformers | ✓ Link | 29.8 | | | | | OmniNetP | 2021-03-01 |
Pay Less Attention with Lightweight and Dynamic Convolutions | ✓ Link | 29.7 | | 213M | | | DynamicConv | 2019-01-29 |
Joint Source-Target Self Attention with Locality Constraints | ✓ Link | 29.7 | | | | | Local Joint Self-attention | 2019-05-16 |
Time-aware Large Kernel Convolutions | ✓ Link | 29.6 | | 209M | | | TaLK Convolutions | 2020-02-08 |
Fast and Simple Mixture of Softmaxes with BPE and Hybrid-LightRNN for Language Generation | ✓ Link | 29.6 | | | | | Transformer Big + MoS | 2018-09-25 |
AdvAug: Robust Adversarial Augmentation for Neural Machine Translation | | 29.57 | | | | | AdvAug (aut+adv) | 2020-06-21 |
PartialFormer: Modeling Part Instead of Whole for Machine Translation | ✓ Link | 29.56 | | 68M | | | PartialFormer | 2023-10-23 |
Improving Neural Language Modeling via Adversarial Training | ✓ Link | 29.52 | | | | | Transformer Big + adversarial MLE | 2019-06-10 |
Scaling Neural Machine Translation | ✓ Link | 29.3 | | 210M | 9G | | Transformer Big | 2018-06-01 |
Subformer: A Parameter Reduced Transformer | | 29.3 | | | | | Subformer-xlarge | 2021-01-01 |
Synchronous Bidirectional Neural Machine Translation | ✓ Link | 29.21 | | | | | SB-NMT | 2019-05-13 |
Self-Attention with Relative Position Representations | ✓ Link | 29.2 | | | | | Transformer (big) + Relative Position Representations | 2018-03-06 |
Learning to Encode Position for Transformer with Continuous Dynamical Model | ✓ Link | 29.2 | | | | | FLOATER-large | 2020-03-13 |
Modeling Localness for Self-Attention Networks | | 29.2 | | | | | Local Transformer | 2018-10-24 |
FRAGE: Frequency-Agnostic Word Representation | ✓ Link | 29.11 | | | | | Transformer Big with FRAGE | 2018-09-18 |
Mask Attention Networks: Rethinking and Strengthen Transformer | ✓ Link | 29.1 | | 63M | | | Mask Attention Network (base) | 2021-03-25 |
Mega: Moving Average Equipped Gated Attention | ✓ Link | 29.01 | 27.96 | 67M | | | Mega | 2022-09-21 |
Neural Machine Translation with Adequacy-Oriented Learning | | 28.99 | | | | | adequacy-oriented NMT | 2018-11-21 |
Pay Less Attention with Lightweight and Dynamic Convolutions | ✓ Link | 28.9 | | 202M | | | LightConv | 2019-01-29 |
Weighted Transformer Network for Machine Translation | ✓ Link | 28.9 | | | | | Weighted Transformer (large) | 2017-11-06 |
Universal Transformers | ✓ Link | 28.9 | | | | | universal transformer base | 2018-07-10 |
KERMIT: Generative Insertion-Based Modeling for Sequences | | 28.7 | | | | | KERMIT | 2019-06-04 |
Finetuning Pretrained Transformers into RNNs | ✓ Link | 28.7 | | | | | T2R + Pretrain | 2021-03-24 |
AdvAug: Robust Adversarial Augmentation for Neural Machine Translation | | 28.58 | | | | | AdvAug (aut) | 2020-06-21 |
The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation | ✓ Link | 28.5 | | | 44G | 2.81G | RNMT+ | 2018-04-26 |
Synthesizer: Rethinking Self-Attention in Transformer Models | ✓ Link | 28.47 | | | | | Synthesizer (Random + Vanilla) | 2020-05-02 |
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing | ✓ Link | 28.4 | | 48M | | | Hardware Aware Transformer | 2020-05-28 |
Attention Is All You Need | ✓ Link | 28.4 | | | 871G | 2300000000.0G | Transformer Big | 2017-06-12 |
Simple Recurrent Units for Highly Parallelizable Recurrence | ✓ Link | 28.4 | | | 34G | | Transformer + SRU | 2017-09-08 |
The Evolved Transformer | ✓ Link | 28.4 | | | 2488G | | Evolved Transformer Base | 2019-01-30 |
Random Feature Attention | | 28.2 | | | | | Rfa-Gate-arccos | 2021-03-03 |
Deep Residual Output Layers for Neural Language Generation | ✓ Link | 28.1 | | | | | Transformer-DRILL Base | 2019-05-14 |
AdvAug: Robust Adversarial Augmentation for Neural Machine Translation | | 28.08 | | | | | AdvAug (mixup) | 2020-06-21 |
Incorporating a Local Translation Mechanism into Non-autoregressive Translation | ✓ Link | 27.35 | | | | | CMLM+LAT+4 iterations | 2020-11-12 |
Attention Is All You Need | ✓ Link | 27.3 | | | | 330000000.0G | Transformer Base | 2017-06-12 |
Levenshtein Transformer | ✓ Link | 27.27 | | | | | Levenshtein Transformer (distillation) | 2019-05-27 |
Non-autoregressive Translation with Disentangled Context Transformer | ✓ Link | 27.06 | | | | | DisCo + Mask-Predict (non-autoregressive) | |
Adaptively Sparse Transformers | ✓ Link | 26.93 | | | | | Adaptively Sparse Transformer (alpha-entmax) | 2019-08-30 |
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 26.8 | | | | | ResMLP-12 | 2021-05-07 |
Non-Autoregressive Translation by Learning Target Categorical Codes | ✓ Link | 26.6 | | | | | CNAT | 2021-03-21 |
Lite Transformer with Long-Short Range Attention | ✓ Link | 26.5 | | 17.3M | | | Lite Transformer | 2020-04-24 |
Convolutional Sequence to Sequence Learning | ✓ Link | 26.4 | | | 54G | | ConvS2S (ensemble) | 2017-05-08 |
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 26.4 | | | | | ResMLP-6 | 2021-05-07 |
Accelerating Neural Transformer via an Average Attention Network | ✓ Link | 26.31 | | | | | Average Attention Network | 2018-05-02 |
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation | ✓ Link | 26.3 | | | | | GNMT+RL | 2016-09-26 |
Depthwise Separable Convolutions for Neural Machine Translation | ✓ Link | 26.1 | | | | | SliceNet | 2017-06-09 |
Accelerating Neural Transformer via an Average Attention Network | ✓ Link | 26.05 | | | | | Average Attention Network (w/o FFN) | 2018-05-02 |
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | ✓ Link | 26.03 | | | 24G | | MoE | 2017-01-23 |
Accelerating Neural Transformer via an Average Attention Network | ✓ Link | 25.91 | | | | | Average Attention Network (w/o gate) | 2018-05-02 |
Adaptively Sparse Transformers | ✓ Link | 25.89 | | | | | Adaptively Sparse Transformer (1.5-entmax) | 2019-08-30 |
Dense Information Flow for Neural Machine Translation | ✓ Link | 25.52 | | | | | DenseNMT | 2018-06-03 |
Glancing Transformer for Non-Autoregressive Neural Machine Translation | ✓ Link | 25.21 | | | | | GLAT | 2020-08-18 |
Incorporating a Local Translation Mechanism into Non-autoregressive Translation | ✓ Link | 25.20 | | | | | CMLM+LAT+1 iterations | 2020-11-12 |
Convolutional Sequence to Sequence Learning | ✓ Link | 25.16 | | | 72G | | ConvS2S | 2017-05-08 |
Neural Machine Translation in Linear Time | ✓ Link | 23.75 | | | | | ByteNet | 2016-10-31 |
FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow | ✓ Link | 23.64 | | | | | FlowSeq-large (NPD n = 30) | 2019-09-05 |
FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow | ✓ Link | 23.14 | | | | | FlowSeq-large (NPD n = 15) | 2019-09-05 |
FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow | ✓ Link | 22.94 | | | | | FlowSeq-large (IWD n = 15) | 2019-09-05 |
Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement | ✓ Link | 21.54 | | | | | Denoising autoencoders (non-autoregressive) | 2018-02-19 |
Effective Approaches to Attention-based Neural Machine Translation | ✓ Link | 20.9 | | | | | RNN Enc-Dec Att | 2015-08-17 |
FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow | ✓ Link | 20.85 | | | | | FlowSeq-large | 2019-09-05 |
 | | 20.7 | | | | | PBMT | |
Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation | ✓ Link | 20.7 | | | 119G | | Deep-Att | 2016-06-14 |
Edinburgh's Syntax-Based Systems at WMT 2015 | | 20.7 | | | | | Phrase Based MT | 2015-09-01 |
Phrase-Based & Neural Unsupervised Machine Translation | ✓ Link | 20.23 | | | | | PBSMT + NMT | 2018-04-20 |
Non-Autoregressive Neural Machine Translation | ✓ Link | 19.17 | | | | | NAT +FT + NPD | 2017-11-07 |
FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow | ✓ Link | 18.55 | | | | | FlowSeq-base | 2019-09-05 |
Sequence-Level Knowledge Distillation | ✓ Link | 18.5 | | | | | Seq-KD + Seq-Inter + Word-KD | 2016-06-25 |
Phrase-Based & Neural Unsupervised Machine Translation | ✓ Link | 17.94 | | | | | Unsupervised PBSMT | 2018-04-20 |
Neural Semantic Encoders | ✓ Link | 17.9 | | | | | NSE-NSE | 2016-07-14 |
Phrase-Based & Neural Unsupervised Machine Translation | ✓ Link | 17.16 | | | | | Unsupervised NMT + Transformer | 2018-04-20 |
Unsupervised Statistical Machine Translation | ✓ Link | 14.08 | | | | | SMT + iterative backtranslation (unsupervised) | 2018-09-04 |
Effective Approaches to Attention-based Neural Machine Translation | ✓ Link | 14.0 | | | | | Reverse RNN Enc-Dec | 2015-08-17 |
Effective Approaches to Attention-based Neural Machine Translation | ✓ Link | 11.3 | | | | | RNN Enc-Dec | 2015-08-17 |
Multi-branch Attentive Transformer | ✓ Link | | 29.9 | | | | MAT | 2020-06-18 |
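
The table reports both BLEU and SacreBLEU columns because plain BLEU varies with tokenization and evaluation scripts, while SacreBLEU fixes these choices and makes scores comparable across papers. The snippet below is a minimal sketch, not taken from any of the listed papers, of how a corpus-level SacreBLEU score of the kind shown above is typically computed with the `sacrebleu` package; the sentences are placeholders, not WMT 2014 data.

```python
# Minimal sketch: corpus-level SacreBLEU with the `sacrebleu` package.
# The hypotheses and references below are illustrative placeholders.
import sacrebleu

hypotheses = [
    "The cat sits on the mat .",
    "There is a dog in the garden .",
]
# One reference stream, aligned with the hypotheses (more streams can be added).
references = [[
    "The cat is sitting on the mat .",
    "A dog is in the garden .",
]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"SacreBLEU: {bleu.score:.2f}")  # single corpus-level score, as in the table
```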