Paper | Code | BLEU | Params | Model | Date |
--- | --- | --- | --- | --- | --- |
Integrating Pre-trained Language Model into Neural Machine Translation | | 40.43 | | PiNMT | 2023-10-30 |
BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation | ✓ Link | 38.61 | 73.8M | BiBERT | 2021-09-09 |
Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation | ✓ Link | 38.37 | | Bi-SimCut | 2022-06-06 |
Relaxed Attention for Transformer Models | ✓ Link | 37.96 | 24.1M | Cutoff + Relaxed Attention + LM | 2022-09-20 |
Deterministic Reversible Data Augmentation for Neural Machine Translation | ✓ Link | 37.95 | | DRDA | 2024-06-04 |
R-Drop: Regularized Dropout for Neural Networks | ✓ Link | 37.90 | | Transformer + R-Drop + Cutoff | 2021-06-28 |
Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation | ✓ Link | 37.81 | | SimCut | 2022-06-06 |
Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule | ✓ Link | 37.78 | | Cutoff+Knee | 2020-03-09 |
A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation | ✓ Link | 37.6 | | Cutoff | 2020-09-29 |
CipherDAug: Ciphertext based Data Augmentation for Neural Machine Translation | ✓ Link | 37.53 | | CipherDAug | 2022-04-01 |
R-Drop: Regularized Dropout for Neural Networks | ✓ Link | 37.25 | | Transformer + R-Drop | 2021-06-28 |
Data Diversification: A Simple Strategy For Neural Machine Translation | ✓ Link | 37.2 | | Data Diversification | 2019-11-05 |
UniDrop: A Simple yet Effective Technique to Improve Transformer without Extra Cost | | 36.88 | | UniDrop | 2021-04-11 |
Sequence Generation with Mixed Representations | ✓ Link | 36.41 | | MixedRepresentations | 2020-07-11 |
Mask Attention Networks: Rethinking and Strengthen Transformer | ✓ Link | 36.3 | 37M | Mask Attention Network (small) | 2021-03-25 |
MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning | ✓ Link | 36.3 | | MUSE (Parallel Multi-Scale Attention) | 2019-11-17 |
Rethinking Perturbations in Encoder-Decoders for Fast Training | ✓ Link | 36.22 | 37M | Transformer+Rep(Sim)+WDrop | 2021-04-05 |
Multi-branch Attentive Transformer | ✓ Link | 36.22 | | MAT | 2020-06-18 |
AutoDropout: Learning Dropout Patterns to Regularize Deep Networks | ✓ Link | 35.8 | | TransformerBase + AutoDropout | 2021-01-05 |
Joint Source-Target Self Attention with Locality Constraints | ✓ Link | 35.7 | | Local Joint Self-attention | 2019-05-16 |
Time-aware Large Kernel Convolutions | ✓ Link | 35.5 | | TaLK Convolutions | 2020-02-08 |
Autoregressive Knowledge Distillation through Imitation Learning | ✓ Link | 35.4 | | ImitKD + Full | 2020-09-15 |
DeLighT: Deep and Light-weight Transformer | ✓ Link | 35.3 | | DeLighT | 2020-08-03 |
Pay Less Attention with Lightweight and Dynamic Convolutions | ✓ Link | 35.2 | | DynamicConv | 2019-01-29 |
Guidelines for the Regularization of Gammas in Batch Normalization for Deep Residual Networks | | 35.1385 | | Transformer | 2022-05-15 |
Pay Less Attention with Lightweight and Dynamic Convolutions | ✓ Link | 34.8 | | LightConv | 2019-01-29 |
Attention Is All You Need | ✓ Link | 34.44 | | Transformer | 2017-06-12 |
Random Feature Attention | | 34.4 | | Rfa-Gate-arccos | 2021-03-03 |
Latent Alignment and Variational Attention | ✓ Link | 33.1 | | Variational Attention | 2018-07-10 |
Classical Structured Prediction Losses for Sequence to Sequence Learning | ✓ Link | 32.84 | | Minimum Risk Training [Edunov2017] | 2017-11-14 |
Non-Autoregressive Translation by Learning Target Categorical Codes | ✓ Link | 31.15 | | CNAT | 2021-03-21 |
Towards Neural Phrase-based Machine Translation | ✓ Link | 30.08 | | Neural PBMT + LM [Huang2018] | 2017-06-17 |
Tag-less Back-Translation | | 28.83 | | Back-Translation Finetuning | 2019-12-22 |
An Actor-Critic Algorithm for Sequence Prediction | ✓ Link | 28.53 | | Actor-Critic [Bahdanau2017] | 2016-07-24 |
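
The numbers above are corpus-level BLEU scores. As a minimal sketch of how such a score is computed (the file names are illustrative, and sacrebleu is only one choice: many of the listed papers report tokenized BLEU from multi-bleu.perl, whose values are not directly comparable to detokenized sacrebleu numbers):

```python
# Minimal sketch: corpus-level BLEU with sacrebleu (illustrative only; the
# papers in the table may use tokenized multi-bleu.perl, which can differ).
import sacrebleu

# Hypothetical files: one detokenized sentence per line, aligned by line number.
with open("hypotheses.txt", encoding="utf-8") as f:
    hyps = [line.strip() for line in f]
with open("references.txt", encoding="utf-8") as f:
    refs = [line.strip() for line in f]

# corpus_bleu takes the system outputs and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hyps, [refs])
print(f"BLEU = {bleu.score:.2f}")  # comparable in form to the BLEU column above
```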