OpenCodePapers

Machine Translation on IWSLT2014 German-English

Machine Translation
Results over time (interactive chart of leaderboard metrics by release date; not reproduced here)
Leaderboard
| Paper | Code | BLEU score | Number of Params | Model Name | Release Date |
|---|---|---|---|---|---|
| Integrating Pre-trained Language Model into Neural Machine Translation | | 40.43 | | PiNMT | 2023-10-30 |
| BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation | ✓ | 38.61 | 73.8M | BiBERT | 2021-09-09 |
| Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation | ✓ | 38.37 | | Bi-SimCut | 2022-06-06 |
| Relaxed Attention for Transformer Models | ✓ | 37.96 | 24.1M | Cutoff + Relaxed Attention + LM | 2022-09-20 |
| Deterministic Reversible Data Augmentation for Neural Machine Translation | ✓ | 37.95 | | DRDA | 2024-06-04 |
| R-Drop: Regularized Dropout for Neural Networks | ✓ | 37.90 | | Transformer + R-Drop + Cutoff | 2021-06-28 |
| Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation | ✓ | 37.81 | | SimCut | 2022-06-06 |
| Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule | ✓ | 37.78 | | Cutoff+Knee | 2020-03-09 |
| A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation | ✓ | 37.6 | | Cutoff | 2020-09-29 |
| CipherDAug: Ciphertext based Data Augmentation for Neural Machine Translation | ✓ | 37.53 | | CipherDAug | 2022-04-01 |
| R-Drop: Regularized Dropout for Neural Networks | ✓ | 37.25 | | Transformer + R-Drop | 2021-06-28 |
| Data Diversification: A Simple Strategy For Neural Machine Translation | ✓ | 37.2 | | Data Diversification | 2019-11-05 |
| UniDrop: A Simple yet Effective Technique to Improve Transformer without Extra Cost | | 36.88 | | UniDrop | 2021-04-11 |
| Sequence Generation with Mixed Representations | ✓ | 36.41 | | MixedRepresentations | 2020-07-11 |
| Mask Attention Networks: Rethinking and Strengthen Transformer | ✓ | 36.3 | 37M | Mask Attention Network (small) | 2021-03-25 |
| MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning | ✓ | 36.3 | | MUSE (Parallel Multi-scale Attention) | 2019-11-17 |
| Rethinking Perturbations in Encoder-Decoders for Fast Training | ✓ | 36.22 | 37M | Transformer+Rep(Sim)+WDrop | 2021-04-05 |
| Multi-branch Attentive Transformer | ✓ | 36.22 | | MAT | 2020-06-18 |
| AutoDropout: Learning Dropout Patterns to Regularize Deep Networks | ✓ | 35.8 | | TransformerBase + AutoDropout | 2021-01-05 |
| Joint Source-Target Self Attention with Locality Constraints | ✓ | 35.7 | | Local Joint Self-attention | 2019-05-16 |
| Time-aware Large Kernel Convolutions | ✓ | 35.5 | | TaLK Convolutions | 2020-02-08 |
| Autoregressive Knowledge Distillation through Imitation Learning | ✓ | 35.4 | | ImitKD + Full | 2020-09-15 |
| DeLighT: Deep and Light-weight Transformer | ✓ | 35.3 | | DeLighT | 2020-08-03 |
| Pay Less Attention with Lightweight and Dynamic Convolutions | ✓ | 35.2 | | DynamicConv | 2019-01-29 |
| Guidelines for the Regularization of Gammas in Batch Normalization for Deep Residual Networks | | 35.1385 | | Transformer | 2022-05-15 |
| Pay Less Attention with Lightweight and Dynamic Convolutions | ✓ | 34.8 | | LightConv | 2019-01-29 |
| Attention Is All You Need | ✓ | 34.44 | | Transformer | 2017-06-12 |
| Random Feature Attention | | 34.4 | | Rfa-Gate-arccos | 2021-03-03 |
| Latent Alignment and Variational Attention | ✓ | 33.1 | | Variational Attention | 2018-07-10 |
| Classical Structured Prediction Losses for Sequence to Sequence Learning | ✓ | 32.84 | | Minimum Risk Training [Edunov2017] | 2017-11-14 |
| Non-Autoregressive Translation by Learning Target Categorical Codes | ✓ | 31.15 | | CNAT | 2021-03-21 |
| Towards Neural Phrase-based Machine Translation | ✓ | 30.08 | | Neural PBMT + LM [Huang2018] | 2017-06-17 |
| Tag-less Back-Translation | | 28.83 | | Back-Translation Finetuning | 2019-12-22 |
| An Actor-Critic Algorithm for Sequence Prediction | ✓ | 28.53 | | Actor-Critic [Bahdanau2017] | 2016-07-24 |
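The BLEU scores above are those reported by the respective papers, which on this benchmark typically use their own tokenization and scoring scripts, so entries are not strictly comparable. As a minimal sketch of how a corpus-level BLEU score like these is computed, the snippet below uses the sacreBLEU library on detokenized hypothesis and reference files; the file names are placeholders, and results will differ from paper-reported tokenized BLEU.

```python
# Minimal sketch: corpus-level BLEU with sacreBLEU.
# "hypotheses.detok.txt" and "references.detok.txt" are placeholder file names,
# one sentence per line; sacreBLEU applies its own tokenization internally,
# which is why it expects detokenized text.
import sacrebleu

with open("hypotheses.detok.txt", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("references.detok.txt", encoding="utf-8") as f:
    references = [line.strip() for line in f]

# corpus_bleu takes a list of hypothesis strings and a list of
# reference streams (one stream per reference set).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")
```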