| Paper | Code | Test (Acc %) | Dev (Acc %) | Model Name | Release Date |
|---|---|---|---|---|---|
| DeBERTa: Decoding-enhanced BERT with Disentangled Attention | ✓ Link | | 90.8 | DeBERTa-large | 2020-06-05 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ Link | | 89.9 | RoBERTa | 2019-07-26 |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | ✓ Link | 86.3 | 86.6 | BERT-LARGE | 2018-10-11 |
| SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference | | 59.2 | 59.1 | ESIM + ELMo | 2018-08-16 |
| SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference | | 52.7 | 51.9 | ESIM + GloVe | 2018-08-16 |