| Paper | Code | EM | F1 | Model | Date |
|---|---|---|---|---|---|
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 90.06 | 95.64 | T5-11B | 2019-10-23 |
LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention | ✓ Link | 89.8 | | LUKE | 2020-10-02 |
Dice Loss for Data-imbalanced NLP Tasks | ✓ Link | 89.79 | 95.77 | XLNet+DSC | 2019-11-07 |
XLNet: Generalized Autoregressive Pretraining for Language Understanding | ✓ Link | 89.7 | 95.1 | XLNet (single model) | 2019-06-19 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 88.53 | 94.95 | T5-3B | 2019-10-23 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 86.66 | 93.79 | T5-Large 770M | 2019-10-23 |
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | ✓ Link | 86.2 | 92.2 | BERT-LARGE (Ensemble+TriviaQA) | 2018-10-11 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 85.44 | 92.08 | T5-Base | 2019-10-23 |
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | ✓ Link | 84.2 | 91.1 | BERT-LARGE (Single+TriviaQA) | 2018-10-11 |
Prune Once for All: Sparse Pre-Trained Language Models | ✓ Link | 83.35 | 90.2 | BERT-Large-uncased-PruneOFA (90% unstruct sparse) | 2021-11-10 |
Prune Once for All: Sparse Pre-Trained Language Models | ✓ Link | 83.22 | 90.02 | BERT-Large-uncased-PruneOFA (90% unstruct sparse, QAT Int8) | 2021-11-10 |
Prune Once for All: Sparse Pre-Trained Language Models | ✓ Link | 81.1 | 88.42 | BERT-Base-uncased-PruneOFA (85% unstruct sparse) | 2021-11-10 |
Prune Once for All: Sparse Pre-Trained Language Models | ✓ Link | 80.84 | 88.24 | BERT-Base-uncased-PruneOFA (85% unstruct sparse, QAT Int8) | 2021-11-10 |
Prune Once for All: Sparse Pre-Trained Language Models | ✓ Link | 79.83 | 87.25 | BERT-Base-uncased-PruneOFA (90% unstruct sparse) | 2021-11-10 |
TinyBERT: Distilling BERT for Natural Language Understanding | ✓ Link | 79.7 | 87.5 | TinyBERT-6 67M | 2019-09-23 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 79.1 | 87.24 | T5-Small | 2019-10-23 |
Reinforced Mnemonic Reader for Machine Reading Comprehension | ✓ Link | 78.9 | 86.3 | R.M-Reader (single) | 2017-05-08 |
Learning Dense Representations of Phrases at Scale | ✓ Link | 78.3 | 86.3 | DensePhrases | 2020-12-23 |
Prune Once for All: Sparse Pre-Trained Language Models | ✓ Link | 78.1 | 85.82 | DistilBERT-uncased-PruneOFA (85% unstruct sparse) | 2021-11-10 |
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | ✓ Link | 77.7 | | DistilBERT | 2019-10-02 |
Prune Once for All: Sparse Pre-Trained Language Models | ✓ Link | 77.03 | 85.13 | DistilBERT-uncased-PruneOFA (85% unstruct sparse, QAT Int8) | 2021-11-10 |
Prune Once for All: Sparse Pre-Trained Language Models | ✓ Link | 76.91 | 84.82 | DistilBERT-uncased-PruneOFA (90% unstruct sparse) | 2021-11-10 |
Explicit Utilization of General Knowledge in Machine Reading Comprehension | | 76.7 | 84.9 | KAR | 2018-09-10 |
Stochastic Answer Networks for Machine Reading Comprehension | ✓ Link | 76.235 | 84.056 | SAN (single) | 2017-12-10 |
Prune Once for All: Sparse Pre-Trained Language Models | ✓ Link | 75.62 | 83.87 | DistilBERT-uncased-PruneOFA (90% unstruct sparse, QAT Int8) | 2021-11-10 |
FusionNet: Fusing via Fully-Aware Attention with Application to Machine Comprehension | ✓ Link | 75.3 | 83.6 | FusionNet | 2017-11-16 |
QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension | ✓ Link | 75.1 | 83.8 | QANet (data aug x3) | 2018-04-23 |
QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension | ✓ Link | 74.5 | 83.2 | QANet (data aug x2) | 2018-04-23 |
DCN+: Mixed Objective and Deep Residual Coattention for Question Answering | ✓ Link | 74.5 | 83.1 | DCN+ (single) | 2017-10-31 |
QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension | ✓ Link | 73.6 | 82.7 | QANet | 2018-04-23 |
Phase Conductor on Multi-layered Attentions for Machine Comprehension | | 72.1 | 81.4 | PhaseCond (single) | 2017-10-28 |
Simple Recurrent Units for Highly Parallelizable Recurrence | ✓ Link | 71.4 | 80.2 | SRU | 2017-09-08 |
Smarnet: Teaching Machines to Read and Comprehend Like Human | | 71.362 | 80.183 | Smarnet | 2017-10-08 |
Learned in Translation: Contextualized Word Vectors | ✓ Link | 71.3 | 79.9 | DCN (Char + CoVe) | 2017-08-01 |
Gated Self-Matching Networks for Reading Comprehension and Question Answering | | 71.1 | 79.5 | R-NET (single) | 2017-07-01 |
Ruminating Reader: Reasoning with Gated Multi-Hop Attention | | 70.6 | 79.5 | Ruminating Reader | 2017-04-24 |
Making Neural QA as Simple as Possible but not Simpler | ✓ Link | 70.3 | 78.5 | FastQAExt (beam-size 5) | 2017-03-14 |
Reading Wikipedia to Answer Open-Domain Questions | ✓ Link | 69.5 | 78.8 | DrQA (Document Reader only) | 2017-03-31 |
Exploring Question Understanding and Adaptation in Neural-Network-Based Question Answering | | 69.10 | 78.38 | jNet (TreeLSTM adaptation, QTLa, K=100) | 2017-03-14 |
Structural Embedding of Syntactic Trees for Machine Comprehension | | 67.89 | 77.42 | SEDT-LSTM | 2017-03-02 |
Bidirectional Attention Flow for Machine Comprehension | ✓ Link | 67.7 | 77.3 | BIDAF (single) | 2016-11-05 |
Structural Embedding of Syntactic Trees for Machine Comprehension | | 67.65 | 77.19 | SECT-LSTM | 2017-03-02 |
Learning Recurrent Span Representations for Extractive Question Answering | ✓ Link | 66.4 | 74.9 | RASOR | 2016-11-04 |
Multi-Perspective Context Matching for Machine Comprehension | ✓ Link | 66.1 | 75.8 | MPCM | 2016-12-13 |
Dynamic Coattention Networks For Question Answering | ✓ Link | 65.4 | 75.6 | DCN | 2016-11-05 |
A Fully Attention-Based Information Retriever | ✓ Link | 65.1 | 75.6 | FABIR | 2018-10-22 |
Machine Comprehension Using Match-LSTM and Answer Pointer | ✓ Link | 64.1 | 64.7 | Match-LSTM with Bi-Ans-Ptr (Boundary+Search+b) | 2016-08-29 |
Learning to Compute Word Embeddings On the Fly | | 63.06 | | OTF dict+spelling (single) | 2017-06-01 |
End-to-End Answer Chunk Extraction and Ranking for Reading Comprehension | | 62.5 | 71.2 | DCR | 2016-10-31 |
Words or Characters? Fine-grained Gating for Reading Comprehension | ✓ Link | 59.95 | 71.25 | FG fine-grained gate | 2016-11-06 |
LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention | ✓ Link | | 95 | LUKE 483M | 2020-10-02 |
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | ✓ Link | | 90.8 | BART Base (with text infilling) | 2019-10-29 |
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes | ✓ Link | | 90.584 | BERT large (LAMB optimizer) | 2019-04-01 |
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | ✓ Link | | 85.8 | DistilBERT 66M | 2019-10-02 |
Deep contextualized word representations | ✓ Link | | 85.6 | BiDAF + Self Attention + ELMo | 2018-02-15 |
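
The EM and F1 columns above follow the standard SQuAD-style evaluation: exact match after answer normalization, and token-overlap F1. The snippet below is a minimal sketch of that scoring logic for reference only; it mirrors the commonly used normalization steps (lowercasing, punctuation and article removal) but is not the official evaluation script, and the example strings are illustrative.

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace (standard SQuAD-style normalization)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, gold: str) -> float:
    """EM: 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted answer and a gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Per question, scores are taken as the max over all gold answers,
# then averaged over the dataset and reported as percentages (as in the table).
print(exact_match("the Eiffel Tower", "Eiffel Tower"))        # 1.0 after normalization
print(round(f1_score("in Paris, France", "Paris"), 3))         # 0.5 (partial token overlap)
```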