Paper | Code | Accuracy | MCC | Model | Date
--- | --- | --- | --- | --- | ---
Acceptability Judgements via Examining the Topology of Attention Maps | ✓ Link | 88.6% | | En-BERT + TDA + PCA | 2022-05-19
Can BERT eat RuCoLA? Topological Data Analysis to Explain | ✓ Link | 88.2% | 0.726 | BERT+TDA | 2023-04-04 |
Can BERT eat RuCoLA? Topological Data Analysis to Explain | ✓ Link | 87.3% | 0.695 | RoBERTa+TDA | 2023-04-04 |
tasksource: A Dataset Harmonization Framework for Streamlined NLP Multi-Task Learning and Evaluation | ✓ Link | 87.15% | | deberta-v3-base+tasksource | 2023-01-14 |
Entailment as Few-Shot Learner | ✓ Link | 86.4% | | RoBERTa-large 355M + Entailment as Few-shot Learner | 2021-04-29 |
Not all layers are equally as important: Every Layer Counts BERT | | 82.7 | | LTG-BERT-base 98M | 2023-11-03 |
Not all layers are equally as important: Every Layer Counts BERT | | 82.6 | | ELC-BERT-base 98M | 2023-11-03 |
Acceptability Judgements via Examining the Topology of Attention Maps | ✓ Link | 82.1% | 0.565 | En-BERT + TDA | 2022-05-19 |
FNet: Mixing Tokens with Fourier Transforms | ✓ Link | 78% | | FNet-Large | 2021-05-09 |
Not all layers are equally as important: Every Layer Counts BERT | | 77.6 | | LTG-BERT-small 24M | 2023-11-03 |
Not all layers are equally as important: Every Layer Counts BERT | | 76.1 | | ELC-BERT-small 24M | 2023-11-03 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 70.8% | | T5-11B | 2019-10-23 |
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding | | 69.2% | | StructBERT (RoBERTa) ensemble | 2019-08-13
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | ✓ Link | 69.1% | | ALBERT | 2019-09-26 |
XLNet: Generalized Autoregressive Pretraining for Language Understanding | ✓ Link | 69% | | XLNet (single model) | 2019-06-19 |
Learning to Encode Position for Transformer with Continuous Dynamical Model | ✓ Link | 69% | | FLOATER-large | 2020-03-13 |
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | ✓ Link | 68.6% | | RoBERTa-large 355M (MLP quantized vector-wise, fine-tuned) | 2022-08-15 |
Multi-Task Deep Neural Networks for Natural Language Understanding | ✓ Link | 68.4% | | MT-DNN | 2019-01-31 |
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | | 68.2% | | ELECTRA | 
RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ Link | 67.8% | | RoBERTa (ensemble) | 2019-07-26 |
A Statistical Framework for Low-bitwidth Training of Deep Neural Networks | ✓ Link | 67.5 | | PSQ (Chen et al., 2020) | 2020-10-27 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 67.1% | | T5-XL 3B | 2019-10-23 |
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT | | 65.1 | | Q-BERT (Shen et al., 2020) | 2019-09-12 |
Q8BERT: Quantized 8Bit BERT | ✓ Link | 65.0 | | Q8BERT (Zafrir et al., 2019) | 2019-10-14 |
SpanBERT: Improving Pre-training by Representing and Predicting Spans | ✓ Link | 64.3% | | SpanBERT | 2019-07-24 |
CLEAR: Contrastive Learning for Sentence Representation | | 64.3% | | MLM + del-span + reorder | 2020-12-31
ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | ✓ Link | 63.5% | | ERNIE 2.0 Large | 2019-07-29 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 61.2% | | T5-Large 770M | 2019-10-23 |
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | ✓ Link | 60.5% | | BERT-LARGE | 2018-10-11 |
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language | ✓ Link | 60.3% | | data2vec | 2022-02-07 |
RealFormer: Transformer Likes Residual Attention | ✓ Link | 59.83% | | RealFormer | 2020-12-21 |
Big Bird: Transformers for Longer Sequences | ✓ Link | 58.5% | | BigBird | 2020-07-28 |
How to Train BERT with an Academic Budget | ✓ Link | 57.1 | | 24hBERT | 2021-04-15 |
ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | ✓ Link | 55.2% | | ERNIE 2.0 Base | 2019-07-29 |
ERNIE: Enhanced Language Representation with Informative Entities | ✓ Link | 52.3% | | ERNIE | 2019-05-17 |
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization | ✓ Link | 51.8% | | Charformer-Tall | 2021-06-23 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 51.1% | | T5-Base | 2019-10-23 |
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | ✓ Link | 49.1% | | DistilBERT 66M | 2019-10-02 |
SqueezeBERT: What can computer vision teach NLP about efficient neural networks? | ✓ Link | 46.5% | | SqueezeBERT | 2020-06-19 |
TinyBERT: Distilling BERT for Natural Language Understanding | ✓ Link | 43.3% | | TinyBERT-4 14.5M | 2019-09-23 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 41.0% | | T5-Small | 2019-10-23 |
LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning | ✓ Link | 14.1% | | LM-CPPF RoBERTa-base | 2023-05-29 |
RuCoLA: Russian Corpus of Linguistic Acceptability | ✓ Link | | 0.6 | RemBERT | 2022-10-23 |
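The MCC column reports the Matthews correlation coefficient, the standard metric for CoLA-style acceptability classification; it is preferred over raw accuracy because the acceptable/unacceptable labels are imbalanced. Below is a minimal sketch of how both columns could be computed from binary model predictions, assuming 1 = acceptable and 0 = unacceptable and using scikit-learn; the arrays are illustrative placeholders, not results from any model in the table.

```python
# Sketch: computing the Accuracy and MCC columns from binary
# acceptability predictions (1 = acceptable, 0 = unacceptable).
# The label arrays below are illustrative placeholders only.
from sklearn.metrics import accuracy_score, matthews_corrcoef

y_true = [1, 1, 0, 1, 0, 0, 1, 1]   # gold acceptability judgements
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]   # model predictions

acc = accuracy_score(y_true, y_pred)      # fraction of correct labels
mcc = matthews_corrcoef(y_true, y_pred)   # balanced correlation in [-1, 1]

print(f"Accuracy: {acc:.1%}")   # prints "Accuracy: 75.0%" (Accuracy column)
print(f"MCC: {mcc:.3f}")        # prints "MCC: 0.467" (MCC column)
```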