Paper | Code | Pearson Correlation | Spearman Correlation | Accuracy | Dev Pearson | Dev Spearman | Model | Date |
--- | --- | --- | --- | --- | --- | --- | --- | --- |
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ Link | 0.929 | 0.925 | | | | MT-DNN-SMART | 2019-11-08 |
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding | | 0.928 | 0.924 | | | | StructBERT-RoBERTa ensemble | 2019-08-13 |
MNet-Sim: A Multi-layered Semantic Similarity Network to Evaluate Sentence Similarity | | 0.927 | 0.931 | | | | Mnet-Sim | 2021-11-09 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 0.925 | 0.921 | | | | T5-11B | 2019-10-23 |
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | ✓ Link | 0.925 | | | | | ALBERT | 2019-09-26 |
XLNet: Generalized Autoregressive Pretraining for Language Understanding | ✓ Link | 0.925 | | | | | XLNet (single model) | 2019-06-19 |
RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ Link | 0.922 | | | | | RoBERTa | 2019-07-26 |
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | | 0.921 | | | | | ELECTRA | |
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | ✓ Link | 0.919 | | | | | RoBERTa-large 355M (MLP quantized vector-wise, fine-tuned) | 2022-08-15 |
A Statistical Framework for Low-bitwidth Training of Deep Neural Networks | ✓ Link | 0.919 | | | | | PSQ (Chen et al., 2020) | 2020-10-27 |
Entailment as Few-Shot Learner | ✓ Link | 0.918 | | | | | RoBERTa-large 355M + Entailment as Few-shot Learner | 2021-04-29 |
ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | ✓ Link | 0.912 | | | | | ERNIE 2.0 Large | 2019-07-29 |
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT | | 0.911 | | | | | Q-BERT (Shen et al., 2020) | 2019-09-12 |
Q8BERT: Quantized 8Bit BERT | ✓ Link | 0.911 | | | | | Q8BERT (Zafrir et al., 2019) | 2019-10-14 |
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | | 0.910 | | | | | ELECTRA (no tricks) | |
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | ✓ Link | 0.907 | | | | | DistilBERT 66M | 2019-10-02 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 0.906 | 0.898 | | | | T5-3B | 2019-10-23 |
CLEAR: Contrastive Learning for Sentence Representation | | 0.905 | | | | | MLM+ del-word | 2020-12-31 |
RealFormer: Transformer Likes Residual Attention | ✓ Link | 0.9011 | 0.8988 | | | | RealFormer | 2020-12-21 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 0.899 | | | | | T5-Large | 2019-10-23 |
SpanBERT: Improving Pre-training by Representing and Predicting Spans | ✓ Link | 0.899 | | | | | SpanBERT | 2019-07-24 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 0.894 | | | | | T5-Base | 2019-10-23 |
ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | ✓ Link | 0.876 | | | | | ERNIE 2.0 Base | 2019-07-29 |
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization | ✓ Link | 0.873 | | | | | Charformer-Tall | 2021-06-23 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 0.856 | 0.85 | | | | T5-Small | 2019-10-23 |
ERNIE: Enhanced Language Representation with Informative Entities | ✓ Link | 0.832 | | | | | ERNIE | 2019-05-17 |
How to Train BERT with an Academic Budget | ✓ Link | 0.820 | | | | | 24hBERT | 2021-04-15 |
TinyBERT: Distilling BERT for Natural Language Understanding | ✓ Link | 0.799 | | | | | TinyBERT-4 14.5M | 2019-09-23 |
Universal Sentence Encoder | ✓ Link | 0.782 | | | | | USE_T | 2018-03-29 |
AnglE-optimized Text Embeddings | ✓ Link | | 0.8969 | | | | AnglE-LLaMA-13B | 2023-09-22 |
Adversarial Self-Attention for Language Understanding | ✓ Link | | 0.892 | | | | ASA + RoBERTa | 2022-06-25 |
Scaling Sentence Embeddings with Large Language Models | ✓ Link | | 0.8914 | | | | PromptEOL+CSE+LLaMA-30B | 2023-07-31 |
AnglE-optimized Text Embeddings | ✓ Link | | 0.8897 | | | | AnglE-LLaMA-7B | 2023-09-22 |
AnglE-optimized Text Embeddings | ✓ Link | | 0.8897 | | | | AnglE-LLaMA-7B-v2 | 2023-09-22 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | | 0.886 | | | | T5-Large 770M | 2019-10-23 |
Scaling Sentence Embeddings with Large Language Models | ✓ Link | | 0.8856 | | | | PromptEOL+CSE+OPT-13B | 2023-07-31 |
Scaling Sentence Embeddings with Large Language Models | ✓ Link | | 0.8833 | | | | PromptEOL+CSE+OPT-2.7B | 2023-07-31 |
Improved Universal Sentence Embeddings with Prompt-based Contrastive Learning and Energy-based Learning | ✓ Link | | 0.8787 | | | | PromCSE-RoBERTa-large (0.355B) | 2022-03-14 |
Big Bird: Transformers for Longer Sequences | ✓ Link | | 0.878 | | | | BigBird | 2020-07-28 |
SimCSE: Simple Contrastive Learning of Sentence Embeddings | ✓ Link | | 0.867 | | | | SimCSE-RoBERTa-large | 2021-04-18 |
Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations | ✓ Link | | 0.867 | | | | Trans-Encoder-RoBERTa-large-cross (unsup.) | 2021-09-27 |
Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations | ✓ Link | | 0.8655 | | | | Trans-Encoder-RoBERTa-large-bi (unsup.) | 2021-09-27 |
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | ✓ Link | | 0.865 | | | | BERT-LARGE | 2018-10-11 |
Adversarial Self-Attention for Language Understanding | ✓ Link | | 0.865 | | | | ASA + BERT-base | 2022-06-25 |
Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations | ✓ Link | | 0.8616 | | | | Trans-Encoder-BERT-large-bi (unsup.) | 2021-09-27 |
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks | ✓ Link | | 0.8615 | | | | SRoBERTa-NLI-STSb-large | 2019-08-27 |
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks | ✓ Link | | 0.8479 | | | | SBERT-STSb-base | 2019-08-27 |
Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations | ✓ Link | | 0.8465 | | | | Trans-Encoder-RoBERTa-base-cross (unsup.) | 2021-09-27 |
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks | ✓ Link | | 0.8445 | | | | SBERT-STSb-large | 2019-08-27 |
FNet: Mixing Tokens with Fourier Transforms | ✓ Link | | 0.84 | | | | FNet-Large | 2021-05-09 |
Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations | ✓ Link | | 0.839 | | | | Trans-Encoder-BERT-base-bi (unsup.) | 2021-09-27 |
| | | | 0.7981 | | | | Pearl | |
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks | ✓ Link | | 0.79 | | | | SBERT-NLI-large | 2019-08-27 |
Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders | ✓ Link | | 0.787 | | | | Mirror-RoBERTa-base (unsup.) | 2021-04-16 |
Generating Datasets with Pretrained Language Models | ✓ Link | | 0.7782 | | | | Dino (STSb/🦕) | 2021-04-15 |
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks | ✓ Link | | 0.7777 | | | | SRoBERTa-NLI-base | 2019-08-27 |
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks | ✓ Link | | 0.7703 | | | | SBERT-NLI-base | 2019-08-27 |
Generating Datasets with Pretrained Language Models | ✓ Link | | 0.7651 | | | | Dino (STS/🦕) | 2021-04-15 |
Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders | ✓ Link | | 0.764 | | | | Mirror-BERT-base (unsup.) | 2021-04-16 |
On the Sentence Embeddings from Pre-trained Language Models | ✓ Link | | 0.7226 | | | | BERT-large-flow (target) | 2020-11-02 |
An Unsupervised Sentence Embedding Method by Mutual Information Maximization | ✓ Link | | 0.6921 | | | | IS-BERT-NLI | 2020-09-25 |
Rematch: Robust and Efficient Matching of Local Knowledge Graphs to Improve Structural and Semantic Similarity | ✓ Link | | 0.6652 | | | | Rematch | 2024-04-02 |
Def2Vec: Extensible Word Embeddings from Dictionary Definitions | ✓ Link | | 0.6372 | | | | Def2Vec | 2023-12-16 |
DeBERTa: Decoding-enhanced BERT with Disentangled Attention | ✓ Link | | | 92.5 | | | DeBERTa (large) | 2020-06-05 |
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ Link | | | | 92.8 | 92.6 | SMART-RoBERTa | 2019-11-08 |
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ Link | | | | 90.0 | 89.4 | SMART-BERT | 2019-11-08 |
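The populated metric columns are Pearson and Spearman correlation between each model's predicted similarity scores and the gold STS Benchmark labels (0 to 5); the Dev columns report the same correlations on the development split. For the sentence-embedding rows (SBERT, SimCSE, AnglE, and similar), the predicted score is typically the cosine similarity between the two sentence embeddings, while fine-tuned GLUE-style submissions regress the score with a cross-encoder. The sketch below shows the bi-encoder evaluation loop under the assumption that the `sentence-transformers` and `scipy` packages are available; the checkpoint name and the toy sentence pairs are illustrative assumptions, not entries from the table.

```python
# Minimal sketch: score sentence pairs with a bi-encoder and report the
# Pearson/Spearman correlations used by the leaderboard above.
# The checkpoint and the toy pairs are assumptions for illustration only.
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sentence_transformers import SentenceTransformer


def cosine_rows(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two batches of embeddings."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a * b).sum(axis=1)


def evaluate_stsb(model_name, sents1, sents2, gold_scores):
    """Return (Pearson, Spearman) between predicted similarities and gold labels."""
    model = SentenceTransformer(model_name)
    emb1 = model.encode(sents1, convert_to_numpy=True)
    emb2 = model.encode(sents2, convert_to_numpy=True)
    preds = cosine_rows(emb1, emb2)
    return pearsonr(preds, gold_scores)[0], spearmanr(preds, gold_scores)[0]


if __name__ == "__main__":
    # Toy pairs with gold STS-B-style labels in [0, 5]; real runs use the test split.
    s1 = ["A man is playing a guitar.", "A dog runs in the park.", "Two kids are cooking."]
    s2 = ["A person plays the guitar.", "A cat sleeps indoors.", "Children are making food."]
    gold = [4.8, 0.5, 4.2]
    p, s = evaluate_stsb("sentence-transformers/all-MiniLM-L6-v2", s1, s2, gold)
    print(f"Pearson = {p:.4f}, Spearman = {s:.4f}")
```

The same correlation computation applies to cross-encoder submissions; only the way `preds` is produced differs.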