LlamBERT: Large-scale low-cost data annotation in NLP | ✓ Link | 96.68 | RoBERTa-large with LlamBERT | 2024-03-23 |
LlamBERT: Large-scale low-cost data annotation in NLP | ✓ Link | 96.54 | RoBERTa-large | 2024-03-23 |
XLNet: Generalized Autoregressive Pretraining for Language Understanding | ✓ Link | 96.21 | XLNet | 2019-06-19 |
An Algorithm for Routing Vectors in Sequences | ✓ Link | 96.2 | Heinsen Routing + RoBERTa Large | 2022-11-20 |
Entailment as Few-Shot Learner | ✓ Link | 96.1 | RoBERTa-large 355M + Entailment as Few-shot Learner | 2021-04-29 |
Graph Star Net for Generalized Multi-Task Learning | ✓ Link | 96.0 | GraphStar | 2019-06-21 |
The Document Vectors Using Cosine Similarity Revisited | ✓ Link | 95.94 | DV-ngrams-cosine with NB sub-sampling + RoBERTa.base | 2022-05-26 |
The Document Vectors Using Cosine Similarity Revisited | ✓ Link | 95.92 | DV-ngrams-cosine + RoBERTa.base | 2022-05-26 |
| | 95.9 | Roberta_Large ST + Cosine Similarity Loss | |
Unsupervised Data Augmentation for Consistency Training | ✓ Link | 95.8 | BERT large finetune UDA | 2019-04-29 |
How to Fine-Tune BERT for Text Classification? | ✓ Link | 95.79 | BERT_large+ITPT | 2019-05-14 |
The Document Vectors Using Cosine Similarity Revisited | ✓ Link | 95.79 | RoBERTa.base | 2022-05-26 |
Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function | ✓ Link | 95.68 | L MIXED | 2020-09-08 |
How to Fine-Tune BERT for Text Classification? | ✓ Link | 95.63 | BERT_base+ITPT | 2019-05-14 |
Unsupervised Data Augmentation for Consistency Training | ✓ Link | 95.49 | BERT large | 2019-04-29 |
Universal Language Model Fine-tuning for Text Classification | ✓ Link | 95.4 | ULMFiT | 2018-01-18 |
LlamBERT: Large-scale low-cost data annotation in NLP | ✓ Link | 95.39 | Llama-2-70b-chat (0-shot) | 2024-03-23 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 95 | FLAN 137B (few-shot, k=2) | 2021-09-03 |
GPU Kernels for Block-Sparse Weights | ✓ Link | 94.99 | Block-sparse LSTM | 2017-12-01 |
Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMs | ✓ Link | 94.88 | Space-XLNet | 2024-01-30 |
Contextual Explanation Networks | ✓ Link | 94.52 | CEN-tpc | 2017-05-29 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 94.3 | FLAN 137B (zero-shot) | 2021-09-03 |
Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings | | 94.1 | oh-LSTM | 2016-02-07 |
Adversarial Training Methods for Semi-Supervised Text Classification | ✓ Link | 94.1 | Virtual adversarial training | 2016-05-25 |
The Document Vectors Using Cosine Similarity Revisited | ✓ Link | 93.68 | DV-ngrams-cosine + NB-weighted BON (re-evaluated) | 2022-05-26 |
Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention | ✓ Link | 93.2 | Nyströmformer | 2021-02-07 |
Parallelizing Legendre Memory Unit Training | ✓ Link | 93.20 | Modified LMU (34M) | 2021-02-22 |
Sentiment Classification Using Document Embeddings Trained with Cosine Similarity | ✓ Link | 93.13 | DV-ngrams-cosine | 2019-07-01 |
Cache me if you Can: an Online Cost-aware Teacher-Student framework to Reduce the Calls to Large Language Models | ✓ Link | 93.06 | OCaTS (kNN & GPT-3.5-turbo) | 2023-10-20 |
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | ✓ Link | 92.82 | DistilBERT 66M | 2019-10-02 |
Language Models are Unsupervised Multitask Learners | ✓ Link | 92.36 | GPT-2 Finetuned | 2019-02-14 |
Effective Use of Word Order for Text Categorization with Convolutional Neural Networks | ✓ Link | 92.33 | seq2-bown-CNN | 2014-12-01 |
BP-Transformer: Modelling Long-Range Context via Binary Partitioning | ✓ Link | 92.12 | BP-Transformer + GloVe | 2019-11-11 |
Learned in Translation: Contextualized Word Vectors | ✓ Link | 91.8 | BCN+Char+CoVe | 2017-08-01 |
Task-oriented Word Embedding for Text Classification | ✓ Link | 90.8 | ToWE-SG | 2018-08-01 |
Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach | ✓ Link | 90.54 | COSINE | 2020-10-15 |
Long Short-Term Memory with Dynamic Skip Connections | ✓ Link | 90.1 | LSTM with dynamic skip | 2018-11-09 |
On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis | ✓ Link | 88.9 | CNN+LSTM | 2017-07-06 |
UnICORNN: A recurrent model for learning very long time dependencies | ✓ Link | 88.4 | UnICORNN | 2021-03-09 |
Closed-form Continuous-time Neural Models | ✓ Link | 88.4 | CfC | 2021-06-25 |
Efficient Vector Representation for Documents through Corruption | ✓ Link | 88.3 | Doc2VecC | 2017-07-08 |
Learning in Wilson-Cowan model for metapopulation | ✓ Link | 87.46 | BERT + Wilson-Cowan model RNN | 2024-06-24 |
Coupled Oscillatory Recurrent Neural Network (coRNN): An accurate and (gradient) stable architecture for learning long time dependencies | ✓ Link | 87.4 | coRNN | 2020-10-02 |
Sentence-State LSTM for Text Representation | ✓ Link | 87.15 | S-LSTM | 2018-05-07 |
Classifying Textual Data with Pre-trained Vision Models through Transfer Learning and Data Transformations | ✓ Link | 87 | AlexNet | 2021-06-23 |
Classifying Textual Data with Pre-trained Vision Models through Transfer Learning and Data Transformations | ✓ Link | 86 | VGG16 | 2021-06-23 |
Classifying Textual Data with Pre-trained Vision Models through Transfer Learning and Data Transformations | ✓ Link | 85 | ResNeXt | 2021-06-23 |
Information Aggregation via Dynamic Routing for Sequence Encoding | ✓ Link | 45.1 | Standard DR-AGG | 2018-06-05 |
Information Aggregation via Dynamic Routing for Sequence Encoding | ✓ Link | 44.5 | Reverse DR-AGG | 2018-06-05 |
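Several of the entries above (e.g. RoBERTa.base, BERT_large, DistilBERT) are standard transformer encoders fine-tuned directly on the IMDb binary sentiment task. The snippet below is a minimal sketch of that generic baseline setup using the Hugging Face `transformers` and `datasets` libraries; it is not the recipe of any specific paper in the table, and the hyperparameters (batch size, learning rate, epochs) are illustrative assumptions rather than reported settings.

```python
# Minimal sketch: fine-tuning roberta-base on IMDb sentiment classification.
# Hyperparameters are illustrative assumptions, not those of any listed paper.
import numpy as np
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

dataset = load_dataset("imdb")  # 25k train / 25k test movie reviews
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def tokenize(batch):
    # Truncate long reviews to the model's 512-token context limit.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2
)

def compute_accuracy(eval_pred):
    # Accuracy on the held-out test split, as reported in the table.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}

args = TrainingArguments(
    output_dir="roberta-imdb",        # hypothetical output path
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=2e-5,
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,
    compute_metrics=compute_accuracy,
)

trainer.train()
print(trainer.evaluate())  # prints test-set accuracy after fine-tuning
```

Under settings roughly like these, a plain RoBERTa-base baseline lands in the mid-90s on IMDb, which is the range the corresponding rows above report; the higher-scoring entries add extra components (task-adaptive pretraining, data augmentation, routing layers, or LLM-generated annotations) on top of this kind of fine-tuning loop.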