Paper | Code | Accuracy | Model | Date |
--- | --- | --- | --- | --- |
Crosslingual Generalization through Multitask Finetuning | ✓ Link | 96.3 | BLOOMZ | 2022-11-03 |
Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | ✓ Link | 95.88 | Flipped-3B | 2022-10-06 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 94.7 | FLAN 137B (few-shot, k=10) | 2021-09-03 |
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | ✓ Link | 94.5 | T0-3B (CoT fine-tuned) | 2023-05-23 |
Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models | | 94.40 | KiC-770M | 2022-10-28 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 93.4 | FLAN 137B (zero-shot) | 2021-09-03 |
Improving Machine Reading Comprehension with General Reading Strategies | ✓ Link | 88.3 | Reading Strategies Model | 2018-10-31 |
Improving Language Understanding by Generative Pre-Training | ✓ Link | 86.5 | Finetuned Transformer LM | 2018-06-11 |
Exploring the Benefits of Training Expert Language Models over Instruction Tuning | ✓ Link | 86.33 | RoE-3B | 2023-02-07 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 79.82 | OPT-175B | 2023-01-02 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 78.87 | SparseGPT (175B, 50% Sparsity) | 2023-01-02 |
Narrative Modeling with Memory Chains and Semantic Supervision | ✓ Link | 78.7 | Memory chains and semantic supervision | 2018-06-12 |
Story Comprehension for Predicting What Happens Next | | 77.6 | Hidden Coherence Model | 2017-09-01 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 77.02 | SparseGPT (175B, 4:8 Sparsity) | 2023-01-02 |
A Simple and Effective Approach to the Story Cloze Test | | 76.5 | val-LS-skip | 2018-03-15 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 76.19 | SparseGPT (175B, 2:4 Sparsity) | 2023-01-02 |
Efficient Language Modeling with Sparse all-MLP | | 74.7 | sMLP – deterministic 9.4B (0-shot) | 2022-03-14 |
Efficient Language Modeling with Sparse all-MLP | | 73.3 | Switch Transformer 9B | 2022-03-14 |
Language Models are Few-Shot Learners | ✓ Link | 72.4 | GPT-3 Large 760M (zero-shot) | 2020-05-28 |
Efficient Language Modeling with Sparse all-MLP | | 67.9 | Gshard 9B | 2022-03-14 |
Efficient Language Modeling with Sparse all-MLP | | 64.7 | HASH Layers 10B (0-shot) | 2022-03-14 |
Efficient Language Modeling with Sparse all-MLP | | 61.4 | Base Layers 10B (0-shot) | 2022-03-14 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 47.10 | OPT-175B (50% Sparsity) | 2023-01-02 |