Paper | Code | Accuracy (%) | Model | Date
UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark | ✓ Link | 90.1 | Unicorn 11B (fine-tuned) | 2021-03-24 |
Mixture-of-Subspaces in Low-Rank Adaptation | ✓ Link | 89.7 | LLaMA-3 8B + MoSLoRA | 2024-06-16 |
Task Compass: Scaling Multi-task Pre-training with Task Prefix | ✓ Link | 88.3 | CompassMTL 567M with Tailor | 2022-10-12 |
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 87.6 | LLaMA-3 8B + MixLoRA | 2024-04-22 |
Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering | ✓ Link | 87.4 | DeBERTa-Large 304M | 2022-10-29 |
Task Compass: Scaling Multi-task Pre-training with Task Prefix | ✓ Link | 87.3 | CompassMTL 567M | 2022-10-12 |
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 86.8 | LLaMA-2 13B + MixLoRA | 2024-04-22 |
SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments | | 86.2 | Shakti-LLM (2.5B) | 2024-10-15 |
Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering | ✓ Link | 85.9 | DeBERTa-Large 304M (classification-based) | 2022-10-29 |
Task Compass: Scaling Multi-task Pre-training with Task Prefix | ✓ Link | 85.5 | ExDeBERTa 567M | 2022-10-12 |
UnifiedQA: Crossing Format Boundaries With a Single QA System | ✓ Link | 85.3 | UnifiedQA 3B | 2020-05-02 |
PaLM 2 Technical Report | ✓ Link | 85.0 | PaLM 2-L (1-shot) | 2023-05-17 |
Mixtral of Experts | ✓ Link | 83.6 | Mixtral 8x7B (0-shot) | 2024-01-08 |
PaLM 2 Technical Report | ✓ Link | 83.2 | PaLM 2-M (1-shot) | 2023-05-17 |
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 83.2 | LLaMA-2 7B + MixLoRA | 2024-04-22 |
Mistral 7B | ✓ Link | 83.0 | Mistral 7B (0-shot) | 2023-10-10 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 82.8 | LLaMA 65B (0-shot) | 2023-02-27 |
Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 82.8 | LLaMA 2 70B (0-shot) | 2023-07-18 |
Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | ✓ Link | 82.7 | Camelidae-8×34B | 2024-01-05 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 82.3 | LLaMA 33B (0-shot) | 2023-02-27 |
PaLM 2 Technical Report | ✓ Link | 82.2 | PaLM 2-S (1-shot) | 2023-05-17 |
Mixtral of Experts | ✓ Link | 82.2 | Mistral 7B (0-shot) | 2024-01-08 |
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | ✓ Link | 82.0 | MT-NLG 530B (0-shot) | 2019-09-17 |
Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 81.9 | LLaMA 2 34B (0-shot) | 2023-07-18 |
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | ✓ Link | 81.8 | Gopher 280B (0-shot) | 2021-12-08 |
Training Compute-Optimal Large Language Models | ✓ Link | 81.8 | Chinchilla 70B (0-shot) | 2022-03-29 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 81.7 | FLAN 137B (few-shot, k=10) | 2021-09-03 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 81.07 | OPT-175B | 2023-01-02 |
Language Models are Few-Shot Learners | ✓ Link | 81.0 | GPT-3 175B (0-shot) | 2020-05-28 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 80.63 | SparseGPT 175B (50% Sparsity) | 2023-01-02 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 80.5 | FLAN 137B (0-shot) | 2021-09-03 |
Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 80.5 | LLaMA 2 13B (0-shot) | 2023-07-18 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 80.1 | LLaMA 13B (0-shot) | 2023-02-27 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 79.8 | LLaMA 7B (0-shot) | 2023-02-27 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 79.54 | SparseGPT 175B (4:8 Sparsity) | 2023-01-02 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 79.54 | SparseGPT 175B (2:4 Sparsity) | 2023-01-02 |
RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ Link | 79.4 | RoBERTa-Large 355M | 2019-07-26 |
Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 78.8 | LLaMA 2 7B (0-shot) | 2023-07-18 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 77.9 | BloombergGPT 50B (1-shot) | 2023-03-30 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 77.6 | OPT 66B (1-shot) | 2023-03-30 |
PIQA: Reasoning about Physical Commonsense in Natural Language | ✓ Link | 77.1 | RoBERTa-large 355M (fine-tuned) | 2019-11-26 |
Textbooks Are All You Need II: phi-1.5 technical report | ✓ Link | 77.0 | phi-1.5-web (1.3B) | 2023-09-11 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 77.0 | BLOOM 176B (1-shot) | 2023-03-30 |
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 76.7 | Pythia 12B (5-shot) | 2023-04-03 |
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | ✓ Link | 76.2 | Open-LLaMA-3B-v2 | 2023-10-10 |
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 76.0 | Pythia 12B (0-shot) | 2023-04-03 |
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | ✓ Link | 75.8 | Sheared-LLaMA-2.7B | 2023-10-10 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 75.8 | GPT-NeoX 20B (1-shot) | 2023-03-30 |
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 75.2 | Pythia 6.9B (0-shot) | 2023-04-03 |
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | ✓ Link | 73.4 | Sheared-LLaMA-1.3B | 2023-10-10 |
Efficient Language Modeling with Sparse all-MLP | | 73.0 | sMLP - deterministic 9.4B (0-shot) | 2022-03-14 |
Language Models are Few-Shot Learners | ✓ Link | 72.9 | GPT-3 Large 760M (0-shot) | 2020-05-28 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 72.2 | FLAN-T5-Large 783M | 2023-04-27 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 71.3 | LaMini-GPT 1.5B | 2023-04-27 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 70.6 | LaMini-F-T5 783M | 2023-04-27 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 70.5 | GPT-2-XL 1.5B | 2023-04-27 |
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 70.4 | Pythia 1B (5-shot) | 2023-04-03 |
PIQA: Reasoning about Physical Commonsense in Natural Language | ✓ Link | 69.2 | GPT-2-small 124M (fine-tuned) | 2019-11-26 |
Efficient Language Modeling with Sparse all-MLP | | 68.1 | GShard 9B | 2022-03-14 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 67.2 | LaMini-T5 738M | 2023-04-27 |
PIQA: Reasoning about Physical Commonsense in Natural Language | ✓ Link | 66.8 | BERT-large 340M (fine-tuned) | 2019-11-26 |
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | ✓ Link | 66.7 | BERT-Large 340M | 2018-10-11 |
Efficient Language Modeling with Sparse all-MLP | | 63.8 | Base Layers 10B (0-shot) | 2022-03-14 |
Efficient Language Modeling with Sparse all-MLP | | 63.8 | HASH Layers 10B (0-shot) | 2022-03-14 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 55.9 | T5-Large 738M | 2023-04-27 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 54.73 | OPT-175B (50% Sparsity) | 2023-01-02 |
PIQA: Reasoning about Physical Commonsense in Natural Language | ✓ Link | 50.0 | Random chance baseline | 2019-11-26 |
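All figures above are accuracy on PIQA's two-way multiple-choice format, which is why random chance sits at 50%. The zero-shot and few-shot entries are typically produced by ranking the two candidate solutions by language-model likelihood; the sketch below illustrates that setup. It is a minimal sketch assuming a HuggingFace causal LM: the model name, prompt template, and lack of length normalization are illustrative choices rather than any listed paper's exact harness, so exact numbers will differ from those reported.

```python
# Minimal sketch of zero-shot PIQA scoring: rank the two candidate solutions
# by summed token log-likelihood under a causal LM and pick the higher one.
# (Illustrative only; not the exact evaluation harness of any paper above.)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; any HuggingFace causal LM works the same way
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()


def solution_logprob(goal: str, solution: str) -> float:
    """Sum of log-probabilities of the solution tokens, conditioned on the goal."""
    prompt = f"Goal: {goal}\nAnswer:"
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(f"{prompt} {solution}", return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i predicts token i+1; keep only the solution's tokens.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = full_ids[:, 1:]
    per_token = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Assumes the prompt tokenization is a prefix of the full tokenization,
    # which holds for typical BPE tokenizers with this template.
    return per_token[:, prompt_len - 1:].sum().item()


def predict(goal: str, sol1: str, sol2: str) -> int:
    """Return 0 or 1, matching PIQA's label format; accuracy over the
    validation set is what the table above reports."""
    return int(solution_logprob(goal, sol2) > solution_logprob(goal, sol1))


# Example item paraphrased from the PIQA paper.
print(predict(
    "To separate egg whites from the yolk using a water bottle, you should",
    "Squeeze the water bottle and press it against the yolk. Release, which creates suction and lifts the yolk.",
    "Place the water bottle and press it against the yolk. Keep pushing, which creates suction and lifts the yolk.",
))
```

The fine-tuned entries (e.g., the RoBERTa, DeBERTa, and UnifiedQA rows) instead train on PIQA's training split, typically with a multiple-choice or, in the binary-classification variant, a per-(goal, solution) classifier head rather than raw LM likelihood.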