Paper | Code | Score | Model (setting) | Date |
--- | --- | --- | --- | --- |
GPT-4 Technical Report | ✓ Link | 96.4 | GPT-4 (few-shot, k=25) | 2023-03-15 |
PaLM 2 Technical Report | ✓ Link | 95.1 | PaLM 2 (few-shot, CoT, SC) | 2023-05-17 |
| | 91.04 | Shivaay (4B, few-shot, k=8) | |
| | 91.03 | StupidLLM | |
Model Card and Evaluations for Claude Models | | 91 | Claude 2 (few-shot, k=5) | 2023-07-11 |
Model Card and Evaluations for Claude Models | | 90 | Claude 1.3 (few-shot, k=5) | 2023-07-11 |
Large Language Models Can Self-Improve | | 89.8 | PaLM 540B (Self-Improvement, Self-Consistency) | 2022-10-20 |
Large Language Models Can Self-Improve | | 88.7 | PaLM 540B (Self-Consistency) | 2022-10-20 |
Large Language Models Can Self-Improve | | 88.3 | PaLM 540B (Self-Improvement, CoT Prompting) | 2022-10-20 |
Large Language Models Can Self-Improve | | 87.2 | PaLM 540B (Self-Improvement, Standard Prompting) | 2022-10-20 |
Large Language Models Can Self-Improve | | 87.1 | PaLM 540B (Standard Prompting) | 2022-10-20 |
ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ Link | 86.5 | ST-MoE-32B 269B (fine-tuned) | 2022-02-17 |
Model Card and Evaluations for Claude Models | | 85.7 | Claude Instant 1.1 (few-shot, k=5) | 2023-07-11 |
GPT-4 Technical Report | ✓ Link | 85.2 | GPT-3.5 (few-shot, k=25) | 2023-03-15 |
Large Language Models Can Self-Improve | | 85.2 | PaLM 540B (CoT Prompting) | 2022-10-20 |
Mixture-of-Subspaces in Low-Rank Adaptation | ✓ Link | 81.5 | LLaMA 3 8B + MoSLoRA (fine-tuned) | 2024-06-16 |
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 79.9 | LLaMA-3 8B + MixLoRA | 2024-04-22 |
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 69.9 | LLaMA-2 13B + MixLoRA | 2024-04-22 |
PaLM 2 Technical Report | ✓ Link | 69.2 | PaLM 2-L (1-shot) | 2023-05-17 |
Galactica: A Large Language Model for Science | ✓ Link | 67.9 | GAL 120B (zero-shot) | 2022-11-16 |
Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | ✓ Link | 65.2 | Camelidae-8×34B | 2024-01-05 |
PaLM 2 Technical Report | ✓ Link | 64.9 | PaLM 2-M (1-shot) | 2023-05-17 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 63.8 | FLAN 137B (few-shot, k=13) | 2021-09-03 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 63.1 | FLAN 137B (zero-shot) | 2021-09-03 |
PaLM 2 Technical Report | ✓ Link | 59.6 | PaLM 2-S (1-shot) | 2023-05-17 |
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 58.1 | LLaMA-2 7B + MixLoRA | 2024-04-22 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 57.8 | LLaMA 33B (zero-shot) | 2023-02-27 |
ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ Link | 56.9 | ST-MoE-L 4.1B (fine-tuned) | 2022-02-17 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 56.0 | LLaMA 65B (zero-shot) | 2023-02-27 |
Mistral 7B | ✓ Link | 55.5 | Mistral 7B (0-shot) | 2023-10-10 |
Language Models are Few-Shot Learners | ✓ Link | 53.2 | GPT-3 175B (1-shot) | 2020-05-28 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 52.7 | LLaMA 13B (zero-shot) | 2023-02-27 |
Galactica: A Large Language Model for Science | ✓ Link | 51.4 | GPT-3 (zero-shot) | 2022-11-16 |
Language Models are Few-Shot Learners | ✓ Link | 51.4 | GPT-3 175B (0-shot) | 2020-05-28 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 50.85 | BLOOM 176B (1-shot) | 2023-03-30 |
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | | 50.3 | GLaM 64B/64E (0-shot) | 2021-12-13 |
UL2: Unifying Language Learning Paradigms | ✓ Link | 49.5 | UL2 20B (chain-of-thought + self-consistency) | 2022-05-10 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 48.63 | BloombergGPT 50B (1-shot) | 2023-03-30 |
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | | 48.2 | GLaM 64B/64E (1-shot) | 2021-12-13 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 47.6 | LLaMA 7B (zero-shot) | 2023-02-27 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 45.39 | GPT-NeoX 20B (1-shot) | 2023-03-30 |
Textbooks Are All You Need II: phi-1.5 technical report | ✓ Link | 44.9 | phi-1.5-web 1.3B (zero-shot) | 2023-09-11 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 44.54 | OPT 66B (1-shot) | 2023-03-30 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 43.94 | OPT-175B | 2023-01-02 |
UL2: Unifying Language Learning Paradigms | ✓ Link | 42.9 | UL2 20B (chain-of-thought) | 2022-05-10 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 41.3 | SparseGPT (175B, 50% Sparsity) | 2023-01-02 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 39.85 | SparseGPT (175B, 4:8 Sparsity) | 2023-01-02 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 38.99 | SparseGPT (175B, 2:4 Sparsity) | 2023-01-02 |
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 36.8 | Pythia 12B (5-shot) | 2023-04-03 |
Galactica: A Large Language Model for Science | ✓ Link | 32.9 | BLOOM (few-shot, k=5) | 2022-11-16 |
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 31.8 | Pythia 12B (0-shot) | 2023-04-03 |
Galactica: A Large Language Model for Science | ✓ Link | 31.1 | OPT (few-shot, k=5) | 2022-11-16 |
UL2: Unifying Language Learning Paradigms | ✓ Link | 29.8 | UL2 20B (zero-shot) | 2022-05-10 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 25.6 | OPT-175B (50% Sparsity) | 2023-01-02 |