Paper | Code | Accuracy (%) | Model | Date |
--- | --- | --- | --- | --- |
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles | ✓ Link | 99.87 | Mistral-Nemo 12B (HPT) | 2024-06-18 |
ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ Link | 92.4 | ST-MoE-32B 269B (fine-tuned) | 2022-02-17 |
PaLM: Scaling Language Modeling with Pathways | ✓ Link | 92.2 | PaLM 540B (fine-tuned) | 2022-04-05 |
Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | | 92 | Turing NLR v5 XXL 5.4B (fine-tuned) | 2022-12-04 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 91.2 | T5-XXL 11B (fine-tuned) | 2019-10-23 |
PaLM 2 Technical Report | ✓ Link | 90.9 | PaLM 2-L (1-shot) | 2023-05-17 |
UL2: Unifying Language Learning Paradigms | ✓ Link | 90.8 | UL2 20B (fine-tuned) | 2022-05-10 |
Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | | 90.5 | Vega v2 6B (fine-tuned) | 2022-12-04 |
DeBERTa: Decoding-enhanced BERT with Disentangled Attention | ✓ Link | 90.4 | DeBERTa-1.5B | 2020-06-05 |
PaLM 2 Technical Report | ✓ Link | 88.6 | PaLM 2-M (1-shot) | 2023-05-17 |
ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ Link | 88.6 | ST-MoE-L 4.1B (fine-tuned) | 2022-02-17 |
PaLM 2 Technical Report | ✓ Link | 88.1 | PaLM 2-S (1-shot) | 2023-05-17 |
Muppet: Massive Multi-task Representations with Pre-Finetuning | ✓ Link | 87.5 | MUPPET Roberta Large | 2021-01-26 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 86.3 | FLAN 137B (prompt-tuned) | 2021-09-03 |
Entailment as Few-Shot Learner | ✓ Link | 86.0 | RoBERTa-large 355M + Entailment as Few-shot Learner | 2021-04-29 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 85.4 | T5-Large 770M (fine-tuned) | 2019-10-23 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 85.3 | LLaMA 65B (0-shot) | 2023-02-27 |
Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 85 | LLaMA 2 70B (0-shot) | 2023-07-18 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 84.6 | FLAN 137B (4-shot) | 2021-09-03 |
Muppet: Massive Multi-task Representations with Pre-Finetuning | ✓ Link | 83.8 | MUPPET Roberta Base | 2021-01-26 |
Training Compute-Optimal Large Language Models | ✓ Link | 83.7 | Chinchilla 70B (0-shot) | 2022-03-29 |
Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 83.7 | LLaMA 2 34B (0-shot) | 2023-07-18 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 83.1 | LLaMA 33B (0-shot) | 2023-02-27 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 82.9 | FLAN 137B (0-shot) | 2021-09-03 |
Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 81.7 | LLaMA 2 13B (0-shot) | 2023-07-18 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 81.4 | T5-Base 220M (fine-tuned) | 2019-10-23 |
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | ✓ Link | 80.4 | BERT-MultiNLI 340M (fine-tuned) | 2019-05-24 |
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | ✓ Link | 79.3 | Gopher 280B (0-shot) | 2021-12-08 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 78.1 | LLaMA 13B (0-shot) | 2023-02-27 |
Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 77.4 | LLaMA 2 7B (0-shot) | 2023-07-18 |
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 77.1 | LLaMA-2 13B + MixLoRA | 2024-04-22 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 76.5 | LLaMA 7B (0-shot) | 2023-02-27 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 76.4 | T5-Small 60M (fine-tuned) | 2019-10-23 |
Language Models are Few-Shot Learners | ✓ Link | 76.4 | GPT-3 175B (few-shot, k=32) | 2020-05-28 |
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | ✓ Link | 75.57 | BiDAF-MultiNLI (fine-tuned) | 2019-05-24 |
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 75 | LLaMA-3 8B + MixLoRA | 2024-04-22 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 74.6 | BloombergGPT 50B (1-shot) | 2023-03-30 |
Mixture-of-Subspaces in Low-Rank Adaptation | ✓ Link | 74.6 | LLaMA-3 + MoSLoRA | 2024-06-16 |
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | ✓ Link | 72.87 | GPT-1 117M (fine-tuned) | 2019-05-24 |
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 72.7 | LLaMA-2 7B + MixLoRA | 2024-04-22 |
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | ✓ Link | 71.41 | BiDAF + ELMo (fine-tuned) | 2019-05-24 |
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ Link | 71.4 | OPT-IML 175B | 2022-12-22 |
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model | ✓ Link | 69.4 | AlexaTM 20B | 2022-08-02 |
Ask Me Anything: A simple strategy for prompting language models | ✓ Link | 67.2 | Neo-6B (QA + WS) | 2022-10-05 |
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ Link | 66.9 | OPT-IML 30B | 2022-12-22 |
Ask Me Anything: A simple strategy for prompting language models | ✓ Link | 66.5 | Neo-6B (few-shot) | 2022-10-05 |
N-Grammer: Augmenting Transformers with latent n-grams | ✓ Link | 65 | N-Grammer 343M | 2022-07-13 |
Ask Me Anything: A simple strategy for prompting language models | ✓ Link | 64.9 | Neo-6B (QA) | 2022-10-05 |
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ Link | 64 | OPT 30B (0-shot) | 2022-12-22 |
UL2: Unifying Language Learning Paradigms | ✓ Link | 63.1 | UL2 20B (0-shot) | 2022-05-10 |
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | ✓ Link | 62.17 | Majority baseline | 2019-05-24 |
Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | 61.7 | Hybrid H3 1.3B (0-shot, logit scoring) | 2022-12-28 |
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ Link | 61.5 | OPT-IML 1.3B (0-shot) | 2022-12-22 |
SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments | | 61.1 | Shakti-LLM (2.5B) | 2024-10-15 |
Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | 60.6 | Hybrid H3 2.7B (3-shot, logit scoring) | 2022-12-28 |
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ Link | 60.5 | OPT 1.3B (0-shot) | 2022-12-22 |
Language Models are Few-Shot Learners | ✓ Link | 60.5 | GPT-3 175B (0-shot) | 2020-05-28 |
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ Link | 60.1 | OPT 175B | 2022-12-22 |
Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | 59.6 | Hybrid H3 125M (0-shot, logit scoring) | 2022-12-28 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 57.5 | OPT 66B (1-shot) | 2023-03-30 |
Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | 56.1 | Hybrid H3 125M (3-shot, logit scoring) | 2022-12-28 |
Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | 56.1 | Hybrid H3 125M (3-shot, rank classification) | 2022-12-28 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 52.9 | BLOOM 176B (1-shot) | 2023-03-30 |
Hyena Hierarchy: Towards Larger Convolutional Language Models | ✓ Link | 51.8 | Hyena | 2023-02-21 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 46.4 | GPT-NeoX 20B (1-shot) | 2023-03-30 |