OpenCodePapers

# Common Sense Reasoning on ARC-Easy

Task: Common Sense Reasoning
## Results over time

*(Interactive chart of accuracy vs. release date; the underlying data appears in the leaderboard table.)*
## Leaderboard

| Paper | Code | Accuracy (%) | Model | Release date |
|---|---|---|---|---|
| ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ | 95.2 | ST-MoE-32B 269B (fine-tuned) | 2022-02-17 |
| Mixture-of-Subspaces in Low-Rank Adaptation | ✓ | 90.5 | LLaMA 3 8B + MoSLoRA (fine-tuned) | 2024-06-16 |
| PaLM 2 Technical Report | ✓ | 89.7 | PaLM 2-L (1-shot) | 2023-05-17 |
| PaLM 2 Technical Report | ✓ | 88.0 | PaLM 2-M (1-shot) | 2023-05-17 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ | 86.5 | LLaMA-3 8B + MixLoRA | 2024-04-22 |
| Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | ✓ | 86.2 | Camelidae-8×34B | 2024-01-05 |
| PaLM 2 Technical Report | ✓ | 85.6 | PaLM 2-S (1-shot) | 2023-05-17 |
| Stay on topic with Classifier-Free Guidance | – | 84.2 | LLaMA 65B + CFG (0-shot) | 2023-06-30 |
| Galactica: A Large Language Model for Science | ✓ | 83.8 | GAL 120B (0-shot) | 2022-11-16 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ | 83.5 | LLaMA-2 13B + MixLoRA | 2024-04-22 |
| Stay on topic with Classifier-Free Guidance | – | 83.2 | LLaMA 30B + CFG (0-shot) | 2023-06-30 |
| Mixtral of Experts | ✓ | 83.1 | Mixtral 8x7B (0-shot) | 2024-01-08 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ | 80.7 | FLAN 137B (few-shot, k=14) | 2021-09-03 |
| Mixtral of Experts | ✓ | 80.5 | Mistral 7B (0-shot) | 2024-01-08 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 80.0 | LLaMA 33B (0-shot) | 2023-02-27 |
| Mistral 7B | ✓ | 80.0 | Mistral 7B (0-shot) | 2023-10-10 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ | 79.6 | FLAN 137B (0-shot) | 2021-09-03 |
| Stay on topic with Classifier-Free Guidance | – | 79.1 | LLaMA 13B + CFG (0-shot) | 2023-06-30 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 78.9 | LLaMA 65B (0-shot) | 2023-02-27 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ | 77.7 | LLaMA-2 7B + MixLoRA | 2024-04-22 |
| Textbooks Are All You Need II: phi-1.5 technical report | ✓ | 76.1 | phi-1.5-web 1.3B (0-shot) | 2023-09-11 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 75.93 | BLOOM 176B (1-shot) | 2023-03-30 |
| ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ | 75.4 | ST-MoE-L 4.1B (fine-tuned) | 2022-02-17 |
| GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | – | 74.8 | GLaM 64B/64E (5-shot) | 2021-12-13 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 74.8 | LLaMA 13B (0-shot) | 2023-02-27 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 73.99 | BloombergGPT 50B (1-shot) | 2023-03-30 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 72.8 | LLaMA 7B (0-shot) | 2023-02-27 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ | 71.5 | Pythia 12B (5-shot) | 2023-04-03 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 71.25 | OPT 66B (1-shot) | 2023-03-30 |
| Language Models are Few-Shot Learners | ✓ | 71.2 | GPT-3 175B (1-shot) | 2020-05-28 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ | 71.04 | OPT-175B | 2023-01-02 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 70.79 | GPT-NeoX 20B (1-shot) | 2023-03-30 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ | 70.2 | Pythia 12B (0-shot) | 2023-04-03 |
| UL2: Unifying Language Learning Paradigms | ✓ | 69.8 | UL2 20B (chain-of-thought + self-consistency) | 2022-05-10 |
| Mamba: Linear-Time Sequence Modeling with Selective State Spaces | ✓ | 69.7 | Mamba-2.8B (0-shot) | 2023-12-01 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ | 69.65 | SparseGPT 175B (50% sparsity) | 2023-01-02 |
| Galactica: A Large Language Model for Science | ✓ | 68.8 | GPT-3 (0-shot) | 2022-11-16 |
| Language Models are Few-Shot Learners | ✓ | 68.8 | GPT-3 175B (0-shot) | 2020-05-28 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ | 68.35 | SparseGPT 175B (4:8 sparsity) | 2023-01-02 |
| GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | – | 68.0 | GLaM 64B/64E (0-shot) | 2021-12-13 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ | 67.08 | SparseGPT 175B (2:4 sparsity) | 2023-01-02 |
| Stay on topic with Classifier-Free Guidance | – | 58.9 | LLaMA 7B + CFG (0-shot) | 2023-06-30 |
| Galactica: A Large Language Model for Science | ✓ | 40.7 | BLOOM (5-shot) | 2022-11-16 |
| UL2: Unifying Language Learning Paradigms | ✓ | 38.4 | UL2 20B (chain-of-thought) | 2022-05-10 |
| Galactica: A Large Language Model for Science | ✓ | 37.4 | OPT (5-shot) | 2022-11-16 |
| UL2: Unifying Language Learning Paradigms | ✓ | 32.2 | UL2 20B (0-shot) | 2022-05-10 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ | 28.03 | OPT 175B (50% sparsity) | 2023-01-02 |
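The leaderboard above is plain tabular data, so it is easy to query programmatically. Below is a minimal sketch (not part of the original page) showing one way to hold a few of the rows above as records and compute the best-reported accuracy per release year; the sample data is copied verbatim from the table, and the function name `best_per_year` is our own.

```python
# Sketch: query a subset of the ARC-Easy leaderboard rows above.
# Each record is (model, accuracy in %, release date as YYYY-MM-DD).
rows = [
    ("ST-MoE-32B 269B (fine-tuned)", 95.2, "2022-02-17"),
    ("LLaMA 3 8B + MoSLoRA (fine-tuned)", 90.5, "2024-06-16"),
    ("PaLM 2-L (1-shot)", 89.7, "2023-05-17"),
    ("Mixtral 8x7B (0-shot)", 83.1, "2024-01-08"),
    ("GPT-3 175B (1-shot)", 71.2, "2020-05-28"),
    ("UL2 20B (0-shot)", 32.2, "2022-05-10"),
]

def best_per_year(records):
    """Return {year: (model, accuracy)} keeping the highest accuracy per year."""
    best = {}
    for model, acc, date in records:
        year = date[:4]  # ISO dates sort/group by their first four characters
        if year not in best or acc > best[year][1]:
            best[year] = (model, acc)
    return best

if __name__ == "__main__":
    for year, (model, acc) in sorted(best_per_year(rows).items()):
        print(f"{year}: {model} - {acc}")
```

The same pattern extends to grouping by shot count or by base model family, since those attributes are encoded in the model-name strings.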