Common Sense Reasoning on ARC (Challenge)

Leaderboard
| Paper | Code | Accuracy | Model Name | Release Date |
|---|---|---|---|---|
| GPT-4 Technical Report | ✓ | 96.4 | GPT-4 (few-shot, k=25) | 2023-03-15 |
| PaLM 2 Technical Report | ✓ | 95.1 | PaLM 2 (few-shot, CoT, SC) | 2023-05-17 |
| — | — | 91.04 | Shivaay (4B, few-shot, k=8) | — |
| — | — | 91.03 | StupidLLM | — |
| Model Card and Evaluations for Claude Models | — | 91 | Claude 2 (few-shot, k=5) | 2023-07-11 |
| Model Card and Evaluations for Claude Models | — | 90 | Claude 1.3 (few-shot, k=5) | 2023-07-11 |
| Large Language Models Can Self-Improve | — | 89.8 | PaLM 540B (Self-Improvement, Self-Consistency) | 2022-10-20 |
| Large Language Models Can Self-Improve | — | 88.7 | PaLM 540B (Self-Consistency) | 2022-10-20 |
| Large Language Models Can Self-Improve | — | 88.3 | PaLM 540B (Self-Improvement, CoT Prompting) | 2022-10-20 |
| Large Language Models Can Self-Improve | — | 87.2 | PaLM 540B (Self-Improvement, Standard Prompting) | 2022-10-20 |
| Large Language Models Can Self-Improve | — | 87.1 | PaLM 540B (Standard Prompting) | 2022-10-20 |
| ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ | 86.5 | ST-MoE-32B 269B (fine-tuned) | 2022-02-17 |
| Model Card and Evaluations for Claude Models | — | 85.7 | Claude Instant 1.1 (few-shot, k=5) | 2023-07-11 |
| GPT-4 Technical Report | ✓ | 85.2 | GPT-3.5 (few-shot, k=25) | 2023-03-15 |
| Large Language Models Can Self-Improve | — | 85.2 | PaLM 540B (CoT Prompting) | 2022-10-20 |
| Mixture-of-Subspaces in Low-Rank Adaptation | ✓ | 81.5 | LLaMA 3 8B + MoSLoRA (fine-tuned) | 2024-06-16 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ | 79.9 | LLaMA-3 8B + MixLoRA | 2024-04-22 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ | 69.9 | LLaMA-2 13B + MixLoRA | 2024-04-22 |
| PaLM 2 Technical Report | ✓ | 69.2 | PaLM 2-L (1-shot) | 2023-05-17 |
| Galactica: A Large Language Model for Science | ✓ | 67.9 | GAL 120B (zero-shot) | 2022-11-16 |
| Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | ✓ | 65.2 | Camelidae-8×34B | 2024-01-05 |
| PaLM 2 Technical Report | ✓ | 64.9 | PaLM 2-M (1-shot) | 2023-05-17 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ | 63.8 | FLAN 137B (few-shot, k=13) | 2021-09-03 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ | 63.1 | FLAN 137B (zero-shot) | 2021-09-03 |
| PaLM 2 Technical Report | ✓ | 59.6 | PaLM 2-S (1-shot) | 2023-05-17 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ | 58.1 | LLaMA-2 7B + MixLoRA | 2024-04-22 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 57.8 | LLaMA 33B (zero-shot) | 2023-02-27 |
| ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ | 56.9 | ST-MoE-L 4.1B (fine-tuned) | 2022-02-17 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 56.0 | LLaMA 65B (zero-shot) | 2023-02-27 |
| Mistral 7B | ✓ | 55.5 | Mistral 7B (zero-shot) | 2023-10-10 |
| Language Models are Few-Shot Learners | ✓ | 53.2 | GPT-3 175B (1-shot) | 2020-05-28 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 52.7 | LLaMA 13B (zero-shot) | 2023-02-27 |
| Galactica: A Large Language Model for Science | ✓ | 51.4 | GPT-3 (zero-shot) | 2022-11-16 |
| Language Models are Few-Shot Learners | ✓ | 51.4 | GPT-3 175B (zero-shot) | 2020-05-28 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 50.85 | BLOOM 176B (1-shot) | 2023-03-30 |
| GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | — | 50.3 | GLaM 64B/64E (zero-shot) | 2021-12-13 |
| UL2: Unifying Language Learning Paradigms | ✓ | 49.5 | UL2 20B (chain-of-thought + self-consistency) | 2022-05-10 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 48.63 | BloombergGPT 50B (1-shot) | 2023-03-30 |
| GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | — | 48.2 | GLaM 64B/64E (1-shot) | 2021-12-13 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 47.6 | LLaMA 7B (zero-shot) | 2023-02-27 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 45.39 | GPT-NeoX 20B (1-shot) | 2023-03-30 |
| Textbooks Are All You Need II: phi-1.5 technical report | ✓ | 44.9 | phi-1.5-web 1.3B (zero-shot) | 2023-09-11 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 44.54 | OPT 66B (1-shot) | 2023-03-30 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ | 43.94 | OPT-175B | 2023-01-02 |
| UL2: Unifying Language Learning Paradigms | ✓ | 42.9 | UL2 20B (chain-of-thought) | 2022-05-10 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ | 41.3 | SparseGPT (175B, 50% sparsity) | 2023-01-02 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ | 39.85 | SparseGPT (175B, 4:8 sparsity) | 2023-01-02 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ | 38.99 | SparseGPT (175B, 2:4 sparsity) | 2023-01-02 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ | 36.8 | Pythia 12B (5-shot) | 2023-04-03 |
| Galactica: A Large Language Model for Science | ✓ | 32.9 | BLOOM (few-shot, k=5) | 2022-11-16 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ | 31.8 | Pythia 12B (zero-shot) | 2023-04-03 |
| Galactica: A Large Language Model for Science | ✓ | 31.1 | OPT (few-shot, k=5) | 2022-11-16 |
| UL2: Unifying Language Learning Paradigms | ✓ | 29.8 | UL2 20B (zero-shot) | 2022-05-10 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ | 25.6 | OPT-175B (50% sparsity) | 2023-01-02 |
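
For context on the Accuracy column: zero- and few-shot ARC-Challenge numbers are typically computed by likelihood-based multiple-choice scoring rather than free-form generation. The sketch below illustrates that common protocol; it is an assumption about the usual setup, not the exact harness behind any particular row, and `loglikelihood` is a hypothetical helper returning the model's log-probability of a continuation given a context.

```python
from typing import Callable, Sequence

def score_arc_example(
    loglikelihood: Callable[[str, str], float],  # hypothetical helper
    question: str,
    choices: Sequence[str],
    answer_index: int,
) -> bool:
    """Return True if the model's top-scoring option is the gold answer."""
    # Each answer option is scored as a continuation of the same prompt;
    # few-shot variants (e.g. k=25) simply prepend k solved examples here.
    context = f"Question: {question}\nAnswer:"
    # Length-normalize (per character) so longer options are not penalized,
    # a common convention for ARC-style multiple choice.
    scores = [loglikelihood(context, " " + c) / len(c) for c in choices]
    predicted = max(range(len(choices)), key=scores.__getitem__)
    return predicted == answer_index
```

Reported accuracy is then the fraction of test examples for which this indicator is True.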
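
Several of the top rows (PaLM 2, PaLM 540B, UL2) use self-consistency (SC): sample multiple chain-of-thought completions at nonzero temperature and majority-vote their final answers. A minimal sketch, assuming a hypothetical `sample_completion` wrapper around the model's sampler and chains that end with an answer letter:

```python
from collections import Counter
from typing import Callable
import re

def self_consistency_answer(
    sample_completion: Callable[[str], str],  # hypothetical sampler wrapper
    prompt: str,
    num_samples: int = 40,
) -> str:
    """Majority vote over sampled chain-of-thought completions."""
    votes: Counter = Counter()
    for _ in range(num_samples):
        completion = sample_completion(prompt)  # temperature > 0 sampling
        # Assumes each chain ends with something like "The answer is (B)."
        match = re.search(r"answer is \(?([A-D])\)?", completion)
        if match:
            votes[match.group(1)] += 1
    if not votes:
        raise ValueError("no parsable answers sampled")
    return votes.most_common(1)[0][0]
```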