OpenCodePapers

common-sense-reasoning-on-commonsenseqa

Common Sense Reasoning
Leaderboard
| Paper | Code | Accuracy (%) | Model Name | Release Date |
|---|---|---|---|---|
| Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles | ✓ | 92.54 | GPT-4o (HPT) | 2024-06-18 |
| Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention | ✓ | 91.2 | DeBERTaV3-large+KEAR | 2021-12-06 |
| PaLM 2 Technical Report | ✓ | 90.4 | PaLM 2 (few-shot, CoT, SC) | 2023-05-17 |
| Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention | ✓ | 89.4 | KEAR | 2021-12-06 |
| Fusing Context Into Knowledge Graph for Commonsense Question Answering | ✓ | 83.3 | DEKCOR | 2020-12-09 |
| UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark | ✓ | 79.3 | Unicorn 11B (fine-tuned) | 2021-03-24 |
| Muppet: Massive Multi-task Representations with Pre-Finetuning | ✓ | 79.2 | MUPPET Roberta Large | 2021-01-26 |
| UnifiedQA: Crossing Format Boundaries With a Single QA System | ✓ | 79.1 | UnifiedQA 11B (fine-tuned) | 2020-05-02 |
| Deep Bidirectional Language-Knowledge Graph Pretraining | ✓ | 78.2 | DRAGON | 2022-10-17 |
| UnifiedQA: Crossing Format Boundaries With a Single QA System | ✓ | 78.1 | T5-XXL 11B (fine-tuned) | 2020-05-02 |
| ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | ✓ | 76.5 | Albert Lan et al. (2020) (ensemble) | 2019-09-26 |
| UnifiedQA: Crossing Format Boundaries With a Single QA System | ✓ | 76.2 | UnifiedQA 11B (zero-shot) | 2020-05-02 |
| QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering | ✓ | 76.1 | QA-GNN | 2021-04-13 |
| Graph-Based Reasoning over Heterogeneous External Knowledge for Commonsense Question Answering | ✓ | 75.3 | XLNet+GraphReason | 2019-09-09 |
| GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering | | 73.5 | GrapeQA: PEGA | 2023-03-22 |
| Towards Generalizable Neuro-Symbolic Systems for Commonsense Question Answering | | 73.2 | RoBERTa+HyKAS Ma et al. (2019) | 2019-10-30 |
| Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention | ✓ | 73.0 | GPT-3 Direct Finetuned | 2021-12-06 |
| STaR: Bootstrapping Reasoning With Reasoning | ✓ | 72.3 | STaR (on GPT-J) | 2022-03-28 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ | 72.1 | RoBERTa-Large 355M | 2019-07-26 |
| STaR: Bootstrapping Reasoning With Reasoning | ✓ | 68.8 | STaR without Rationalization (on GPT-J) | 2022-03-28 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 66.4 | OPT 66B (1-shot) | 2023-03-30 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 65.5 | Bloomberg GPT 50B (1-shot) | 2023-03-30 |
| Explain Yourself! Leveraging Language Models for Commonsense Reasoning | ✓ | 64.7 | CAGE-reasoning | 2019-06-06 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 64.2 | BLOOM 176B (1-shot) | 2023-03-30 |
| UnifiedQA: Crossing Format Boundaries With a Single QA System | ✓ | 64 | UnifiedQA 440M (fine-tuned) | 2020-05-02 |
| UnifiedQA: Crossing Format Boundaries With a Single QA System | ✓ | 62.5 | BART-large 440M (fine-tuned) | 2020-05-02 |
| Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models | | 62.2 | BERT_CSlarge | 2019-08-19 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 60.4 | GPT-NeoX 20B (1-shot) | 2023-03-30 |
| STaR: Bootstrapping Reasoning With Reasoning | ✓ | 60.0 | GPT-J Direct Finetuned | 2022-03-28 |
| KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning | ✓ | 58.9 | KagNet | 2019-09-04 |
| CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge | ✓ | 55.9 | BERT-LARGE | 2018-11-02 |
| UL2: Unifying Language Learning Paradigms | ✓ | 55.7 | UL2 20B (chain-of-thought + self-consistency) | 2022-05-10 |
| STaR: Bootstrapping Reasoning With Reasoning | ✓ | 55.6 | Few-shot CoT LaMDA 137B | 2022-03-28 |
| UL2: Unifying Language Learning Paradigms | ✓ | 51.4 | UL2 20B (chain-of-thought) | 2022-05-10 |
| STaR: Bootstrapping Reasoning With Reasoning | ✓ | 36.6 | Few-shot CoT GPT-J | 2022-03-28 |
| UL2: Unifying Language Learning Paradigms | ✓ | 34.2 | UL2 20B (zero-shot) | 2022-05-10 |
| Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | ✓ | 28.6 | Chain of thought ASDiv | 2022-01-28 |
| STaR: Bootstrapping Reasoning With Reasoning | ✓ | 20.9 | Few-shot Direct GPT-J | 2022-03-28 |