OpenCodePapers

Question Answering on PIQA

Task: Question Answering
Dataset: PIQA
Results over time: accuracy by model release date (interactive chart; the same data appears in the leaderboard below).
Leaderboard
| Paper | Code | Accuracy | Model | Release date |
|---|---|---|---|---|
| UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark | ✓ Link | 90.1 | Unicorn 11B (fine-tuned) | 2021-03-24 |
| Mixture-of-Subspaces in Low-Rank Adaptation | ✓ Link | 89.7 | LLaMA-3 8B + MoSLoRA | 2024-06-16 |
| Task Compass: Scaling Multi-task Pre-training with Task Prefix | ✓ Link | 88.3 | CompassMTL 567M with Tailor | 2022-10-12 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 87.6 | LLaMA-3 8B + MixLoRA | 2024-04-22 |
| Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering | ✓ Link | 87.4 | DeBERTa-Large 304M | 2022-10-29 |
| Task Compass: Scaling Multi-task Pre-training with Task Prefix | ✓ Link | 87.3 | CompassMTL 567M | 2022-10-12 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 86.8 | LLaMA-2 13B + MixLoRA | 2024-04-22 |
| SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments | | 86.2 | Shakti-LLM (2.5B) | 2024-10-15 |
| Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering | ✓ Link | 85.9 | DeBERTa-Large 304M (classification-based) | 2022-10-29 |
| Task Compass: Scaling Multi-task Pre-training with Task Prefix | ✓ Link | 85.5 | ExDeBERTa 567M | 2022-10-12 |
| UnifiedQA: Crossing Format Boundaries With a Single QA System | ✓ Link | 85.3 | UnifiedQA 3B | 2020-05-02 |
| PaLM 2 Technical Report | ✓ Link | 85.0 | PaLM 2-L (1-shot) | 2023-05-17 |
| Mixtral of Experts | ✓ Link | 83.6 | Mixtral 8x7B (0-shot) | 2024-01-08 |
| PaLM 2 Technical Report | ✓ Link | 83.2 | PaLM 2-M (1-shot) | 2023-05-17 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 83.2 | LLaMA-2 7B + MixLoRA | 2024-04-22 |
| Mistral 7B | ✓ Link | 83.0 | Mistral 7B (0-shot) | 2023-10-10 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 82.8 | LLaMA 65B (0-shot) | 2023-02-27 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 82.8 | LLaMA 2 70B (0-shot) | 2023-07-18 |
| Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | ✓ Link | 82.7 | Camelidae-8×34B | 2024-01-05 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 82.3 | LLaMA 33B (0-shot) | 2023-02-27 |
| PaLM 2 Technical Report | ✓ Link | 82.2 | PaLM 2-S (1-shot) | 2023-05-17 |
| Mixtral of Experts | ✓ Link | 82.2 | Mistral 7B (0-shot) | 2024-01-08 |
| Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | ✓ Link | 82.0 | MT-NLG 530B (0-shot) | 2019-09-17 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 81.9 | LLaMA 2 34B (0-shot) | 2023-07-18 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | ✓ Link | 81.8 | Gopher 280B (0-shot) | 2021-12-08 |
| Training Compute-Optimal Large Language Models | ✓ Link | 81.8 | Chinchilla 70B (0-shot) | 2022-03-29 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 81.7 | FLAN 137B (few-shot, k=10) | 2021-09-03 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 81.07 | OPT-175B | 2023-01-02 |
| Language Models are Few-Shot Learners | ✓ Link | 81.0 | GPT-3 175B (0-shot) | 2020-05-28 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 80.63 | SparseGPT 175B (50% sparsity) | 2023-01-02 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 80.5 | FLAN 137B (0-shot) | 2021-09-03 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 80.5 | LLaMA 2 13B (0-shot) | 2023-07-18 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 80.1 | LLaMA 13B (0-shot) | 2023-02-27 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 79.8 | LLaMA 7B (0-shot) | 2023-02-27 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 79.54 | SparseGPT 175B (4:8 sparsity) | 2023-01-02 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 79.54 | SparseGPT 175B (2:4 sparsity) | 2023-01-02 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ Link | 79.4 | RoBERTa-Large 355M | 2019-07-26 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 78.8 | LLaMA 2 7B (0-shot) | 2023-07-18 |
| BloombergGPT: A Large Language Model for Finance | ✓ Link | 77.9 | BloombergGPT 50B (1-shot) | 2023-03-30 |
| BloombergGPT: A Large Language Model for Finance | ✓ Link | 77.6 | OPT 66B (1-shot) | 2023-03-30 |
| PIQA: Reasoning about Physical Commonsense in Natural Language | ✓ Link | 77.1 | RoBERTa-large 355M (fine-tuned) | 2019-11-26 |
| Textbooks Are All You Need II: phi-1.5 technical report | ✓ Link | 77.0 | phi-1.5-web (1.3B) | 2023-09-11 |
| BloombergGPT: A Large Language Model for Finance | ✓ Link | 77.0 | BLOOM 176B (1-shot) | 2023-03-30 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 76.7 | Pythia 12B (5-shot) | 2023-04-03 |
| Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | ✓ Link | 76.2 | Open-LLaMA-3B-v2 | 2023-10-10 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 76.0 | Pythia 12B (0-shot) | 2023-04-03 |
| Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | ✓ Link | 75.8 | Sheared-LLaMA-2.7B | 2023-10-10 |
| BloombergGPT: A Large Language Model for Finance | ✓ Link | 75.8 | GPT-NeoX 20B (1-shot) | 2023-03-30 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 75.2 | Pythia 6.9B (0-shot) | 2023-04-03 |
| Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | ✓ Link | 73.4 | Sheared-LLaMA-1.3B | 2023-10-10 |
| Efficient Language Modeling with Sparse all-MLP | | 73.0 | sMLP (deterministic) 9.4B (0-shot) | 2022-03-14 |
| Language Models are Few-Shot Learners | ✓ Link | 72.9 | GPT-3 Large 760M (0-shot) | 2020-05-28 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 72.2 | FLAN-T5-Large 783M | 2023-04-27 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 71.3 | LaMini-GPT 1.5B | 2023-04-27 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 70.6 | LaMini-F-T5 783M | 2023-04-27 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 70.5 | GPT-2-XL 1.5B | 2023-04-27 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 70.4 | Pythia 1B (5-shot) | 2023-04-03 |
| PIQA: Reasoning about Physical Commonsense in Natural Language | ✓ Link | 69.2 | GPT-2-small 124M (fine-tuned) | 2019-11-26 |
| Efficient Language Modeling with Sparse all-MLP | | 68.1 | GShard 9B | 2022-03-14 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 67.2 | LaMini-T5 738M | 2023-04-27 |
| PIQA: Reasoning about Physical Commonsense in Natural Language | ✓ Link | 66.8 | BERT-large 340M (fine-tuned) | 2019-11-26 |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | ✓ Link | 66.7 | BERT-Large 340M | 2018-10-11 |
| Efficient Language Modeling with Sparse all-MLP | | 63.8 | Base Layers 10B (0-shot) | 2022-03-14 |
| Efficient Language Modeling with Sparse all-MLP | | 63.8 | HASH Layers 10B (0-shot) | 2022-03-14 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 55.9 | T5-Large 738M | 2023-04-27 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 54.73 | OPT-175B (50% sparsity) | 2023-01-02 |
| PIQA: Reasoning about Physical Commonsense in Natural Language | ✓ Link | 50.0 | Random chance baseline | 2019-11-26 |
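About the Accuracy column: PIQA is a two-way multiple-choice benchmark (each goal comes with two candidate solutions), which is why the random-chance baseline in the last row sits at 50. Zero-shot results like those above are commonly computed by comparing a language model's log-likelihood for each candidate solution and picking the higher-scoring one, while fine-tuned entries (e.g., the RoBERTa and DeBERTa rows) train on the PIQA training split. The snippet below is a minimal sketch of that zero-shot protocol, assuming the Hugging Face `datasets` and `transformers` APIs; the model (`gpt2`) and dataset id (`piqa`) are illustrative assumptions, not the harness behind any entry in this table.

```python
# Minimal sketch: zero-shot PIQA accuracy via per-choice log-likelihood.
# Assumptions (not from the leaderboard): Hugging Face `datasets` and
# `transformers`, dataset id "piqa", and `gpt2` as a stand-in model.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def solution_logprob(goal: str, solution: str) -> float:
    """Summed log-probability of the solution tokens, conditioned on the goal.

    Tokenizer boundary effects at the goal/solution join are ignored here.
    """
    prompt_len = tokenizer(goal, return_tensors="pt").input_ids.shape[1]
    ids = tokenizer(goal + " " + solution, return_tensors="pt").input_ids
    logits = model(ids).logits
    # Position t predicts token t+1: shift logits and targets by one.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    per_token = logprobs.gather(1, targets.unsqueeze(1)).squeeze(1)
    return per_token[prompt_len - 1:].sum().item()  # solution tokens only

piqa = load_dataset("piqa", split="validation")
# label is 0 if sol1 is correct, 1 if sol2 is correct.
correct = sum(
    int((solution_logprob(ex["goal"], ex["sol2"])
         > solution_logprob(ex["goal"], ex["sol1"])) == ex["label"])
    for ex in piqa
)
print(f"accuracy = {100 * correct / len(piqa):.1f}")
```

Real evaluation harnesses differ in prompt format, length normalization of the log-likelihoods, and few-shot context, which is part of why otherwise similar models report different numbers above.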