OpenCodePapers

Sentence Completion on HellaSwag
Results over time
(interactive chart omitted)
Leaderboard
Code: ✓ = public code link available; — = none listed.

| Paper | Code | Accuracy | Model | Release Date |
|---|---|---|---|---|
| Task Compass: Scaling Multi-task Pre-training with Task Prefix | ✓ | 96.1 | CompassMTL 567M with Tailor | 2022-10-12 |
| Task Compass: Scaling Multi-task Pre-training with Task Prefix | ✓ | 95.6 | CompassMTL 567M | 2022-10-12 |
| Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering | ✓ | 95.6 | DeBERTa-Large 304M (classification-based) | 2022-10-29 |
| GPT-4 Technical Report | ✓ | 95.3 | GPT-4 (10-shot) | 2023-03-15 |
| Mixture-of-Subspaces in Low-Rank Adaptation | ✓ | 95.0 | LLaMA3+MoSLoRA | 2024-06-16 |
| Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering | ✓ | 94.7 | DeBERTa-Large 304M | 2022-10-29 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ | 94.7 | LLaMA-2 13B + MixLoRA | 2024-04-22 |
| UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark | ✓ | 93.9 | Unicorn 11B (fine-tuned) | 2021-03-24 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ | 93.3 | LLaMA-3 8B + MixLoRA | 2024-04-22 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ | 93.1 | LLaMA-2 7B + MixLoRA | 2024-04-22 |
| DeBERTa: Decoding-enhanced BERT with Disentangled Attention | ✓ | 93.0 | DeBERTa++ | 2020-06-05 |
| DiscoSense: Commonsense Reasoning with Discourse Connectives | ✓ | 91.5 | ELECTRA-Large 335M (fine-tuned on DiscoSense and HellaSwag) | 2022-10-22 |
| — | — | 89.0 | DBRX Instruct 132B (10-shot) | — |
| — | — | 88.3 | TheBloke/llama-2-70b-Guanaco-QLoRA-fp16 (10-shot) | — |
| — | — | 88.0 | ALBERT-XXL 235M | — |
| PaLM 2 Technical Report | ✓ | 87.4 | PaLM 2-L (1-shot) | 2023-05-17 |
| DiscoSense: Commonsense Reasoning with Discourse Connectives | ✓ | 86.9 | ELECTRA-Large 335M (fine-tuned on HellaSwag) | 2022-10-22 |
| PaLM 2 Technical Report | ✓ | 86.7 | PaLM 2-M (1-shot) | 2023-05-17 |
| Muppet: Massive Multi-task Representations with Pre-Finetuning | ✓ | 86.4 | MUPPET RoBERTa Large | 2021-01-26 |
| Stay on topic with Classifier-Free Guidance | — | 86.3 | LLaMA 65B + CFG (0-shot) | 2023-06-30 |
| The Falcon Series of Open Language Models | — | 85.9 | Falcon-180B (0-shot) | 2023-11-28 |
| PaLM 2 Technical Report | ✓ | 85.6 | PaLM 2-S (1-shot) | 2023-05-17 |
| GPT-4 Technical Report | ✓ | 85.5 | GPT-3.5 (10-shot) | 2023-03-15 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ | 85.5 | RoBERTa-Large Ensemble | 2019-07-26 |
| Stay on topic with Classifier-Free Guidance | — | 85.3 | LLaMA 30B + CFG (0-shot) | 2023-06-30 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ | 85.3 | LLaMA 2 70B (0-shot) | 2023-07-18 |
| Towards Generalizable Neuro-Symbolic Systems for Commonsense Question Answering | — | 85.0 | HyKAS+CSKG | 2019-10-30 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 84.2 | LLaMA 65B (0-shot) | 2023-02-27 |
| PaLM: Scaling Language Modeling with Pathways | ✓ | 83.8 | PaLM-540B (few-shot) | 2022-04-05 |
| PaLM: Scaling Language Modeling with Pathways | ✓ | 83.6 | PaLM-540B (1-shot) | 2022-04-05 |
| Task Compass: Scaling Multi-task Pre-training with Task Prefix | ✓ | 83.6 | ExDeBERTa 567M | 2022-10-12 |
| PaLM: Scaling Language Modeling with Pathways | ✓ | 83.4 | PaLM-540B (0-shot) | 2022-04-05 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ | 83.3 | LLaMA 2 34B (0-shot) | 2023-07-18 |
| Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | ✓ | 83.2 | Camelidae-8×34B (10-shot) | 2024-01-05 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 82.8 | LLaMA 33B (0-shot) | 2023-02-27 |
| The Falcon Series of Open Language Models | — | 82.7 | Falcon-40B (0-shot) | 2023-11-28 |
| Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model | ✓ | 82.4 | Megatron-Turing NLG 530B (few-shot) | 2022-01-28 |
| Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | ✓ | 82.3 | Qwen2idae-16x14B (10-shot) | 2024-01-05 |
| Stay on topic with Classifier-Free Guidance | — | 82.1 | LLaMA 13B + CFG (0-shot) | 2023-06-30 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ | 81.7 | RoBERTa-Large 355M | 2019-07-26 |
| Mistral 7B | ✓ | 81.3 | Mistral 7B (0-shot) | 2023-10-10 |
| Training Compute-Optimal Large Language Models | ✓ | 80.8 | Chinchilla 70B (0-shot) | 2022-03-29 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ | 80.7 | LLaMA 2 13B (0-shot) | 2023-07-18 |
| Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model | ✓ | 80.2 | Megatron-Turing NLG 530B (1-shot) | 2022-01-28 |
| Language Models are Few-Shot Learners | ✓ | 79.3 | GPT-3 175B (few-shot, k=32) | 2020-05-28 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | ✓ | 79.2 | Gopher 280B (0-shot) | 2021-12-08 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 79.2 | LLaMA 13B (0-shot) | 2023-02-27 |
| Language Models are Few-Shot Learners | ✓ | 78.9 | GPT-3 (0-shot) | 2020-05-28 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ | 77.2 | LLaMA 2 7B (0-shot) | 2023-07-18 |
| The Falcon Series of Open Language Models | — | 76.3 | Falcon-7B (0-shot) | 2023-11-28 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 76.1 | LLaMA 7B (0-shot) | 2023-02-27 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 73.9 | BloombergGPT 50B (1-shot) | 2023-03-30 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 73.5 | OPT 66B (1-shot) | 2023-03-30 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 73.2 | BLOOM 176B (1-shot) | 2023-03-30 |
| Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | ✓ | 70.8 | Sheared-LLaMA-2.7B (50B) | 2023-10-10 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 68.4 | GPT-NeoX 20B (1-shot) | 2023-03-30 |
| Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | ✓ | 67.6 | Open-LLaMA-3B-v2 | 2023-10-10 |
| Mamba: Linear-Time Sequence Modeling with Selective State Spaces | ✓ | 66.1 | Mamba-2.8B | 2023-12-01 |
| Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | ✓ | 60.7 | Sheared-LLaMA-1.3B (50B) | 2023-10-10 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ | 59.2 | FLAN 137B (3-shot) | 2021-09-03 |
| Mamba: Linear-Time Sequence Modeling with Selective State Spaces | ✓ | 59.1 | Mamba-1.4B | 2023-12-01 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ | 56.7 | FLAN 137B (0-shot) | 2021-09-03 |
| Efficient Language Modeling with Sparse all-MLP | — | 54.5 | sMLP – deterministic 9.4B (0-shot) | 2022-03-14 |
| Efficient Language Modeling with Sparse all-MLP | — | 52.5 | Switch Transformer 9B | 2022-03-14 |
| Language Models are Few-Shot Learners | ✓ | 51.0 | GPT-3 Large 760M (0-shot) | 2020-05-28 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 50.9 | GPT-2-XL 1.5B | 2023-04-27 |
| LLM in a flash: Efficient Large Language Model Inference with Limited Memory | — | 50.3 | OPT-6.7B | 2023-12-12 |
| LLM in a flash: Efficient Large Language Model Inference with Limited Memory | — | 49.8 | LLM in a Flash (OPT-6.7B with Predictor) | 2023-12-12 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 48.7 | FLAN-T5-Large 783M | 2023-04-27 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 48.3 | LaMini-GPT 1.5B | 2023-04-27 |
| HellaSwag: Can a Machine Really Finish Your Sentence? | ✓ | 47.3 | BERT-Large 340M | 2019-05-19 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 43.7 | LaMini-F-T5 783M | 2023-04-27 |
| HellaSwag: Can a Machine Really Finish Your Sentence? | ✓ | 41.7 | GPT-1 117M | 2019-05-19 |
| Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | ✓ | 41.6 | Flipped-3B | 2022-10-06 |
| The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | ✓ | 41.1 | T0-3B (CoT fine-tuned) | 2023-05-23 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 40.6 | LaMini-T5 738M | 2023-04-27 |
| HellaSwag: Can a Machine Really Finish Your Sentence? | ✓ | 40.5 | BERT-Base 110M | 2019-05-19 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 38.9 | T5-Large 738M | 2023-04-27 |
| Efficient Language Modeling with Sparse all-MLP | — | 38.0 | GShard 9B | 2022-03-14 |
| HellaSwag: Can a Machine Really Finish Your Sentence? | ✓ | 36.2 | LSTM + BERT-Base | 2019-05-19 |
| Exploring the Benefits of Training Expert Language Models over Instruction Tuning | ✓ | 34.6 | RoE-3B | 2023-02-07 |
| HellaSwag: Can a Machine Really Finish Your Sentence? | ✓ | 33.3 | ESIM + ELMo | 2019-05-19 |
| Efficient Language Modeling with Sparse all-MLP | — | 33.0 | HASH Layers 10B (0-shot) | 2022-03-14 |
| HellaSwag: Can a Machine Really Finish Your Sentence? | ✓ | 31.7 | LSTM + GloVe | 2019-05-19 |
| HellaSwag: Can a Machine Really Finish Your Sentence? | ✓ | 31.6 | fastText | 2019-05-19 |
| HellaSwag: Can a Machine Really Finish Your Sentence? | ✓ | 31.4 | LSTM + ELMo | 2019-05-19 |
| Efficient Language Modeling with Sparse all-MLP | — | 30.2 | Base Layers 10B (0-shot) | 2022-03-14 |
| Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models | — | 29.6 | KiC-770M | 2022-10-28 |
| HellaSwag: Can a Machine Really Finish Your Sentence? | ✓ | 25.0 | Random chance baseline | 2019-05-19 |
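The Accuracy column above is simply the percentage of multiple-choice items for which a model picks the correct ending. A minimal sketch of that evaluation loop, assuming an illustrative item and a placeholder length-based scorer (not any model from the table):

```python
def pick_ending(context, endings):
    # Hypothetical scorer: a real model would condition on the context and
    # rank each candidate ending (e.g. by length-normalized log-likelihood).
    # Here we just pick the longest ending as a stand-in.
    return max(range(len(endings)), key=lambda i: len(endings[i]))

def accuracy(items):
    # Fraction of items where the chosen ending index matches the label,
    # reported as a percentage, as in the leaderboard.
    correct = sum(
        pick_ending(it["ctx"], it["endings"]) == it["label"] for it in items
    )
    return 100.0 * correct / len(items)

# Illustrative HellaSwag-style item: one context, four candidate endings,
# and the index of the correct ending.
items = [
    {
        "ctx": "She cracks the eggs into a bowl and",
        "endings": [
            "whisks them until smooth.",
            "drives to work.",
            "paints the fence.",
            "closes the laptop.",
        ],
        "label": 0,
    },
]
```

With four candidate endings per item, a random guesser scores 25%, which matches the "Random chance baseline" row in the table.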