OpenCodePapers

Sentence Completion on HellaSwag
Results over time
(interactive chart omitted)
Leaderboard
Code: ✓ = public code link available; — = none listed.

| Paper | Code | Accuracy | Model | Release Date |
|---|---|---|---|---|
| Task Compass: Scaling Multi-task Pre-training with Task Prefix | ✓ | 96.1 | CompassMTL 567M with Tailor | 2022-10-12 |
| Task Compass: Scaling Multi-task Pre-training with Task Prefix | ✓ | 95.6 | CompassMTL 567M | 2022-10-12 |
| Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering | ✓ | 95.6 | DeBERTa-Large 304M (classification-based) | 2022-10-29 |
| GPT-4 Technical Report | ✓ | 95.3 | GPT-4 (10-shot) | 2023-03-15 |
| Mixture-of-Subspaces in Low-Rank Adaptation | ✓ | 95.0 | LLaMA3+MoSLoRA | 2024-06-16 |
| Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering | ✓ | 94.7 | DeBERTa-Large 304M | 2022-10-29 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ | 94.7 | LLaMA-2 13B + MixLoRA | 2024-04-22 |
| UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark | ✓ | 93.9 | Unicorn 11B (fine-tuned) | 2021-03-24 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ | 93.3 | LLaMA-3 8B + MixLoRA | 2024-04-22 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ | 93.1 | LLaMA-2 7B + MixLoRA | 2024-04-22 |
| DeBERTa: Decoding-enhanced BERT with Disentangled Attention | ✓ | 93.0 | DeBERTa++ | 2020-06-05 |
| DiscoSense: Commonsense Reasoning with Discourse Connectives | ✓ | 91.5 | ELECTRA-Large 335M (fine-tuned on DiscoSense and HellaSwag) | 2022-10-22 |
| — | — | 89.0 | DBRX Instruct 132B (10-shot) | — |
| — | — | 88.3 | TheBloke/llama-2-70b-Guanaco-QLoRA-fp16 (10-shot) | — |
| — | — | 88.0 | ALBERT-XXL 235M | — |
| PaLM 2 Technical Report | ✓ | 87.4 | PaLM 2-L (1-shot) | 2023-05-17 |
| DiscoSense: Commonsense Reasoning with Discourse Connectives | ✓ | 86.9 | ELECTRA-Large 335M (fine-tuned on HellaSwag) | 2022-10-22 |
| PaLM 2 Technical Report | ✓ | 86.7 | PaLM 2-M (1-shot) | 2023-05-17 |
| Muppet: Massive Multi-task Representations with Pre-Finetuning | ✓ | 86.4 | MUPPET RoBERTa Large | 2021-01-26 |
| Stay on topic with Classifier-Free Guidance | — | 86.3 | LLaMA 65B + CFG (0-shot) | 2023-06-30 |
| The Falcon Series of Open Language Models | — | 85.9 | Falcon-180B (0-shot) | 2023-11-28 |
| PaLM 2 Technical Report | ✓ | 85.6 | PaLM 2-S (1-shot) | 2023-05-17 |
| GPT-4 Technical Report | ✓ | 85.5 | GPT-3.5 (10-shot) | 2023-03-15 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ | 85.5 | RoBERTa-Large Ensemble | 2019-07-26 |
| Stay on topic with Classifier-Free Guidance | — | 85.3 | LLaMA 30B + CFG (0-shot) | 2023-06-30 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ | 85.3 | LLaMA 2 70B (0-shot) | 2023-07-18 |
| Towards Generalizable Neuro-Symbolic Systems for Commonsense Question Answering | — | 85.0 | HyKAS+CSKG | 2019-10-30 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 84.2 | LLaMA 65B (0-shot) | 2023-02-27 |
| PaLM: Scaling Language Modeling with Pathways | ✓ | 83.8 | PaLM-540B (few-shot) | 2022-04-05 |
| PaLM: Scaling Language Modeling with Pathways | ✓ | 83.6 | PaLM-540B (1-shot) | 2022-04-05 |
| Task Compass: Scaling Multi-task Pre-training with Task Prefix | ✓ | 83.6 | ExDeBERTa 567M | 2022-10-12 |
| PaLM: Scaling Language Modeling with Pathways | ✓ | 83.4 | PaLM-540B (0-shot) | 2022-04-05 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ | 83.3 | LLaMA 2 34B (0-shot) | 2023-07-18 |
| Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | ✓ | 83.2 | Camelidae-8×34B (10-shot) | 2024-01-05 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 82.8 | LLaMA 33B (0-shot) | 2023-02-27 |
| The Falcon Series of Open Language Models | — | 82.7 | Falcon-40B (0-shot) | 2023-11-28 |
| Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model | ✓ | 82.4 | Megatron-Turing NLG 530B (few-shot) | 2022-01-28 |
| Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | ✓ | 82.3 | Qwen2idae-16x14B (10-shot) | 2024-01-05 |
| Stay on topic with Classifier-Free Guidance | — | 82.1 | LLaMA 13B + CFG (0-shot) | 2023-06-30 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ | 81.7 | RoBERTa-Large 355M | 2019-07-26 |
| Mistral 7B | ✓ | 81.3 | Mistral 7B (0-shot) | 2023-10-10 |
| Training Compute-Optimal Large Language Models | ✓ | 80.8 | Chinchilla 70B (0-shot) | 2022-03-29 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ | 80.7 | LLaMA 2 13B (0-shot) | 2023-07-18 |
| Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model | ✓ | 80.2 | Megatron-Turing NLG 530B (1-shot) | 2022-01-28 |
| Language Models are Few-Shot Learners | ✓ | 79.3 | GPT-3 175B (few-shot, k=32) | 2020-05-28 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | ✓ | 79.2 | Gopher 280B (0-shot) | 2021-12-08 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 79.2 | LLaMA 13B (0-shot) | 2023-02-27 |
| Language Models are Few-Shot Learners | ✓ | 78.9 | GPT-3 (0-shot) | 2020-05-28 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ | 77.2 | LLaMA 2 7B (0-shot) | 2023-07-18 |
| The Falcon Series of Open Language Models | — | 76.3 | Falcon-7B (0-shot) | 2023-11-28 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 76.1 | LLaMA 7B (0-shot) | 2023-02-27 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 73.9 | BloombergGPT 50B (1-shot) | 2023-03-30 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 73.5 | OPT 66B (1-shot) | 2023-03-30 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 73.2 | BLOOM 176B (1-shot) | 2023-03-30 |
| Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | ✓ | 70.8 | Sheared-LLaMA-2.7B (50B) | 2023-10-10 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 68.4 | GPT-NeoX 20B (1-shot) | 2023-03-30 |
| Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | ✓ | 67.6 | Open-LLaMA-3B-v2 | 2023-10-10 |
| Mamba: Linear-Time Sequence Modeling with Selective State Spaces | ✓ | 66.1 | Mamba-2.8B | 2023-12-01 |
| Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | ✓ | 60.7 | Sheared-LLaMA-1.3B (50B) | 2023-10-10 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ | 59.2 | FLAN 137B (3-shot) | 2021-09-03 |
| Mamba: Linear-Time Sequence Modeling with Selective State Spaces | ✓ | 59.1 | Mamba-1.4B | 2023-12-01 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ | 56.7 | FLAN 137B (0-shot) | 2021-09-03 |
| Efficient Language Modeling with Sparse all-MLP | — | 54.5 | sMLP – deterministic 9.4B (0-shot) | 2022-03-14 |
| Efficient Language Modeling with Sparse all-MLP | — | 52.5 | Switch Transformer 9B | 2022-03-14 |
| Language Models are Few-Shot Learners | ✓ | 51.0 | GPT-3 Large 760M (0-shot) | 2020-05-28 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 50.9 | GPT-2-XL 1.5B | 2023-04-27 |
| LLM in a flash: Efficient Large Language Model Inference with Limited Memory | — | 50.3 | OPT-6.7B | 2023-12-12 |
| LLM in a flash: Efficient Large Language Model Inference with Limited Memory | — | 49.8 | LLM in a Flash (OPT-6.7B with Predictor) | 2023-12-12 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 48.7 | FLAN-T5-Large 783M | 2023-04-27 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 48.3 | LaMini-GPT 1.5B | 2023-04-27 |
| HellaSwag: Can a Machine Really Finish Your Sentence? | ✓ | 47.3 | BERT-Large 340M | 2019-05-19 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 43.7 | LaMini-F-T5 783M | 2023-04-27 |
| HellaSwag: Can a Machine Really Finish Your Sentence? | ✓ | 41.7 | GPT-1 117M | 2019-05-19 |
| Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | ✓ | 41.6 | Flipped-3B | 2022-10-06 |
| The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | ✓ | 41.1 | T0-3B (CoT fine-tuned) | 2023-05-23 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 40.6 | LaMini-T5 738M | 2023-04-27 |
| HellaSwag: Can a Machine Really Finish Your Sentence? | ✓ | 40.5 | BERT-Base 110M | 2019-05-19 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 38.9 | T5-Large 738M | 2023-04-27 |
| Efficient Language Modeling with Sparse all-MLP | — | 38.0 | GShard 9B | 2022-03-14 |
| HellaSwag: Can a Machine Really Finish Your Sentence? | ✓ | 36.2 | LSTM + BERT-Base | 2019-05-19 |
| Exploring the Benefits of Training Expert Language Models over Instruction Tuning | ✓ | 34.6 | RoE-3B | 2023-02-07 |
| HellaSwag: Can a Machine Really Finish Your Sentence? | ✓ | 33.3 | ESIM + ELMo | 2019-05-19 |
| Efficient Language Modeling with Sparse all-MLP | — | 33.0 | HASH Layers 10B (0-shot) | 2022-03-14 |
| HellaSwag: Can a Machine Really Finish Your Sentence? | ✓ | 31.7 | LSTM + GloVe | 2019-05-19 |
| HellaSwag: Can a Machine Really Finish Your Sentence? | ✓ | 31.6 | fastText | 2019-05-19 |
| HellaSwag: Can a Machine Really Finish Your Sentence? | ✓ | 31.4 | LSTM + ELMo | 2019-05-19 |
| Efficient Language Modeling with Sparse all-MLP | — | 30.2 | Base Layers 10B (0-shot) | 2022-03-14 |
| Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models | — | 29.6 | KiC-770M | 2022-10-28 |
| HellaSwag: Can a Machine Really Finish Your Sentence? | ✓ | 25.0 | Random chance baseline | 2019-05-19 |
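The Accuracy column above is simply the percentage of multiple-choice items for which a model picks the correct ending. A minimal sketch of that evaluation loop, assuming an illustrative item and a placeholder length-based scorer (not any model from the table):

```python
def pick_ending(context, endings):
    # Hypothetical scorer: a real model would condition on the context and
    # rank each candidate ending (e.g. by length-normalized log-likelihood).
    # Here we just pick the longest ending as a stand-in.
    return max(range(len(endings)), key=lambda i: len(endings[i]))

def accuracy(items):
    # Fraction of items where the chosen ending index matches the label,
    # reported as a percentage, as in the leaderboard.
    correct = sum(
        pick_ending(it["ctx"], it["endings"]) == it["label"] for it in items
    )
    return 100.0 * correct / len(items)

# Illustrative HellaSwag-style item: one context, four candidate endings,
# and the index of the correct ending.
items = [
    {
        "ctx": "She cracks the eggs into a bowl and",
        "endings": [
            "whisks them until smooth.",
            "drives to work.",
            "paints the fence.",
            "closes the laptop.",
        ],
        "label": 0,
    },
]
```

With four candidate endings per item, a random guesser scores 25%, which matches the "Random chance baseline" row in the table.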