Paper | Code | A1 (acc %) | A2 (acc %) | A3 (acc %) | Model | Date |
--- | --- | --- | --- | --- | --- | --- |
Prompting for explanations improves Adversarial NLI. Is this true? {Yes} it is {true} because {it weakens superficial cues} | | 81.8 | 72.5 | 74.8 | T5-3B (explanation prompting) | 2023-05-01 |
Prompting for explanations improves Adversarial NLI. Is this true? {Yes} it is {true} because {it weakens superficial cues} | | 75.6 | 60.6 | 59.9 | T0-11B (explanation prompting) | 2023-05-01 |
InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective | ✓ Link | 75 | 50.5 | 47.7 | InfoBERT (RoBERTa) | 2020-10-05 |
PaLM 2 Technical Report | ✓ Link | 73.1 | 63.4 | 67.1 | PaLM 2-L (one-shot) | 2023-05-17 |
RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ Link | 72.4 | 49.8 | 44.4 | RoBERTa (Large) | 2019-07-26 |
Adversarial Training for Large Neural Language Models | ✓ Link | 72.3 | 52.1 | 48.4 | ALUM (RoBERTa-LARGE) | 2020-04-20 |
XLNet: Generalized Autoregressive Pretraining for Language Understanding | ✓ Link | 70.3 | 50.9 | 49.4 | XLNet (Large) | 2019-06-19 |
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets | ✓ Link | 62.3 | 52.6 | 54.1 | ChatGPT | 2023-05-29 |
PaLM 2 Technical Report | ✓ Link | 58.1 | 49.5 | 54.5 | PaLM 2-M (one-shot) | 2023-05-17 |
PaLM 2 Technical Report | ✓ Link | 53.1 | 48.8 | 53.2 | PaLM 2-S (one-shot) | 2023-05-17 |
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | ✓ Link | 41.7 | 37.2 | 41.9 | T0-3B (CoT fine-tuned) | 2023-05-23 |
Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | ✓ Link | 39.99 | 37.05 | 37.73 | Flipped-3B | 2022-10-06 |
Language Models are Few-Shot Learners | ✓ Link | 36.8 | 34 | 40.2 | GPT-3 | 2020-05-28 |
Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models | | 36.30 | 35.00 | 37.60 | KiC-770M | 2022-10-28 |
Exploring the Benefits of Training Expert Language Models over Instruction Tuning | ✓ Link | 35.49 | 34.64 | 31.22 | RoE-3B | 2023-02-07 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 33.6 | 33.8 | 35.17 | BLOOM 176B (one-shot) | 2023-03-30 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 33.1 | 34.2 | 34.92 | OPT 66B (one-shot) | 2023-03-30 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 32.9 | 34.4 | 37.33 | BloombergGPT (one-shot) | 2023-03-30 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 32.6 | 33.8 | 36.17 | GPT-NeoX (one-shot) | 2023-03-30 |
Large Language Models Can Self-Improve | | | 66.5 | 67.9 | PaLM 540B (Self Improvement, Self Consistency) | 2022-10-20 |
Large Language Models Can Self-Improve | | | 65.3 | 67.3 | PaLM 540B (Self Improvement, CoT Prompting) | 2022-10-20 |
Large Language Models Can Self-Improve | | | 64.8 | 66.9 | PaLM 540B (Self Improvement, Standard Prompting) | 2022-10-20 |
Large Language Models Can Self-Improve | | | 64.5 | 63.4 | PaLM 540B (Self Consistency) | 2022-10-20 |
Large Language Models Can Self-Improve | | | 58.9 | 60.6 | PaLM 540B (CoT Prompting) | 2022-10-20 |
Large Language Models Can Self-Improve | | | 55.8 | 55.8 | PaLM 540B (Standard Prompting) | 2022-10-20 |
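The A1/A2/A3 columns above are test-set accuracies on the three ANLI rounds. As a minimal sketch of how such a number is obtained, the snippet below scores an off-the-shelf NLI checkpoint on each round using the Hugging Face `anli` dataset; the `roberta-large-mnli` checkpoint and its label mapping are illustrative assumptions, not the setup used by any entry in the table.

```python
# Sketch: per-round test accuracy on ANLI (A1/A2/A3) with an NLI checkpoint.
# Model choice and label mapping are assumptions for illustration only.
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "roberta-large-mnli"  # assumption: any sequence-classification NLI checkpoint works here
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL).eval()

# ANLI labels: 0 = entailment, 1 = neutral, 2 = contradiction.
# Map the checkpoint's own label names onto that convention, since label order
# differs between checkpoints.
name_to_anli = {"ENTAILMENT": 0, "NEUTRAL": 1, "CONTRADICTION": 2}
id_to_anli = {i: name_to_anli[name.upper()] for i, name in model.config.id2label.items()}

for split in ("test_r1", "test_r2", "test_r3"):  # rounds A1, A2, A3
    ds = load_dataset("anli", split=split)
    correct = 0
    for ex in ds:
        inputs = tok(ex["premise"], ex["hypothesis"], truncation=True, return_tensors="pt")
        with torch.no_grad():
            pred = model(**inputs).logits.argmax(dim=-1).item()
        correct += int(id_to_anli[pred] == ex["label"])
    print(f"{split}: {100 * correct / len(ds):.1f}% accuracy")
```

The leaderboard entries themselves span very different evaluation regimes (fine-tuned encoders, one-shot prompting, chain-of-thought, self-consistency), so scores are only directly comparable within the same prompting or training setup.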