Paper | Code | Accuracy | Accuracy (Middle) | Accuracy (High) | Model | Date |
--- | --- | --- | --- | --- | --- | --- |
Improving Machine Reading Comprehension with Single-choice Decision and Transfer Learning | | 91.4 | | | ALBERT (Ensemble) | 2020-11-06 |
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | ✓ Link | 90.9 | 93.1 | 90.0 | Megatron-BERT (ensemble) | 2019-09-17 |
DUMA: Reading Comprehension with Transposition Thinking | ✓ Link | 89.8 | 88.7 | 92.6 | ALBERT-xxlarge + DUMA (ensemble) | 2020-01-26 |
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | ✓ Link | 89.5 | 91.8 | 88.6 | Megatron-BERT | 2019-09-17 |
DeBERTa: Decoding-enhanced BERT with Disentangled Attention | ✓ Link | 86.8 | | | DeBERTa-large | 2020-06-05 |
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing | ✓ Link | 85.7 | 88.8 | 84.4 | B10-10-10 | 2020-06-05 |
RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ Link | 83.2 | 86.5 | 81.3 | RoBERTa | 2019-07-26 |
Orca 2: Teaching Small Language Models How to Reason | | 82.87 | | | Orca 2-13B | 2023-11-18 |
Orca 2: Teaching Small Language Models How to Reason | | 80.79 | | | Orca 2-7B | 2023-11-18 |
Hierarchical Learning for Generation with Long Source Sequences | | 67.3 | | | HAT (Encoder) | 2021-04-15 |
XLNet: Generalized Autoregressive Pretraining for Language Understanding | ✓ Link | | 88.6 | 84.0 | XLNet | 2019-06-19 |
PaLM: Scaling Language Modeling with Pathways | ✓ Link | | 68.1 | 49.1 | PaLM 540B (zero-shot) | 2022-04-05 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | | 67.9 | 51.6 | LLaMA 65B (zero-shot) | 2023-02-27 |
PaLM: Scaling Language Modeling with Pathways | ✓ Link | | 64.3 | 47.5 | PaLM 62B (zero-shot) | 2022-04-05 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | | 64.1 | 48.3 | LLaMA 33B (zero-shot) | 2023-02-27 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | | 61.6 | 47.2 | LLaMA 13B (zero-shot) | 2023-02-27 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | | 61.1 | 46.9 | LLaMA 7B (zero-shot) | 2023-02-27 |
Language Models are Few-Shot Learners | ✓ Link | | 58.4 | 45.5 | GPT-3 175B (zero-shot) | 2020-05-28 |
PaLM: Scaling Language Modeling with Pathways | ✓ Link | | 57.9 | 42.3 | PaLM 8B (zero-shot) | 2022-04-05 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | | 54.32 | 41.74 | BloombergGPT (one-shot) | 2023-03-30 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | | 52.3 | 39.14 | BLOOM 176B (one-shot) | 2023-03-30 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | | 47.42 | 37.02 | OPT 66B (one-shot) | 2023-03-30 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | | 41.23 | 34.33 | GPT-NeoX (one-shot) | 2023-03-30 |
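For the zero-shot and one-shot rows (GPT-3, PaLM, LLaMA, BloombergGPT), accuracy on this kind of multiple-choice reading-comprehension benchmark is typically computed by likelihood-based option scoring rather than free-form generation: the model scores each candidate answer conditioned on the passage and question, and the highest-likelihood option is taken as the prediction. Below is a minimal sketch of that procedure, assuming a Hugging Face causal LM (`gpt2` as a stand-in for the much larger models in the table) and an illustrative prompt format; neither is the exact setup of any paper listed above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in model; the papers above evaluate their own pretrained LMs.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def option_logprob(context: str, option: str) -> float:
    """Sum of log-probabilities the model assigns to `option` following `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    opt_ids = tokenizer(option, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, opt_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position i predict token i + 1, so the option tokens are
    # predicted by the logits spanning [ctx_len - 1, total_len - 1).
    option_logits = logits[0, ctx_ids.shape[1] - 1 : -1]
    logprobs = torch.log_softmax(option_logits, dim=-1)
    return logprobs.gather(1, opt_ids[0].unsqueeze(1)).sum().item()

def predict(passage: str, question: str, options: list[str]) -> int:
    """Return the index of the highest-likelihood answer option."""
    # Illustrative prompt template, not the one used in any specific paper.
    context = f"{passage}\nQuestion: {question}\nAnswer:"
    scores = [option_logprob(context, " " + opt) for opt in options]
    return max(range(len(options)), key=scores.__getitem__)
```

Benchmark accuracy is then the fraction of questions for which `predict` returns the gold option index; the exact prompt template and likelihood normalization (e.g., by completion length) vary from paper to paper.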