Paper | Code | EM | F1 | Model | Date |
--- | --- | --- | --- | --- | --- |
Model Card and Evaluations for Claude Models | | 87.5 | | Claude 2 (few-shot, k=5) | 2023-07-11 |
| | | 87.0 | | GPT-4-0613 | |
Model Card and Evaluations for Claude Models | | 86.7 | | Claude 1.3 (few-shot, k=5) | 2023-07-11 |
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | | 86.5 | | RankRAG-llama3-70b (Zero-Shot, KILT) | 2024-07-02 |
PaLM 2 Technical Report | ✓ Link | 86.1 | | PaLM 2-L (one-shot) | 2023-05-17 |
ChatQA: Surpassing GPT-4 on Conversational QA and RAG | | 85.6 | | ChatQA-1.5-llama3-70b (Zero-Shot, KILT) | 2024-01-18 |
Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 85.0 | | Llama 2 70B (one-shot) | 2023-07-18 |
GPT-4 Technical Report | ✓ Link | 84.8 | | GPT-4-0613 (Zero-shot) | 2023-03-15 |
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | | 82.9 | | RankRAG-llama3-8b (Zero-Shot, KILT) | 2024-07-02 |
PaLM 2 Technical Report | ✓ Link | 81.7 | | PaLM 2-M (one-shot) | 2023-05-17 |
PaLM: Scaling Language Modeling with Pathways | ✓ Link | 81.4 | | PaLM-540B (Few-Shot) | 2022-04-05 |
PaLM: Scaling Language Modeling with Pathways | ✓ Link | 81.4 | | PaLM-540B (One-Shot) | 2022-04-05 |
ChatQA: Surpassing GPT-4 on Conversational QA and RAG | | 81.0 | | ChatQA-1.5-llama3-8b (Zero-Shot, KILT) | 2024-01-18 |
Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling | ✓ Link | 79.29 | | GaC(Qwen2-72B-Instruct + Llama-3-70B-Instruct) | 2024-06-18 |
Model Card and Evaluations for Claude Models | | 78.9 | | Claude Instant 1.1 (few-shot, k=5) | 2023-07-11 |
REPLUG: Retrieval-Augmented Black-Box Language Models | ✓ Link | 77.3 | | code-davinci-002 175B + REPLUG LSR (Few-Shot) | 2023-01-30 |
PaLM: Scaling Language Modeling with Pathways | ✓ Link | 76.9 | | PaLM-540B (Zero-Shot) | 2022-04-05 |
REPLUG: Retrieval-Augmented Black-Box Language Models | ✓ Link | 76.8 | | code-davinci-002 175B + REPLUG (Few-Shot) | 2023-01-30 |
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | | 75.8 | | GLaM 62B/64E (One-shot) | 2021-12-13 |
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | | 75.8 | | GLaM 62B/64E (Few-shot) | 2021-12-13 |
RA-DIT: Retrieval-Augmented Dual Instruction Tuning | | 75.4 | | RA-DIT (Zero-Shot) | 2023-10-02 |
PaLM 2 Technical Report | ✓ Link | 75.2 | | PaLM 2-S (one-shot) | 2023-05-17 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 73.0 | | LLaMA 65B (few-shot, k=64) | 2023-02-27 |
FiE: Building a Global Probability Space by Leveraging Early Fusion in Encoder for Open-Domain Question Answering | | 72.6 | | FiE+PAQ | 2022-11-18 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 72.6 | | LLaMA 65B (few-shot, k=5) | 2023-02-27 |
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | | 72.6 | | RankRAG-llama3-70b (Zero-Shot, DPR) | 2024-07-02 |
Distilling Knowledge from Reader to Retriever for Question Answering | ✓ Link | 72.1 | | FiD+Distil | 2020-12-08 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 71.6 | | LLaMA 65B (one-shot) | 2023-02-27 |
End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering | ✓ Link | 71.4 | | EMDR2 | 2021-06-09 |
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | | 71.3 | | GLaM 62B/64E (Zero-shot) | 2021-12-13 |
Language Models are Few-Shot Learners | ✓ Link | 71.2 | | GPT-3 175B (Few-Shot) | 2020-05-28 |
Mistral 7B | ✓ Link | 69.9 | | Mistral 7B (5-shot) | 2023-10-10 |
ChatQA: Surpassing GPT-4 on Conversational QA and RAG | | 69.0 | | ChatQA-1.5-llama3-70b (Zero-Shot, DPR) | 2024-01-18 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 68.2 | | LLaMA 65B (zero-shot) | 2023-02-27 |
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering | ✓ Link | 67.6 | | Fusion-in-Decoder (large) | 2020-07-02 |
MemoReader: Large-Scale Reading Comprehension through Neural Memory Controller | | 67.21 | 73.26 | MemoReader | 2018-10-01 |
Simple and Effective Multi-Paragraph Reading Comprehension | ✓ Link | 66.37 | 71.32 | S-Norm | 2017-10-29 |
Mention Memory: incorporating textual knowledge into Transformers through entity mention attention | ✓ Link | 65.8 | | TOME-2 | 2021-10-12 |
SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments | | 58.2 | | Shakti-LLM (2.5B) | 2024-10-15 |
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | ✓ Link | 57.1 | | Branch-Train-MiX 4x7B (sampling top-2 experts) | 2024-03-12 |
Dense Passage Retrieval for Open-Domain Question Answering | ✓ Link | 56.8 | | DPR | 2020-04-10 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 56.7 | | FLAN 137B (zero-shot) | 2021-09-03 |
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | ✓ Link | 56.1 | | RAG | 2020-05-22 |
Dynamic Integration of Background Knowledge in Neural NLU Systems | | 50.56 | 56.73 | Reading Twice for NLU | 2017-06-08 |
Reinforced Mnemonic Reader for Machine Reading Comprehension | ✓ Link | 46.94 | 52.85 | Mnemonic Reader | 2017-05-08 |
Latent Retrieval for Weakly Supervised Open Domain Question Answering | ✓ Link | 45.0 | | ORQA | 2019-06-01 |
MEMEN: Multi-layer Embedding with Memory Networks for Machine Comprehension | | 43.16 | 46.90 | MEMEN | 2017-07-28 |
SpanBERT: Improving Pre-training by Representing and Predicting Spans | ✓ Link | | 83.6 | SpanBERT | 2019-07-24 |
Big Bird: Transformers for Longer Sequences | ✓ Link | | 80.9 | BigBird-etc | 2020-07-28 |
Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation | ✓ Link | | 80.1 | DPA-RAG | 2024-06-26 |
LinkBERT: Pretraining Language Models with Document Links | ✓ Link | | 78.2 | LinkBERT (large) | 2022-03-29 |
DyREx: Dynamic Query Representation for Extractive Question Answering | ✓ Link | | 77.37 | DyREx | 2022-10-26 |
Search-o1: Agentic Search-Enhanced Large Reasoning Models | ✓ Link | | 74.1 | Search-o1 | 2025-01-09 |
UnitedQA: A Hybrid Approach for Open Domain Question Answering | | | 70.3 | UnitedQA (Hybrid reader) | 2021-01-01 |
ReasonBERT: Pre-trained to Reason with Distant Supervision | ✓ Link | | 45.5 | ReasonBERT (RoBERTa) | 2021-09-10 |
ReasonBERT: Pre-trained to Reason with Distant Supervision | ✓ Link | | 37.2 | ReasonBERT (BERT) | 2021-09-10 |
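The EM and F1 columns above follow the usual extractive-QA scoring convention: both prediction and gold answers are normalized, each prediction is scored against its best-matching gold alias, and per-example scores are averaged and reported as percentages. The sketch below illustrates that convention in Python; it is not any leaderboard's official scorer, and `normalize_answer` mirrors the widely used SQuAD-style normalization (lowercase, strip punctuation and articles, collapse whitespace), which most of the papers listed here adopt.

```python
import re
import string
from collections import Counter

def normalize_answer(s: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation,
    drop articles (a/an/the), and collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction: str, gold: str) -> bool:
    """EM: normalized strings must be identical."""
    return normalize_answer(prediction) == normalize_answer(gold)

def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 over the normalized answers."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def metric_max_over_ground_truths(metric_fn, prediction, ground_truths):
    """A question usually has several acceptable aliases;
    score the prediction against the best-matching one."""
    return max(metric_fn(prediction, gt) for gt in ground_truths)
```

A minimal usage example, with hypothetical data, showing how the table's percentage scores are produced:

```python
preds = {"q1": "the Eiffel Tower"}
golds = {"q1": ["Eiffel Tower", "La tour Eiffel"]}
em = 100.0 * sum(
    metric_max_over_ground_truths(exact_match, preds[q], golds[q]) for q in preds
) / len(preds)
f1 = 100.0 * sum(
    metric_max_over_ground_truths(f1_score, preds[q], golds[q]) for q in preds
) / len(preds)
print(f"EM: {em:.2f}  F1: {f1:.2f}")  # EM: 100.00  F1: 100.00
```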