Question Answering on Natural Questions

Question Answering

Leaderboard

Paper	Code	EM (exact match, %)	Model	Release date
Atlas: Few-shot Learning with Retrieval Augmented Language Models	✓	64.0	Atlas (full, Wiki-dec-2018 index)	2022-08-05
Atlas: Few-shot Learning with Retrieval Augmented Language Models	✓	60.4	Atlas (full, Wiki-dec-2021+CC index)	2022-08-05
Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation	✓	59.19	DPA-RAG	2024-06-26
FiE: Building a Global Probability Space by Leveraging Early Fusion in Encoder for Open-Domain Question Answering		58.4	FiE	2022-11-18
R2-D2: A Modular Baseline for Open-Domain Question Answering	✓	55.9	R2-D2 (full)	2021-09-08
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering	✓	54.7	FiD-KD (full)	2020-07-02
Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer	✓	54.7	ReAtt	2022-12-05
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs		54.2	RankRAG-llama3-70b (Zero-Shot, KILT)	2024-07-02
End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering	✓	52.5	EMDR^2	2021-06-09
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering	✓	51.4	FiD (full)	2020-07-02
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs		50.6	RankRAG-llama3-8b (Zero-Shot, KILT)	2024-07-02
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs		50.0	RankRAG-llama3-70b (Zero-Shot, DPR)	2024-07-02
ChatQA: Surpassing GPT-4 on Conversational QA and RAG		47.0	ChatQA-1.5-llama3-70b (Zero-Shot, KILT)	2024-01-18
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs		46.1	RankRAG-llama3-8b (Zero-Shot, DPR)	2024-07-02
Improving language models by retrieving from trillions of tokens	✓	45.5	RETRO + DPR (full)	2021-12-08
REPLUG: Retrieval-Augmented Black-Box Language Models	✓	45.5	code-davinci-002 175B + REPLUG LSR (few-shot)	2023-01-30
Atlas: Few-shot Learning with Retrieval Augmented Language Models	✓	45.1	Atlas (few-shot, k=64, Wiki-dec-2018 index)	2022-08-05
REPLUG: Retrieval-Augmented Black-Box Language Models	✓	44.7	code-davinci-002 175B + REPLUG (few-shot)	2023-01-30
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks	✓	44.5	RAG	2020-05-22
ChatQA: Surpassing GPT-4 on Conversational QA and RAG		42.7	ChatQA-1.5-llama3-8b (Zero-Shot, KILT)	2024-01-18
Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers	✓	42.63	Blended RAG	2024-03-22
Atlas: Few-shot Learning with Retrieval Augmented Language Models	✓	42.4	Atlas (few-shot, k=64, Wiki-dec-2021+CC index)	2022-08-05
Dense Passage Retrieval for Open-Domain Question Answering	✓	41.5	DPR	2020-04-10
REALM: Retrieval-Augmented Language Model Pre-Training	✓	40.4	REALM	2020-02-10
LLaMA: Open and Efficient Foundation Language Models	✓	39.9	LLaMA 65B (few-shot, k=64)	2023-02-27
PaLM: Scaling Language Modeling with Pathways	✓	39.6	PaLM-540B (Few-Shot, k=64)	2022-04-05
PaLM 2 Technical Report	✓	37.5	PaLM 2-L (one-shot)	2023-05-17
Training Compute-Optimal Large Language Models	✓	35.5	Chinchilla (few-shot, k=64)	2022-03-29
LLaMA: Open and Efficient Foundation Language Models	✓	35.0	LLaMA 65B (few-shot, k=5)	2023-02-27
Search-o1: Agentic Search-Enhanced Large Reasoning Models	✓	34	Search-o1	2025-01-09
Llama 2: Open Foundation and Fine-Tuned Chat Models	✓	33.0	LLaMA 2 70B (one-shot)	2023-07-18
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts		32.5	GLaM 62B/64E (Few-Shot)	2021-12-13
PaLM 2 Technical Report	✓	32.0	PaLM 2-M (one-shot)	2023-05-17
LLaMA: Open and Efficient Foundation Language Models	✓	31.0	LLaMA 65B (one-shot)	2023-02-27
Language Models are Few-Shot Learners	✓	29.9	GPT-3 175B (Few-Shot, k=64)	2020-05-28
PaLM: Scaling Language Modeling with Pathways	✓	29.3	PaLM-540B (One-Shot)	2022-04-05
Mistral 7B	✓	28.8	Mistral 7B (5-shot)	2023-10-10
Scaling Language Models: Methods, Analysis & Insights from Training Gopher	✓	28.2	Gopher (few-shot, k=64)	2021-12-08
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts		26.3	GLaM 62B/64E (One-Shot)	2021-12-13
		26.07	LLaMA 7B (Contriever)	
PaLM 2 Technical Report	✓	25.3	PaLM 2-S (one-shot)	2023-05-17
LLaMA: Open and Efficient Foundation Language Models	✓	24.9	LLaMA 33B (zero-shot)	2023-02-27
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts		24.7	GLaM 62B/64E (Zero-Shot)	2021-12-13
PaLM: Scaling Language Modeling with Pathways	✓	21.2	PaLM-540B (Zero-Shot)	2022-04-05
Ask Me Anything: A simple strategy for prompting language models	✓	19.7	Neo-6B (QA)	2022-10-05
Ask Me Anything: A simple strategy for prompting language models	✓	19.6	Neo-6B (QA + WS)	2022-10-05
Ask Me Anything: A simple strategy for prompting language models	✓	13.7	Neo-6B (Few-Shot)	2022-10-05
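
The EM (exact match) score is the percentage of questions for which a system's answer string equals at least one reference answer after normalization. The sketch below assumes the SQuAD-style normalization (lowercasing, stripping punctuation and the English articles "a", "an", "the", collapsing whitespace) that is widely used for open-domain Natural Questions. The function names are hypothetical, and individual papers in the leaderboard may normalize answers somewhat differently, so this is an illustration of the metric rather than any listed paper's exact evaluation script.

```python
import re
import string
from typing import List, Sequence

# Assumption: SQuAD-style answer normalization; individual papers may differ.
_PUNCT = set(string.punctuation)
_ARTICLES = re.compile(r"\b(a|an|the)\b")


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = _ARTICLES.sub(" ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: Sequence[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def em_score(predictions: Sequence[str], references: Sequence[List[str]]) -> float:
    """EM in percent: share of questions answered exactly (no partial credit)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    preds = ["The Eiffel Tower", "1889", "Paris, France"]
    refs = [["Eiffel Tower"], ["1887", "1889"], ["Paris"]]
    print(round(em_score(preds, refs), 2))
```

Here two of the three predictions match a gold answer, giving an EM of about 66.67. "Paris, France" earns nothing even though it contains the reference "Paris", which illustrates why EM is a strict metric: a verbose but correct answer scores the same as a wrong one.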

OpenCodePapers