OpenCodePapers

Question Answering on Natural Questions

Question Answering
Leaderboard
| Paper | Code | EM | Model Name | Release Date |
|---|---|---|---|---|
| Atlas: Few-shot Learning with Retrieval Augmented Language Models | ✓ | 64.0 | Atlas (full, Wiki-dec-2018 index) | 2022-08-05 |
| Atlas: Few-shot Learning with Retrieval Augmented Language Models | ✓ | 60.4 | Atlas (full, Wiki-dec-2021+CC index) | 2022-08-05 |
| Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation | ✓ | 59.19 | DPA-RAG | 2024-06-26 |
| FiE: Building a Global Probability Space by Leveraging Early Fusion in Encoder for Open-Domain Question Answering | | 58.4 | FiE | 2022-11-18 |
| R2-D2: A Modular Baseline for Open-Domain Question Answering | ✓ | 55.9 | R2-D2 (full) | 2021-09-08 |
| Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer | ✓ | 54.7 | ReAtt | 2022-12-05 |
| Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering | ✓ | 54.7 | FiD-KD (full) | 2020-07-02 |
| RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | | 54.2 | RankRAG-llama3-70b (Zero-Shot, KILT) | 2024-07-02 |
| End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering | ✓ | 52.5 | EMDR^2 | 2021-06-09 |
| Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering | ✓ | 51.4 | FID (full) | 2020-07-02 |
| RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | | 50.6 | RankRAG-llama3-8b (Zero-Shot, KILT) | 2024-07-02 |
| RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | | 50.0 | RankRAG-llama3-70b (Zero-Shot, DPR) | 2024-07-02 |
| ChatQA: Surpassing GPT-4 on Conversational QA and RAG | | 47.0 | ChatQA-1.5-llama3-70b (Zero-Shot, KILT) | 2024-01-18 |
| RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | | 46.1 | RankRAG-llama3-8b (Zero-Shot, DPR) | 2024-07-02 |
| Improving language models by retrieving from trillions of tokens | ✓ | 45.5 | RETRO + DPR (full) | 2021-12-08 |
| REPLUG: Retrieval-Augmented Black-Box Language Models | ✓ | 45.5 | code-davinci-002 175B + REPLUG LSR (few-shot) | 2023-01-30 |
| Atlas: Few-shot Learning with Retrieval Augmented Language Models | ✓ | 45.1 | Atlas (few-shot, k=64, Wiki-Dec-2018 index) | 2022-08-05 |
| REPLUG: Retrieval-Augmented Black-Box Language Models | ✓ | 44.7 | code-davinci-002 175B + REPLUG (few-shot) | 2023-01-30 |
| Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | ✓ | 44.5 | RAG | 2020-05-22 |
| ChatQA: Surpassing GPT-4 on Conversational QA and RAG | | 42.7 | ChatQA-1.5-llama3-8b (Zero-Shot, KILT) | 2024-01-18 |
| Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers | ✓ | 42.63 | Blended RAG | 2024-03-22 |
| Atlas: Few-shot Learning with Retrieval Augmented Language Models | ✓ | 42.4 | Atlas (few-shot, k=64, Wiki-dec-2021+CC index) | 2022-08-05 |
| Dense Passage Retrieval for Open-Domain Question Answering | ✓ | 41.5 | DPR | 2020-04-10 |
| REALM: Retrieval-Augmented Language Model Pre-Training | ✓ | 40.4 | REALM | 2020-02-10 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 39.9 | LLaMA 65B (few-shot, k=64) | 2023-02-27 |
| PaLM: Scaling Language Modeling with Pathways | ✓ | 39.6 | PaLM-540B (Few-Shot, k=64) | 2022-04-05 |
| PaLM 2 Technical Report | ✓ | 37.5 | PaLM 2-L (one-shot) | 2023-05-17 |
| Training Compute-Optimal Large Language Models | ✓ | 35.5 | Chinchilla (few-shot, k=64) | 2022-03-29 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 35.0 | LLaMA 65B (few-shot, k=5) | 2023-02-27 |
| Search-o1: Agentic Search-Enhanced Large Reasoning Models | ✓ | 34 | Search-o1 | 2025-01-09 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ | 33.0 | LLaMA 2 70B (one-shot) | 2023-07-18 |
| GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | | 32.5 | GLaM 62B/64E (Few-Shot) | 2021-12-13 |
| PaLM 2 Technical Report | ✓ | 32.0 | PaLM 2-M (one-shot) | 2023-05-17 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 31.0 | LLaMA 65B (one-shot) | 2023-02-27 |
| Language Models are Few-Shot Learners | ✓ | 29.9 | GPT-3 175B (Few-Shot, k=64) | 2020-05-28 |
| PaLM: Scaling Language Modeling with Pathways | ✓ | 29.3 | PaLM-540B (One-Shot) | 2022-04-05 |
| Mistral 7B | ✓ | 28.8 | Mistral 7B (5-shot) | 2023-10-10 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | ✓ | 28.2 | Gopher (few-shot, k=64) | 2021-12-08 |
| GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | | 26.3 | GLaM 62B/64E (One-Shot) | 2021-12-13 |
| | | 26.07 | LLaMA 7B (Contriever) | |
| PaLM 2 Technical Report | ✓ | 25.3 | PaLM 2-S (one-shot) | 2023-05-17 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 24.9 | LLaMA 33B (zero-shot) | 2023-02-27 |
| GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | | 24.7 | GLaM 62B/64E (Zero-Shot) | 2021-12-13 |
| PaLM: Scaling Language Modeling with Pathways | ✓ | 21.2 | PaLM-540B (Zero-Shot) | 2022-04-05 |
| Ask Me Anything: A simple strategy for prompting language models | ✓ | 19.7 | Neo-6B (QA) | 2022-10-05 |
| Ask Me Anything: A simple strategy for prompting language models | ✓ | 19.6 | Neo-6B (QA + WS) | 2022-10-05 |
| Ask Me Anything: A simple strategy for prompting language models | ✓ | 13.7 | Neo-6B (Few-Shot) | 2022-10-05 |