OpenCodePapers

Question Answering on TriviaQA

Question Answering
Leaderboard
| Paper | Code | EM | F1 | Model Name | Release Date |
|---|---|---|---|---|---|
| Model Card and Evaluations for Claude Models | | 87.5 | | Claude 2 (few-shot, k=5) | 2023-07-11 |
| | | 87 | | GPT-4-0613 | |
| Model Card and Evaluations for Claude Models | | 86.7 | | Claude 1.3 (few-shot, k=5) | 2023-07-11 |
| RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | | 86.5 | | RankRAG-llama3-70b (Zero-Shot, KILT) | 2024-07-02 |
| PaLM 2 Technical Report | ✓ | 86.1 | | PaLM 2-L (one-shot) | 2023-05-17 |
| ChatQA: Surpassing GPT-4 on Conversational QA and RAG | | 85.6 | | ChatQA-1.5-llama3-70b (Zero-Shot, KILT) | 2024-01-18 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ | 85 | | LLaMA 2 70B (one-shot) | 2023-07-18 |
| GPT-4 Technical Report | ✓ | 84.8 | | GPT-4-0613 (Zero-shot) | 2023-03-15 |
| RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | | 82.9 | | RankRAG-llama3-8b (Zero-Shot, KILT) | 2024-07-02 |
| PaLM 2 Technical Report | ✓ | 81.7 | | PaLM 2-M (one-shot) | 2023-05-17 |
| PaLM: Scaling Language Modeling with Pathways | ✓ | 81.4 | | PaLM-540B (Few-Shot) | 2022-04-05 |
| PaLM: Scaling Language Modeling with Pathways | ✓ | 81.4 | | PaLM-540B (One-Shot) | 2022-04-05 |
| ChatQA: Surpassing GPT-4 on Conversational QA and RAG | | 81.0 | | ChatQA-1.5-llama3-8B (Zero-Shot, KILT) | 2024-01-18 |
| Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling | ✓ | 79.29 | | GaC (Qwen2-72B-Instruct + Llama-3-70B-Instruct) | 2024-06-18 |
| Model Card and Evaluations for Claude Models | | 78.9 | | Claude Instant 1.1 (few-shot, k=5) | 2023-07-11 |
| REPLUG: Retrieval-Augmented Black-Box Language Models | ✓ | 77.3 | | code-davinci-002 175B + REPLUG LSR (Few-Shot) | 2023-01-30 |
| PaLM: Scaling Language Modeling with Pathways | ✓ | 76.9 | | PaLM-540B (Zero-Shot) | 2022-04-05 |
| REPLUG: Retrieval-Augmented Black-Box Language Models | ✓ | 76.8 | | code-davinci-002 175B + REPLUG (Few-Shot) | 2023-01-30 |
| GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | | 75.8 | | GLaM 62B/64E (One-shot) | 2021-12-13 |
| GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | | 75.8 | | GLaM 62B/64E (Few-shot) | 2021-12-13 |
| RA-DIT: Retrieval-Augmented Dual Instruction Tuning | | 75.4 | | RA-DIT (Zero-Shot) | 2023-10-02 |
| PaLM 2 Technical Report | ✓ | 75.2 | | PaLM 2-S (one-shot) | 2023-05-17 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 73.0 | | LLaMA 65B (few-shot, k=64) | 2023-02-27 |
| FiE: Building a Global Probability Space by Leveraging Early Fusion in Encoder for Open-Domain Question Answering | | 72.6 | | FiE+PAQ | 2022-11-18 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 72.6 | | LLaMA 65B (few-shot, k=5) | 2023-02-27 |
| RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | | 72.6 | | RankRAG-llama3-70b (Zero-Shot, DPR) | 2024-07-02 |
| Distilling Knowledge from Reader to Retriever for Question Answering | ✓ | 72.1 | | FiD+Distil | 2020-12-08 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 71.6 | | LLaMA 65B (one-shot) | 2023-02-27 |
| End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering | ✓ | 71.4 | | EMDR2 | 2021-06-09 |
| GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | | 71.3 | | GLaM 62B/64E (Zero-shot) | 2021-12-13 |
| Language Models are Few-Shot Learners | ✓ | 71.2 | | GPT-3 175B (Few-Shot) | 2020-05-28 |
| Mistral 7B | ✓ | 69.9 | | Mistral 7B (5-shot) | 2023-10-10 |
| ChatQA: Surpassing GPT-4 on Conversational QA and RAG | | 69.0 | | ChatQA-1.5-llama3-70b (Zero-Shot, DPR) | 2024-01-18 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 68.2 | | LLaMA 65B (zero-shot) | 2023-02-27 |
| Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering | ✓ | 67.6 | | Fusion-in-Decoder (large) | 2020-07-02 |
| MemoReader: Large-Scale Reading Comprehension through Neural Memory Controller | | 67.21 | 73.26 | MemoReader | 2018-10-01 |
| Simple and Effective Multi-Paragraph Reading Comprehension | ✓ | 66.37 | 71.32 | S-Norm | 2017-10-29 |
| Mention Memory: incorporating textual knowledge into Transformers through entity mention attention | ✓ | 65.8 | | TOME-2 | 2021-10-12 |
| SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments | | 58.2 | | Shakti-LLM (2.5B) | 2024-10-15 |
| Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | ✓ | 57.1 | | Branch-Train-MiX 4x7B (sampling top-2 experts) | 2024-03-12 |
| Dense Passage Retrieval for Open-Domain Question Answering | ✓ | 56.8 | | DPR | 2020-04-10 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ | 56.7 | | FLAN 137B (zero-shot) | 2021-09-03 |
| Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | ✓ | 56.1 | | RAG | 2020-05-22 |
| Dynamic Integration of Background Knowledge in Neural NLU Systems | | 50.56 | 56.73 | Reading Twice for NLU | 2017-06-08 |
| Reinforced Mnemonic Reader for Machine Reading Comprehension | ✓ | 46.94 | 52.85 | Mnemonic Reader | 2017-05-08 |
| Latent Retrieval for Weakly Supervised Open Domain Question Answering | ✓ | 45 | | ORQA | 2019-06-01 |
| MEMEN: Multi-layer Embedding with Memory Networks for Machine Comprehension | | 43.16 | 46.90 | MEMEN | 2017-07-28 |
| SpanBERT: Improving Pre-training by Representing and Predicting Spans | ✓ | | 83.6 | SpanBERT | 2019-07-24 |
| Big Bird: Transformers for Longer Sequences | ✓ | | 80.9 | BigBird-etc | 2020-07-28 |
| Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation | ✓ | | 80.1 | DPA-RAG | 2024-06-26 |
| LinkBERT: Pretraining Language Models with Document Links | ✓ | | 78.2 | LinkBERT (large) | 2022-03-29 |
| DyREx: Dynamic Query Representation for Extractive Question Answering | ✓ | | 77.37 | DyREX | 2022-10-26 |
| Search-o1: Agentic Search-Enhanced Large Reasoning Models | ✓ | | 74.1 | Search-o1 | 2025-01-09 |
| UnitedQA: A Hybrid Approach for Open Domain Question Answering | | | 70.3 | UnitedQA (Hybrid reader) | 2021-01-01 |
| ReasonBERT: Pre-trained to Reason with Distant Supervision | ✓ | | 45.5 | ReasonBERTR | 2021-09-10 |
| ReasonBERT: Pre-trained to Reason with Distant Supervision | ✓ | | 37.2 | ReasonBERTB | 2021-09-10 |
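The EM and F1 columns above follow the standard open-domain QA convention: exact match after answer normalization, and token-level F1 between prediction and reference. A minimal sketch of how these metrics are typically computed (SQuAD-style normalization; on TriviaQA the score is usually taken as the max over the dataset's answer aliases) is:

```python
import re
import string
from collections import Counter

def normalize_answer(s):
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD-style)."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, ground_truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))

def f1_score(prediction, ground_truth):
    """Harmonic mean of token precision and recall over normalized tokens."""
    pred_tokens = normalize_answer(prediction).split()
    gt_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)

def metric_max_over_ground_truths(metric_fn, prediction, ground_truths):
    """TriviaQA-style scoring: best score over all answer aliases."""
    return max(metric_fn(prediction, gt) for gt in ground_truths)
```

Note this is a sketch of the conventional metric, not the exact script each paper used; published numbers may differ slightly in normalization details and evaluation split (e.g. Wikipedia vs. Web, filtered vs. full).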