OpenCodePapers

Question Answering on PubMedQA

Question Answering
Leaderboard
| Paper | Code | Accuracy | Model Name | Release Date |
|---|---|---|---|---|
| MEDITRON-70B: Scaling Medical Pretraining for Large Language Models | ✓ | 81.6 | Meditron-70B (CoT + SC) | 2023-11-27 |
| BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining | ✓ | 81.0 | BioGPT-Large(1.5B) | 2022-10-19 |
| RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | | 79.8 | RankRAG-llama3-70B (Zero-Shot) | 2024-07-02 |
| Towards Expert-Level Medical Question Answering with Large Language Models | ✓ | 79.2 | Med-PaLM 2 (5-shot) | 2023-05-16 |
| Large Language Models Encode Clinical Knowledge | ✓ | 79 | Flan-PaLM (540B, Few-shot) | 2022-12-26 |
| BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining | ✓ | 78.2 | BioGPT(345M) | 2022-10-19 |
| Can large language models reason about medical questions? | ✓ | 78.2 | Codex 5-shot CoT | 2022-07-17 |
| PubMedQA: A Dataset for Biomedical Research Question Answering | ✓ | 78.0 | Human Performance (single annotator) | 2019-09-13 |
| MetaGen Blended RAG: Higher Accuracy for Domain-Specific Q&A Without Fine-Tuning | ✓ | 77.9 | MetaGen Blended RAG (zero-shot) | 2025-05-23 |
| Galactica: A Large Language Model for Science | ✓ | 77.6 | GAL 120B (zero-shot) | 2022-11-16 |
| Large Language Models Encode Clinical Knowledge | ✓ | 77.2 | Flan-PaLM (62B, Few-shot) | 2022-12-26 |
| MediSwift: Efficient Sparse Pre-trained Biomedical Language Models | | 76.8 | MediSwift-XL | 2024-03-01 |
| Evaluation of large language model performance on the Biomedical Language Understanding and Reasoning Benchmark | | 76.80 | Flan-T5-XXL | 2024-05-17 |
| BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine | ✓ | 76.1 | BioMedGPT-10B | 2023-08-18 |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | | 75.8 | Claude 3 Opus (5-shot) | 2024-03-04 |
| Large Language Models Encode Clinical Knowledge | ✓ | 75.2 | Flan-PaLM (540B, SC) | 2022-12-26 |
| Towards Expert-Level Medical Question Answering with Large Language Models | ✓ | 75.0 | Med-PaLM 2 (ER) | 2023-05-16 |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | | 74.9 | Claude 3 Opus (zero-shot) | 2024-03-04 |
| Towards Expert-Level Medical Question Answering with Large Language Models | ✓ | 74.0 | Med-PaLM 2 (CoT + SC) | 2023-05-16 |
| Galactica: A Large Language Model for Science | ✓ | 73.6 | BLOOM (zero-shot) | 2022-11-16 |
| The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | ✓ | 73.42 | CoT-T5-11B (1024 Shot) | 2023-05-23 |
| LinkBERT: Pretraining Language Models with Document Links | ✓ | 72.2 | BioLinkBERT (large) | 2022-03-29 |
| LinkBERT: Pretraining Language Models with Document Links | ✓ | 70.2 | BioLinkBERT (base) | 2022-03-29 |
| Galactica: A Large Language Model for Science | ✓ | 70.2 | OPT (zero-shot) | 2022-11-16 |
| Large Language Models Encode Clinical Knowledge | ✓ | 67.6 | Flan-PaLM (8B, Few-shot) | 2022-12-26 |
| BioELECTRA:Pretrained Biomedical text Encoder using Discriminators | ✓ | 64.2 | BioELECTRA uncased | 2021-06-11 |
| Large Language Models Encode Clinical Knowledge | ✓ | 57.8 | PaLM (62B, Few-shot) | 2022-12-26 |
| Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing | ✓ | 55.84 | PubMedBERT uncased | 2020-07-31 |
| Large Language Models Encode Clinical Knowledge | ✓ | 55 | PaLM (540B, Few-shot) | 2022-12-26 |
| Large Language Models Encode Clinical Knowledge | ✓ | 34 | PaLM (8B, Few-shot) | 2022-12-26 |