Paper | Code | Accuracy (%) | Model | Date
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models | ✓ Link | 81.6 | Meditron-70B (CoT + SC) | 2023-11-27 |
BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining | ✓ Link | 81.0 | BioGPT-Large (1.5B) | 2022-10-19 |
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | | 79.8 | RankRAG-llama3-70B (Zero-Shot) | 2024-07-02 |
Towards Expert-Level Medical Question Answering with Large Language Models | ✓ Link | 79.2 | Med-PaLM 2 (5-shot) | 2023-05-16 |
Large Language Models Encode Clinical Knowledge | ✓ Link | 79.0 | Flan-PaLM (540B, Few-shot) | 2022-12-26 |
BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining | ✓ Link | 78.2 | BioGPT (345M) | 2022-10-19 |
Can large language models reason about medical questions? | ✓ Link | 78.2 | Codex 5-shot CoT | 2022-07-17 |
PubMedQA: A Dataset for Biomedical Research Question Answering | ✓ Link | 78.0 | Human Performance (single annotator) | 2019-09-13 |
MetaGen Blended RAG: Higher Accuracy for Domain-Specific Q&A Without Fine-Tuning | ✓ Link | 77.9 | MetaGen Blended RAG (zero-shot) | 2025-05-23 |
Galactica: A Large Language Model for Science | ✓ Link | 77.6 | GAL 120B (zero-shot) | 2022-11-16 |
Large Language Models Encode Clinical Knowledge | ✓ Link | 77.2 | Flan-PaLM (62B, Few-shot) | 2022-12-26 |
MediSwift: Efficient Sparse Pre-trained Biomedical Language Models | | 76.8 | MediSwift-XL | 2024-03-01 |
Evaluation of large language model performance on the Biomedical Language Understanding and Reasoning Benchmark | | 76.8 | Flan-T5-XXL | 2024-05-17 |
BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine | ✓ Link | 76.1 | BioMedGPT-10B | 2023-08-18 |
The Claude 3 Model Family: Opus, Sonnet, Haiku | | 75.8 | Claude 3 Opus (5-shot) | 2024-03-04 |
Large Language Models Encode Clinical Knowledge | ✓ Link | 75.2 | Flan-PaLM (540B, SC) | 2022-12-26 |
Towards Expert-Level Medical Question Answering with Large Language Models | ✓ Link | 75.0 | Med-PaLM 2 (ER) | 2023-05-16 |
The Claude 3 Model Family: Opus, Sonnet, Haiku | | 74.9 | Claude 3 Opus (zero-shot) | 2024-03-04 |
Towards Expert-Level Medical Question Answering with Large Language Models | ✓ Link | 74.0 | Med-PaLM 2 (CoT + SC) | 2023-05-16 |
Galactica: A Large Language Model for Science | ✓ Link | 73.6 | BLOOM (zero-shot) | 2022-11-16 |
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | ✓ Link | 73.42 | CoT-T5-11B (1024 Shot) | 2023-05-23 |
LinkBERT: Pretraining Language Models with Document Links | ✓ Link | 72.2 | BioLinkBERT (large) | 2022-03-29 |
LinkBERT: Pretraining Language Models with Document Links | ✓ Link | 70.2 | BioLinkBERT (base) | 2022-03-29 |
Galactica: A Large Language Model for Science | ✓ Link | 70.2 | OPT (zero-shot) | 2022-11-16 |
Large Language Models Encode Clinical Knowledge | ✓ Link | 67.6 | Flan-PaLM (8B, Few-shot) | 2022-12-26 |
BioELECTRA: Pretrained Biomedical Text Encoder using Discriminators | ✓ Link | 64.2 | BioELECTRA uncased | 2021-06-11 |
Large Language Models Encode Clinical Knowledge | ✓ Link | 57.8 | PaLM (62B, Few-shot) | 2022-12-26 |
Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing | ✓ Link | 55.84 | PubMedBERT uncased | 2020-07-31 |
Large Language Models Encode Clinical Knowledge | ✓ Link | 55.0 | PaLM (540B, Few-shot) | 2022-12-26 |
Large Language Models Encode Clinical Knowledge | ✓ Link | 34.0 | PaLM (8B, Few-shot) | 2022-12-26 |