Frontiers in Intelligent Colonoscopy | ✓ Link | 80.18 | ColonGPT (w/ LoRA, w/o extra data) | 2024-10-22 |
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices | ✓ Link | 78.03 | MobileVLM-1.7B (w/ LoRA, w/ extra data) | 2023-12-28 |
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day | ✓ Link | 75.25 | LLaVA-Med-v1.0 (w/o LoRA, w/ extra data) | 2023-06-01 |
Efficient Multimodal Learning from Data-centric Perspective | ✓ Link | 75.08 | Bunny-v1.0-3B (w/ LoRA, w/ extra data) | 2024-02-18 |
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day | ✓ Link | 75.07 | LLaVA-Med-v1.0 (w/o LoRA, w/o extra data) | 2023-06-01 |
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | ✓ Link | 74.30 | MGM-2B (w/o LoRA, w/ extra data) | 2024-03-27 |
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices | ✓ Link | 73.14 | MobileVLM-1.7B (w/o LoRA, w/ extra data) | 2023-12-28 |
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day | ✓ Link | 73.05 | LLaVA-Med-v1.5 (w/ LoRA, w/o extra data) | 2023-06-01 |
Improved Baselines with Visual Instruction Tuning | ✓ Link | 72.88 | LLaVA-v1.5 (w/ LoRA, w/ extra data) | 2023-10-05 |
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | ✓ Link | 72.05 | MiniGPT-v2 (w/ LoRA, w/o extra data) | 2023-10-14 |
Improved Baselines with Visual Instruction Tuning | ✓ Link | 70.38 | LLaVA-v1.5 (w/ LoRA, w/o extra data) | 2023-10-05 |
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | ✓ Link | 70.23 | MiniGPT-v2 (w/ LoRA, w/ extra data) | 2023-10-14 |
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day | ✓ Link | 70.00 | LLaVA-Med-v1.5 (w/ LoRA, w/ extra data) | 2023-06-01 |
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | ✓ Link | 69.81 | MGM-2B (w/o LoRA, w/o extra data) | 2024-03-27 |
Efficient Multimodal Learning from Data-centric Perspective | ✓ Link | 69.45 | Bunny-v1.0-3B (w/ LoRA, w/o extra data) | 2024-02-18 |
Visual Instruction Tuning | ✓ Link | 68.11 | LLaVA-v1 (w/ LoRA, w/o extra data) | 2023-04-17 |
Visual Instruction Tuning | ✓ Link | 46.85 | LLaVA-v1 (w/ LoRA, w/ extra data) | 2023-04-17 |