Paper | Code | Avg. score | Model | Release date |
---|---|---|---|---|
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts | ✓ Link | 85.7 | CuMo-7B | 2024-05-09 |
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions | ✓ Link | 79.9 | ShareGPT4V-13B | 2023-11-21 |
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions | ✓ Link | 72.6 | ShareGPT4V-7B | 2023-11-21 |
Improved Baselines with Visual Instruction Tuning | ✓ Link | 70.7 | LLaVA-v1.5-13B | 2023-10-05 |
Improved Baselines with Visual Instruction Tuning | ✓ Link | 63.4 | LLaVA-v1.5-7B | 2023-10-05 |
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | ✓ Link | 60.9 | InstructBLIP-7B | 2023-05-11 |
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | ✓ Link | 58.2 | InstructBLIP-13B | 2023-05-11 |
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ Link | 38.1 | BLIP-2 | 2023-01-30 |