Paper | Code | Accuracy (Private) | Model Name | Release Date |
---|---|---|---|---|
CoCa: Contrastive Captioners are Image-Text Foundation Models | ✓ Link | 77.6 | CoCa | 2022-05-04 |
| | | 77.2 | BASIC (Lion) | |
Combined Scaling for Zero-shot Transfer Learning | | 76.1 | BASIC | 2021-11-19 |
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters | ✓ Link | 74.7 | EVA-CLIP-18B | 2024-02-06 |
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | ✓ Link | 73.9 | InternVL-C | 2023-12-21 |
EVA-CLIP: Improved Training Techniques for CLIP at Scale | ✓ Link | 71.6 | EVA-CLIP-E/14+ | 2023-03-27 |
AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities | ✓ Link | 58.7 | AltCLIP | 2022-11-12 |