OpenCodePapers

Zero-Shot Transfer Image Classification
Leaderboard
| Paper | Code | Param | Accuracy (Private) | Accuracy (Public) | Model Name | Release Date |
|---|---|---|---|---|---|---|
| M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining | ✓ | 10B | 88.5 | | M2-Encoder | 2024-01-29 |
| | | | 88.3 | | BASIC (Lion) | |
| CoCa: Contrastive Captioners are Image-Text Foundation Models | ✓ | | 86.3 | | CoCa | 2022-05-04 |
| Scaling Vision Transformers to 22 Billion Parameters | ✓ | | 85.9 | | LiT-22B | 2023-02-10 |
| Combined Scaling for Zero-shot Transfer Learning | | | 85.7 | | BASIC | 2021-11-19 |
| PaLI: A Jointly-Scaled Multilingual Language-Image Model | ✓ | | 85.4 | | LiT ViT-e | 2022-09-14 |
| LiT: Zero-Shot Transfer with Locked-image text Tuning | ✓ | | 84.5 | 75.7 | LiT-tuning | 2021-11-15 |
| Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception | | | 83.9 | | IMP-MoE-L | 2023-05-10 |
| EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters | ✓ | | 83.8 | | EVA-CLIP-18B | 2024-02-06 |
| InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | ✓ | | 83.2 | | InternVL-C | 2023-12-21 |
| The effectiveness of MAE pre-pretraining for billion-scale pretraining | ✓ | | 82.1 | | MAWS (ViT-2B) | 2023-03-23 |
| EVA-CLIP: Improved Training Techniques for CLIP at Scale | ✓ | | 82 | | EVA-CLIP-E/14+ | 2023-03-27 |
| | | | 81.8 | | CLIPA (ViT-H/14-336px) | |
| The effectiveness of MAE pre-pretraining for billion-scale pretraining | ✓ | | 81.1 | | MAWS (ViT-H) | 2023-03-23 |
| Learning Customized Visual Models with Retrieval-Augmented Knowledge | ✓ | | 78.5 | | REACT | 2023-01-17 |
| Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision | ✓ | | 76.4 | - | ALIGN | 2021-02-11 |
| Learning Transferable Visual Models From Natural Language Supervision | ✓ | | 76.2 | | CLIP (ViT-L/14-336px) | 2021-02-26 |
| AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities | ✓ | | 74.5 | | AltCLIP | 2022-11-12 |
| PaLI: A Jointly-Scaled Multilingual Language-Image Model | ✓ | | 72.11 | | PaLI | 2022-09-14 |
| Your Diffusion Model is Secretly a Zero-Shot Classifier | ✓ | | 61.4 | | Diffusion Classifier (zero-shot) | 2023-03-28 |
| Learning Transferable Visual Models From Natural Language Supervision | ✓ | | 59.6 | | CLIP (ResNet50) | 2021-02-26 |
| | | | | 76.5 | CWCL | |
| Learning Transferable Visual Models From Natural Language Supervision | ✓ | | | 31.3 | CLIP | 2021-02-26 |