OpenCodePapers

Image Retrieval on Flickr30k-CN

Image Retrieval
Leaderboard
| Paper | Code | R@1 | R@10 | R@5 | Model Name | Release Date |
|---|---|---|---|---|---|---|
| InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | ✓ | 85.9 | 98.7 | 97.1 | InternVL-G-FT | 2023-12-21 |
| InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | ✓ | 85.2 | 98.5 | 97.0 | InternVL-C-FT | 2023-12-21 |
| Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | ✓ | 84.4 | 98.7 | 97.1 | CN-CLIP (ViT-L/14@336px) | 2022-11-02 |
| CCMB: A Large-scale Chinese Cross-modal Benchmark | ✓ | 84.4 | 98.4 | 96.7 | R2D2 (ViT-L/14) | 2022-05-08 |
| Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | ✓ | 83.8 | 98.6 | 96.9 | CN-CLIP (ViT-H/14) | 2022-11-02 |
| Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | ✓ | 82.7 | 98.6 | 96.7 | CN-CLIP (ViT-L/14) | 2022-11-02 |
| Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | ✓ | 79.1 | 97.4 | 94.8 | CN-CLIP (ViT-B/16) | 2022-11-02 |
| CCMB: A Large-scale Chinese Cross-modal Benchmark | ✓ | 78.3 | 97.0 | 94.6 | R2D2 (ViT-B) | 2022-05-08 |
| Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark | ✓ | 77.4 | 97.0 | 94.5 | Wukong (ViT-L/14) | 2022-02-14 |
| Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark | ✓ | 67.6 | 94.2 | 89.6 | Wukong (ViT-B/32) | 2022-02-14 |
| Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | ✓ | 66.7 | 94.1 | 89.4 | CN-CLIP (RN50) | 2022-11-02 |
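The leaderboard does not define its metrics. Assuming R@1, R@5, and R@10 follow the usual Recall@K convention for text-to-image retrieval (the percentage of text queries whose ground-truth image appears among the top K retrieved images), a minimal sketch of the computation might look like the following. The function name `recall_at_k`, the toy similarity scores, and the one-correct-image-per-query setup are illustrative assumptions, not taken from any of the listed papers or their evaluation code.

```python
def recall_at_k(similarity, ground_truth, k):
    """Percentage of queries whose ground-truth image ranks in the top k.

    similarity[q][i] is the (hypothetical) score between text query q and
    gallery image i; ground_truth[q] is the index of the correct image.
    """
    hits = 0
    for scores, target in zip(similarity, ground_truth):
        # Rank gallery indices by descending similarity to this query.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(ground_truth)


# Toy data: 3 text queries scored against a gallery of 4 images.
similarity = [
    [0.9, 0.1, 0.3, 0.2],  # correct image (index 0) is ranked 1st
    [0.4, 0.5, 0.8, 0.1],  # correct image (index 1) is ranked 2nd
    [0.7, 0.6, 0.2, 0.5],  # correct image (index 2) is ranked 4th
]
ground_truth = [0, 1, 2]

for k in (1, 2, 4):
    print(f"R@{k} = {recall_at_k(similarity, ground_truth, k):.1f}")
```

Because the top 5 results are always a subset of the top 10, Recall@K under this definition can only stay level or rise as K grows, so a model's R@5 should never exceed its R@10.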