
Visual Grounding on RefCOCO (testA)

Task: Visual Grounding · Dataset: RefCOCO (testA split)
Leaderboard
| Paper | Code | Accuracy (%) | IoU | Model | Release Date |
|---|---|---|---|---|---|
| Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks | ✓ | 95.3 | | Florence-2-large-ft | 2023-11-10 |
| mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | ✓ | 92.8 | | mPLUG-2 | 2023-02-01 |
| X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | ✓ | 92.1 | | X2-VLM (large) | 2022-11-22 |
| Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks | ✓ | 90.4 | | XFM (base) | 2023-01-12 |
| X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | ✓ | 90.3 | | X2-VLM (base) | 2022-11-22 |
| Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts | ✓ | 89.00 | | X-VLM (base) | 2021-11-16 |
| HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning | ✓ | | 61.1 | HYDRA | 2024-03-19 |
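On the two metric columns: referring-expression grounding is typically scored by intersection-over-union (IoU) between the predicted and ground-truth boxes, with Accuracy (%) usually meaning the fraction of predictions whose IoU clears a 0.5 threshold (Acc@0.5), while the IoU column reports the overlap itself. The sketch below illustrates this relationship; it is a minimal illustration, not code from any listed paper, and the (x1, y1, x2, y2) box format, the 0.5 threshold, and the function names are assumptions.

```python
def box_iou(a, b):
    # Boxes assumed to be (x1, y1, x2, y2) with x1 < x2, y1 < y2.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def grounding_metrics(preds, gts, thresh=0.5):
    # Mean IoU and Acc@thresh over paired predicted / ground-truth boxes.
    ious = [box_iou(p, g) for p, g in zip(preds, gts)]
    mean_iou = sum(ious) / len(ious)
    acc = 100.0 * sum(i >= thresh for i in ious) / len(ious)
    return mean_iou, acc

# Example: one hit (IoU >= 0.5) and one miss -> mean IoU ~0.45, Acc 50%.
print(grounding_metrics([(0, 0, 10, 10), (0, 0, 2, 2)],
                        [(1, 1, 11, 11), (5, 5, 9, 9)]))
```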