Visual Grounding on RefCOCO (val)

Task: Visual Grounding
Dataset: RefCOCO (validation split)
Leaderboard
| Paper | Code | Accuracy (%) | Model Name | Release Date |
| --- | --- | --- | --- | --- |
| Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks | ✓ | 93.4 | Florence-2-large-ft | 2023-11-10 |
| mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | ✓ | 90.33 | mPLUG-2 | 2023-02-01 |
| X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | ✓ | 87.6 | X2-VLM (large) | 2022-11-22 |
| Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks | ✓ | 86.1 | XFM (base) | 2023-01-12 |
| X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | ✓ | 85.2 | X2-VLM (base) | 2022-11-22 |
| Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts | ✓ | 84.51 | X-VLM (base) | 2021-11-16 |
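
For reference, accuracy on RefCOCO referring-expression grounding is conventionally Acc@0.5: a prediction counts as correct when its box overlaps the ground-truth box with an IoU of at least 0.5. Below is a minimal sketch of that metric; the (x1, y1, x2, y2) box format and the helper names are illustrative assumptions, not taken from any of the listed papers.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def grounding_accuracy(predictions, ground_truths, threshold=0.5):
    """Percentage of predicted boxes with IoU >= threshold against ground truth."""
    hits = sum(iou(p, g) >= threshold for p, g in zip(predictions, ground_truths))
    return 100.0 * hits / len(predictions)

# Toy example: the first prediction overlaps well (IoU ~0.82), the second misses.
preds = [(10, 10, 50, 50), (0, 0, 20, 20)]
gts   = [(12, 12, 52, 52), (40, 40, 80, 80)]
print(f"Acc@0.5: {grounding_accuracy(preds, gts):.2f}%")  # 50.00%
```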
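Since the top entry, Florence-2-large-ft, is distributed on the Hugging Face hub, here is a hedged sketch of how grounding inference typically looks with it. The checkpoint ID, the `<CAPTION_TO_PHRASE_GROUNDING>` task token, the processor's `post_process_generation` method, and the input image are assumptions drawn from the public model card, not from this leaderboard; verify them against the model card before use.

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Assumed hub checkpoint ID; the model requires trust_remote_code=True.
model_id = "microsoft/Florence-2-large-ft"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg")  # hypothetical local image
task = "<CAPTION_TO_PHRASE_GROUNDING>"  # grounding task token per the model card
inputs = processor(text=task + "a person on a bench", images=image,
                   return_tensors="pt")

generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=256,
)
raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
# post_process_generation parses the raw string into labels and pixel boxes.
result = processor.post_process_generation(raw, task=task, image_size=image.size)
print(result)  # e.g. {'<CAPTION_TO_PHRASE_GROUNDING>': {'bboxes': [...], 'labels': [...]}}
```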