
Visual Grounding on RefCOCO test-B

Task: Visual Grounding
Leaderboard
| Paper | Code | Accuracy (%) | Model Name | Release Date |
|---|---|---|---|---|
| Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks | ✓ | 92.0 | Florence-2-large-ft | 2023-11-10 |
| mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | ✓ | 86.05 | mPLUG-2 | 2023-02-01 |
| X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | ✓ | 81.8 | X2-VLM (large) | 2022-11-22 |
| Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks | ✓ | 79.8 | XFM (base) | 2023-01-12 |
| X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | ✓ | 78.4 | X2-VLM (base) | 2022-11-22 |
| Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts | ✓ | 76.91 | X-VLM (base) | 2021-11-16 |
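The page does not state the scoring threshold, but referring-expression results on RefCOCO are conventionally reported as Acc@0.5: a predicted box counts as correct when its intersection-over-union (IoU) with the annotated box is at least 0.5. Below is a minimal sketch of that scoring under this assumption; the function names, box format (axis-aligned `(x1, y1, x2, y2)` in pixels), and example boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def grounding_accuracy(predictions, ground_truths, threshold=0.5):
    """Percentage of predictions whose IoU with ground truth >= threshold."""
    hits = sum(iou(p, g) >= threshold
               for p, g in zip(predictions, ground_truths))
    return 100.0 * hits / len(ground_truths)

# Hypothetical example: 2 of 3 predicted boxes overlap enough to count.
preds = [(10, 10, 50, 50), (0, 0, 20, 20), (30, 30, 80, 80)]
gts   = [(12, 12, 48, 52), (100, 100, 140, 140), (28, 32, 78, 82)]
print(grounding_accuracy(preds, gts))  # ~66.67
```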