OpenCodePapers

Natural Language Visual Grounding
Results over time

[Interactive chart: accuracy (%) of each model plotted against its release date; the underlying data is in the leaderboard below.]
Leaderboard
| Paper | Code | Accuracy (%) | Model Name | Release Date |
|---|---|---|---|---|
| Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | ✓ Link | 86.34 | UGround-V1-7B | 2024-10-07 |
| Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction | ✓ Link | 83.0 | Aguvis-7B | 2024-12-05 |
| OS-ATLAS: A Foundation Action Model for Generalist GUI Agents | ✓ Link | 82.47 | OS-Atlas-Base-7B | 2024-10-30 |
| Aria-UI: Visual Grounding for GUI Instructions | ✓ Link | 81.1 | Aria-UI | 2024-12-20 |
| Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction | ✓ Link | 81.0 | Aguvis-G-7B | 2024-12-05 |
| Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | ✓ Link | 77.67 | UGround-V1-2B | 2024-10-07 |
| ShowUI: One Vision-Language-Action Model for GUI Visual Agent | ✓ Link | 75.1 | ShowUI | 2024-11-26 |
| ShowUI: One Vision-Language-Action Model for GUI Visual Agent | ✓ Link | 75.0 | ShowUI-G | 2024-11-26 |
| Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | ✓ Link | 73.3 | UGround | 2024-10-07 |
| OmniParser for Pure Vision Based GUI Agent | ✓ Link | 73.0 | OmniParser | 2024-08-01 |
| OS-ATLAS: A Foundation Action Model for Generalist GUI Agents | ✓ Link | 68.0 | OS-Atlas-Base-4B | 2024-10-30 |
| SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents | ✓ Link | 53.4 | SeeClick | 2024-01-17 |
| CogAgent: A Visual Language Model for GUI Agents | ✓ Link | 47.4 | CogAgent | 2023-12-14 |
| Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | ✓ Link | 42.1 | Qwen2-VL-7B | 2024-09-18 |
| GUICourse: From General Vision Language Models to Versatile GUI Agents | ✓ Link | 28.6 | Qwen-GUI | 2024-06-17 |
| MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | ✓ Link | 5.7 | MiniGPT-v2 | 2023-10-14 |
| Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models | ✓ Link | 5.2 | Groma | 2024-04-19 |
| Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | ✓ Link | 5.2 | Qwen-VL | 2023-08-24 |
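
The interactive "Results over time" chart cannot be embedded in a text export. As a stand-in, here is a minimal matplotlib sketch (an addition, not part of the original page) that re-plots the leaderboard rows as accuracy versus release date; the data tuples are copied from the table above.

```python
# Minimal sketch: re-plot the "Results over time" chart from the
# leaderboard table. Each point is one model, x = release date,
# y = reported accuracy (%). Data copied verbatim from the table above.
from datetime import date
import matplotlib.pyplot as plt

ROWS = [  # (model name, accuracy %, release date)
    ("UGround-V1-7B",    86.34, date(2024, 10, 7)),
    ("Aguvis-7B",        83.0,  date(2024, 12, 5)),
    ("OS-Atlas-Base-7B", 82.47, date(2024, 10, 30)),
    ("Aria-UI",          81.1,  date(2024, 12, 20)),
    ("Aguvis-G-7B",      81.0,  date(2024, 12, 5)),
    ("UGround-V1-2B",    77.67, date(2024, 10, 7)),
    ("ShowUI",           75.1,  date(2024, 11, 26)),
    ("ShowUI-G",         75.0,  date(2024, 11, 26)),
    ("UGround",          73.3,  date(2024, 10, 7)),
    ("OmniParser",       73.0,  date(2024, 8, 1)),
    ("OS-Atlas-Base-4B", 68.0,  date(2024, 10, 30)),
    ("SeeClick",         53.4,  date(2024, 1, 17)),
    ("CogAgent",         47.4,  date(2023, 12, 14)),
    ("Qwen2-VL-7B",      42.1,  date(2024, 9, 18)),
    ("Qwen-GUI",         28.6,  date(2024, 6, 17)),
    ("MiniGPT-v2",        5.7,  date(2023, 10, 14)),
    ("Groma",             5.2,  date(2024, 4, 19)),
    ("Qwen-VL",           5.2,  date(2023, 8, 24)),
]

dates = [d for _, _, d in ROWS]
accs = [a for _, a, _ in ROWS]

fig, ax = plt.subplots(figsize=(9, 5))
ax.scatter(dates, accs)
for name, acc, day in ROWS:
    # Label each point in place of the original hover tooltip.
    ax.annotate(name, (day, acc), fontsize=7,
                xytext=(3, 3), textcoords="offset points")
ax.set_xlabel("Release date")
ax.set_ylabel("Accuracy (%)")
ax.set_title("Natural Language Visual Grounding: results over time")
fig.autofmt_xdate()
plt.tight_layout()
plt.show()
```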