Paper | Code | Total | Position-rel | Position-abs | Orientation-rel | Orientation-abs | ModelName | ReleaseDate |
---|---|---|---|---|---|---|---|---|
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation | ✓ Link | 43.9 | 59.6 | 33.8 | 54.6 | 31.3 | SoFar | 2025-02-18 |
GPT-4o System Card | | 36.2 | 49.4 | 28.4 | 44.2 | 25.8 | GPT-4o | 2024-10-25 |
RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics | | 33.5 | 43.8 | 30.8 | 33.8 | 25.8 | RoboPoint | 2024-06-15 |
SpatialBot: Precise Spatial Understanding with Vision Language Models | ✓ Link | 32.7 | 50.9 | 21.6 | 39.6 | 22.9 | SpatialBot | 2024-06-19 |
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities | | 28.9 | 33.6 | 29.2 | 27.2 | 25.0 | SpaceMantis | 2024-01-22 |
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities | | 28.2 | 32.4 | 30.5 | 30.9 | 24.9 | SpaceLLaVA | 2024-01-22 |
Improved Baselines with Visual Instruction Tuning | ✓ Link | 27.2 | 30.9 | 24.5 | 28.3 | 25.8 | LLaVA-1.5 | 2023-10-05 |