OpenCodePapers

3D Question Answering (3D-QA) on SQA3D

Leaderboard
| Paper | Code | Exact Match | Model Name | Release Date |
|---|---|---|---|---|
| LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness | | 60.1 | LLaVA-3D | 2024-09-26 |
| Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding | ✓ | 58.6 | Video-3D LLM | 2024-11-30 |
| Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers | ✓ | 54.7 | Chat-3D v2 | 2023-12-13 |
| Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers | ✓ | 54.6 | ChatScene | 2023-12-13 |
| Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning | | 54.2 | Scene-LLM | 2024-03-18 |
| An Embodied Generalist Agent in 3D World | ✓ | 50.0 | LEO | 2023-11-18 |
| Video Instruction Tuning With Synthetic Data | | 48.5 | LLaVA-Video | 2024-10-03 |
| 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment | ✓ | 48.5 | 3D-VisTA | 2023-08-08 |
| ScanQA: 3D Question Answering for Spatial Scene Understanding | ✓ | 47.2 | ScanQA | 2021-12-20 |
| Unifying 3D Vision-Language Understanding via Promptable Queries | | 47.1 | PQ3D | 2024-05-19 |
| Scan2Cap: Context-aware Dense Captioning in RGB-D Scans | | 41.0 | Scan2Cap | 2020-12-03 |
| MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | ✓ | 37.3 | VideoChat2 | 2023-11-28 |
| LLaVA-OneVision: Easy Visual Task Transfer | ✓ | 34.2 | LLaVA-NeXT-Video | 2024-08-06 |