OpenCodePapers
Visual Question Answering (VQA)
3D Question Answering (3D-QA)
Leaderboard
| Paper | Code | Exact Match | Model Name | Release Date |
| --- | --- | --- | --- | --- |
| LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness | - | 60.1 | LLaVA-3D | 2024-09-26 |
| Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding | ✓ | 58.6 | Video-3D LLM | 2024-11-30 |
| Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers | ✓ | 54.7 | Chat-3D v2 | 2023-12-13 |
| Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers | ✓ | 54.6 | ChatScene | 2023-12-13 |
| Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning | - | 54.2 | Scene-LLM | 2024-03-18 |
| An Embodied Generalist Agent in 3D World | ✓ | 50.0 | LEO | 2023-11-18 |
| Video Instruction Tuning With Synthetic Data | - | 48.5 | LLaVA-Video | 2024-10-03 |
| 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment | ✓ | 48.5 | 3D-VisTA | 2023-08-08 |
| ScanQA: 3D Question Answering for Spatial Scene Understanding | ✓ | 47.2 | ScanQA | 2021-12-20 |
| Unifying 3D Vision-Language Understanding via Promptable Queries | - | 47.1 | PQ3D | 2024-05-19 |
| Scan2Cap: Context-aware Dense Captioning in RGB-D Scans | - | 41.0 | Scan2Cap | 2020-12-03 |
| MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | ✓ | 37.3 | VideoChat2 | 2023-11-28 |
| LLaVA-OneVision: Easy Visual Task Transfer | ✓ | 34.2 | LLaVA-NeXT-Video | 2024-08-06 |