OpenCodePapers

3D Question Answering (3D-QA) on SQA3D

Leaderboard

Paper	Code	Exact Match (%)	Model Name	Release Date
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness		60.1	LLaVA-3D	2024-09-26
Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding	✓ Link	58.6	Video-3D LLM	2024-11-30
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers	✓ Link	54.7	Chat-3D v2	2023-12-13
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers	✓ Link	54.6	ChatScene	2023-12-13
Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning		54.2	Scene-LLM	2024-03-18
An Embodied Generalist Agent in 3D World	✓ Link	50.0	LEO	2023-11-18
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment	✓ Link	48.5	3D-VisTA	2023-08-08
Video Instruction Tuning With Synthetic Data		48.5	LLaVA-Video	2024-10-03
ScanQA: 3D Question Answering for Spatial Scene Understanding	✓ Link	47.2	ScanQA	2021-12-20
Unifying 3D Vision-Language Understanding via Promptable Queries		47.1	PQ3D	2024-05-19
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans		41.0	Scan2Cap	2020-12-03
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	✓ Link	37.3	VideoChat2	2023-11-28
LLaVA-OneVision: Easy Visual Task Transfer	✓ Link	34.2	LLaVA-NeXT-Video	2024-08-06
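
The Exact Match column in the leaderboard is a percentage of questions answered with exactly the reference answer. The following is a minimal sketch of how such a score could be computed. The function names `normalize` and `exact_match` are hypothetical, and the normalization (lowercasing and collapsing whitespace) is an assumption for illustration; it is not the official SQA3D evaluation script, which may apply different rules.

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization, not the official one)."""
    return " ".join(answer.lower().split())


def exact_match(predictions: list[str], references: list[str]) -> float:
    """Return the percentage of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # Count a hit only when the normalized strings are identical.
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


if __name__ == "__main__":
    # Illustrative answers only; these are not taken from the benchmark.
    preds = ["Chair", "left", "two"]
    refs = ["chair", "right", "two"]
    print(f"Exact Match: {exact_match(preds, refs):.1f}")
```

Under this sketch a prediction earns credit only when it matches the reference string in full, so partially correct answers count as misses.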