OpenCodePapers

3d-question-answering-3d-qa-on-scanqa-test-w

Visual Question Answering (VQA) > 3D Question Answering (3D-QA)

Leaderboard

Paper	Code	Exact Match	BLEU-1	BLEU-4	ROUGE	METEOR	CIDEr	Model Name	Release Date
Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA	✓ Link	31.29	34.49	24.06	43.26	16.51	83.75	BridgeQA	2024-02-24
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness		30.6	-	16.4	49.6	20.8	103.1	LLaVA-3D	2024-09-26
Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding	✓ Link	30.1	-	-	-	-	102.1	Video-3D LLM	2024-11-30
Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning		27.2	-	12.0	40.0	16.6	80	Scene-LLM	2024-03-18
Towards Learning a Generalist Model for Embodied Navigation	✓ Link	26.27	39.73	13.90	40.23	16.56	80.77	NaviLLM	2023-12-04
An Embodied Generalist Agent in 3D World	✓ Link	24.5	-	13.2	49.2	20.0	101.4	LEO	2023-11-18
ScanQA: 3D Question Answering for Spatial Scene Understanding	✓ Link	23.45	31.56	12.04	34.34	13.55	67.29	ScanQA	2021-12-20
3D-LLM: Injecting the 3D World into Large Language Models	✓ Link	23.2	32.6	8.4	34.8	13.5	65.6	3D-LLM (flamingo)	2023-07-24
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment	✓ Link	22.4	-	10.4	35.7	13.9	69.6	3D-VisTA	2023-08-08
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers	✓ Link	21.6	-	14.3	41.6	18.0	87.7	ChatScene	2023-12-13
ScanQA: 3D Question Answering for Spatial Scene Understanding	✓ Link	20.56	27.85	7.46	30.68	11.97	57.56	ScanRefer+MCAN	2021-12-20
ScanQA: 3D Question Answering for Spatial Scene Understanding	✓ Link	19.71	29.46	6.08	30.97	12.07	58.23	VoteNet+MCAN	2021-12-20
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	✓ Link	19.2	-	9.6	28.2	9.5	49.2	VideoChat2	2023-11-28
3D-LLM: Injecting the 3D World into Large Language Models	✓ Link	19.1	38.3	11.6	35.3	14.9	69.6	3D-LLM (BLIP2-flant5)	2023-07-24
3D-LLM: Injecting the 3D World into Large Language Models	✓ Link	19.1	37.3	10.7	34.5	14.3	67.1	3D-LLM (BLIP2-opt)	2023-07-24
LLaVA-OneVision: Easy Visual Task Transfer	✓ Link	18.7	-	9.8	27.8	9.1	46.2	LLaVA-NeXT-Video	2024-08-06
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers	✓ Link	-	-	14.0	-	-	87.6	Chat-3D v2	2023-12-13
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning	✓ Link	-	-	13.5	37.3	15.9	76.8	LL3DA	2023-11-30
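
The leaderboard is ordered by Exact Match, but several models do not report every metric, and the ranking changes depending on which column is used. The following is a minimal Python sketch, not part of the original page, that parses a few rows copied from the table and ranks models by a chosen metric. The shortened column name `Model` and the helper names `parse_score` and `rank_by` are illustrative assumptions; empty cells and `-` are both treated as "not reported".

```python
import csv
import io

# A few rows copied from the leaderboard (tab-separated).
# An empty cell or "-" means the metric was not reported.
ROWS = (
    "Model\tExact Match\tBLEU-4\tCIDEr\n"
    "BridgeQA\t31.29\t24.06\t83.75\n"
    "LLaVA-3D\t30.6\t16.4\t103.1\n"
    "Video-3D LLM\t30.1\t\t102.1\n"
    "LEO\t24.5\t13.2\t101.4\n"
    "Chat-3D v2\t-\t14.0\t87.6\n"
)


def parse_score(cell):
    """Return the cell as a float, or None when it is empty or '-'."""
    cell = cell.strip()
    if cell in ("", "-"):
        return None
    return float(cell)


def rank_by(metric, text=ROWS):
    """Return model names sorted by `metric`, best first.

    Models that do not report the metric are skipped rather than
    treated as zero.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    scored = [(parse_score(row[metric]), row["Model"]) for row in reader]
    scored = [(score, model) for score, model in scored if score is not None]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [model for _, model in scored]


if __name__ == "__main__":
    for metric in ("Exact Match", "BLEU-4", "CIDEr"):
        print(metric, "->", rank_by(metric))
```

On these rows, BridgeQA ranks first by Exact Match and BLEU-4, while LLaVA-3D ranks first by CIDEr. Skipping unreported scores, rather than treating them as zero, keeps models such as Chat-3D v2 out of rankings for metrics they never reported.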