OpenCodePapers

3d-question-answering-3d-qa-on-scanqa-test-w

Visual Question Answering (VQA) > 3D Question Answering (3D-QA)
Leaderboard
| Paper | Code | Exact Match | BLEU-1 | BLEU-4 | ROUGE | METEOR | CIDEr | Model Name | Release Date |
|---|---|---|---|---|---|---|---|---|---|
| Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA | ✓ | 31.29 | 34.49 | 24.06 | 43.26 | 16.51 | 83.75 | BridgeQA | 2024-02-24 |
| LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness | | 30.6 | | 16.4 | 49.6 | 20.8 | 103.1 | LLaVA-3D | 2024-09-26 |
| Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding | ✓ | 30.1 | | | | | 102.1 | Video-3D LLM | 2024-11-30 |
| Scene-LLM: Extending Language Model for 3D Visual Understanding and Reasoning | | 27.2 | | 12.0 | 40.0 | 16.6 | 80 | Scene-LLM | 2024-03-18 |
| Towards Learning a Generalist Model for Embodied Navigation | ✓ | 26.27 | 39.73 | 13.90 | 40.23 | 16.56 | 80.77 | NaviLLM | 2023-12-04 |
| An Embodied Generalist Agent in 3D World | ✓ | 24.5 | | 13.2 | 49.2 | 20.0 | 101.4 | LEO | 2023-11-18 |
| ScanQA: 3D Question Answering for Spatial Scene Understanding | ✓ | 23.45 | 31.56 | 12.04 | 34.34 | 13.55 | 67.29 | ScanQA | 2021-12-20 |
| 3D-LLM: Injecting the 3D World into Large Language Models | ✓ | 23.2 | 32.6 | 8.4 | 34.8 | 13.5 | 65.6 | 3D-LLM (flamingo) | 2023-07-24 |
| 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment | ✓ | 22.4 | - | 10.4 | 35.7 | 13.9 | 69.6 | 3D-VisTA | 2023-08-08 |
| Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers | ✓ | 21.6 | | 14.3 | 41.6 | 18.0 | 87.7 | ChatScene | 2023-12-13 |
| ScanQA: 3D Question Answering for Spatial Scene Understanding | ✓ | 20.56 | 27.85 | 7.46 | 30.68 | 11.97 | 57.56 | ScanRefer+MCAN | 2021-12-20 |
| ScanQA: 3D Question Answering for Spatial Scene Understanding | ✓ | 19.71 | 29.46 | 6.08 | 30.97 | 12.07 | 58.23 | VoteNet+MCAN | 2021-12-20 |
| MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | ✓ | 19.2 | | 9.6 | 28.2 | 9.5 | 49.2 | VideoChat2 | 2023-11-28 |
| 3D-LLM: Injecting the 3D World into Large Language Models | ✓ | 19.1 | 38.3 | 11.6 | 35.3 | 14.9 | 69.6 | 3D-LLM (BLIP2-flant5) | 2023-07-24 |
| 3D-LLM: Injecting the 3D World into Large Language Models | ✓ | 19.1 | 37.3 | 10.7 | 34.5 | 14.3 | 67.1 | 3D-LLM (BLIP2-opt) | 2023-07-24 |
| LLaVA-OneVision: Easy Visual Task Transfer | ✓ | 18.7 | | 9.8 | 27.8 | 9.1 | 46.2 | LLaVA-NeXT-Video | 2024-08-06 |
| Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers | ✓ | - | - | 14.0 | - | - | 87.6 | Chat-3D v2 | 2023-12-13 |
| Visual Instruction Tuning | ✓ | - | - | 13.5 | 37.3 | 15.9 | 76.8 | LL3DA | 2023-04-17 |