OpenCodePapers

Video Question Answering on Perception Test

Video Question Answering
Leaderboard
| Paper | Code | Accuracy (Top-1) | Model Name | Release Date |
|---|---|---|---|---|
| Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution | ✓ | 71.4 | Oryx (34B) | 2024-09-19 |
| BIMBA: Selective-Scan Compression for Long-Range Video Question Answering | ✓ | 68.51 | BIMBA-LLaVA-Qwen2-7B | 2025-03-12 |
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ | 63.4 | InternVideo2 (8B) | 2024-03-22 |
| VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | ✓ | 57.5 | VideoLLaMA2 (72B) | 2024-06-11 |
| TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering | ✓ | 50.2 | TraveLER | 2024-04-01 |
| Perception Test: A Diagnostic Benchmark for Multimodal Video Models | ✓ | 0.46 | Flamingo | 2023-05-23 |