OpenCodePapers

Zero-Shot Video Question Answer on EgoSchema

Leaderboard
| Paper | Code | Accuracy | Inference Speed (s) | Model Name | Release Date |
|---|---|---|---|---|---|
| Tarsier: Recipes for Training and Evaluating Large Video Description Models | ✓ | 68.6 | - | Tarsier (34B) | 2024-06-30 |
| TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | ✓ | 68.4 | - | VideoChat-T (7B) | 2024-10-25 |
| Language Repository for Long Video Understanding | ✓ | 66.2 | - | LangRepo (12B) | 2024-03-21 |
| VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos | ✓ | 66.2 | - | VideoTree (GPT4) | 2024-05-29 |
| Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA | ✓ | 66.0 | - | LVNet | 2024-06-13 |
| MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | ✓ | 65.6 | - | VideoChat2_HD_mistral | 2023-11-28 |
| MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | ✓ | 63.6 | - | VideoChat2_mistral | 2023-11-28 |
| Understanding Long Videos with Multimodal Language Models | ✓ | 60.3 | 2.42 | MVU (13B) | 2024-03-25 |
| TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models | ✓ | 57.8 | - | TS-LLaVA-34B | 2024-11-17 |
| A Simple LLM Framework for Long-Range Video Question-Answering | ✓ | 57.6 | - | LLoVi (GPT-3.5) | 2023-12-28 |
| A Simple LLM Framework for Long-Range Video Question-Answering | ✓ | 50.8 | - | LLoVi (7B) | 2023-12-28 |
| SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | ✓ | 47.2 | - | SlowFast-LLaVA-34B | 2024-07-22 |
| Self-Chained Image-Language Model for Video Localization and Question Answering | ✓ | 25.7 | - | SeViLA (4B) | 2023-05-11 |
| - | - | 20.0 | - | Random | - |