OpenCodePapers

Zero-Shot Video Question Answer on IntentQA

Video Question Answering / Zero-Shot Video Question Answer
Leaderboard
| Paper | Code | Accuracy (%) | Model Name | Release Date |
|---|---|---|---|---|
| ENTER: Event Based Interpretable Reasoning for VideoQA | | 71.5 | ENTER | 2025-01-24 |
| Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA | ✓ | 71.1 | LVNet | 2024-06-13 |
| TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models | ✓ | 67.9 | TS-LLaVA-34B | 2024-11-17 |
| VidCtx: Context-aware Video Question Answering with Image Models | ✓ | 67.1 | VidCtx (7B) | 2024-12-23 |
| VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos | ✓ | 66.9 | VideoTree (GPT4) | 2024-05-29 |
| An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM | ✓ | 65.3 | IG-VLM | 2024-03-27 |
| A Simple LLM Framework for Long-Range Video Question-Answering | ✓ | 64.0 | LLoVi (GPT-4) | 2023-12-28 |
| Self-Chained Image-Language Model for Video Localization and Question Answering | ✓ | 60.9 | SeViLA (4B) | 2023-05-11 |
| SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | ✓ | 60.1 | SlowFast-LLaVA-34B | 2024-07-22 |
| Language Repository for Long Video Understanding | ✓ | 59.1 | LangRepo (12B) | 2024-03-21 |
| A Simple LLM Framework for Long-Range Video Question-Answering | ✓ | 53.6 | LLoVi (7B) | 2023-12-28 |
| Mistral 7B | ✓ | 50.4 | Mistral (7B) | 2023-10-10 |
| | | 20.0 | Random | |