OpenCodePapers

Zero-Shot Video Question Answer on TGIF-QA

Video Question Answering: Zero-Shot Video Question Answer
Leaderboard
| Paper | Code | Accuracy (%) | Confidence Score | Model Name | Release Date |
|---|---|---|---|---|---|
| Tarsier: Recipes for Training and Evaluating Large Video Description Models | ✓ | 82.5 | 4.4 | Tarsier (34B) | 2024-06-30 |
| LinVT: Empower Your Image-level Large Language Model to Understand Videos | ✓ | 81.3 | 4.3 | LinVT-Qwen2-VL (7B) | 2024-12-06 |
| TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models | ✓ | 81.0 | 4.2 | TS-LLaVA-34B | 2024-11-17 |
| PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | ✓ | 80.6 | 4.3 | PLLaVA | 2024-04-25 |
| SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | ✓ | 80.6 | 4.3 | SlowFast-LLaVA-34B | 2024-07-22 |
| An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM | ✓ | 79.1 | 4.2 | IG-VLM | 2024-03-27 |
| VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | ✓ | 74.6 | 4.1 | VideoGPT+ | 2024-06-13 |
| MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | ✓ | 72.22 | | MiniGPT4-video-7B | 2024-04-04 |
| Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | ✓ | 70.0 | 4.0 | Video-LLaVA-7B | 2023-11-16 |
| Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | ✓ | 69.0 | 3.8 | Chat-UniVi-7B | 2023-11-14 |
| Elysium: Exploring Object-level Perception in Videos via MLLM | ✓ | 66.6 | 3.6 | Elysium | 2024-03-25 |
| Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | ✓ | 51.4 | 3.0 | Video-ChatGPT-7B | 2023-06-08 |
| Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | ✓ | 41.9 | | FrozenBiLM | 2022-06-16 |
| VideoChat: Chat-Centric Video Understanding | ✓ | 34.4 | 2.3 | Video Chat-7B | 2023-05-10 |