OpenCodePapers

Zero-Shot Video Question Answer on TVQA

Video Question Answering / Zero-Shot Video Question Answer
Leaderboard
| Paper | Code | Accuracy | Model Name | Release Date |
|---|---|---|---|---|
| Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | ✓ | 59.7 | FrozenBiLM (with speech) | 2022-06-16 |
| An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM | ✓ | 57.8 | IG-VLM (no speech, GPT-4V) | 2024-03-27 |
| MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | ✓ | 54.21 | MiniGPT4-video-7B | 2024-04-04 |
| MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | ✓ | 50.6 | VideoChat_HD_mistral (no speech) | 2023-11-28 |
| MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | ✓ | 46.4 | VideoChat_mistral (no speech) | 2023-11-28 |
| MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | ✓ | 40.6 | VideoChat2 (no speech) | 2023-11-28 |
| Self-Chained Image-Language Model for Video Localization and Question Answering | ✓ | 38.2 | SEVILA (no speech) | 2023-05-11 |
| InternVideo: General Video Foundation Models via Generative and Discriminative Learning | ✓ | 35.9 | InternVideo (no speech) | 2022-12-06 |
| Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | ✓ | 29.7 | FrozenBiLM (no speech) | 2022-06-16 |