OpenCodePapers

zero-shot-video-question-answer-on-zero-shot

Video Question Answering > Zero-Shot Video Question Answer
Leaderboard
| Paper | Code | Accuracy (%) | Model Name | Release Date |
|---|---|---|---|---|
| Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | ✓ | 66.7 | Gemini 1.5 Pro | 2024-03-08 |
| Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension | ✓ | 65.4 | Video-RAG (based on LLaVA-Video) | 2024-11-20 |
| GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding | | 64.0 | GPT-4o | 2024-06-14 |
| Video Instruction Tuning With Synthetic Data | | 61.9 | LLaVA-Video | 2024-10-03 |