zero-shot-video-question-answer-on-zero-shot

Video Question Answering: Zero-Shot Video Question Answer

Leaderboard

Paper	Code	Accuracy (%)	Model Name	Release Date
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context	✓ Link	66.7	Gemini 1.5 Pro	2024-03-08
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension	✓ Link	65.4	Video-RAG (based on LLaVA-Video)	2024-11-20
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding		64.0	GPT-4o	2024-06-14
Video Instruction Tuning With Synthetic Data		61.9	LLaVA-Video	2024-10-03