OpenCodePapers

VCGBench-Diverse on VideoInstruct

Leaderboard

Paper	Code	Mean	Correctness of Information	Detail Orientation	Contextual Understanding	Temporal Understanding	Consistency	Dense Captioning	Spatial Understanding	Reasoning	Model Name	Release Date
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding	✓ Link	2.47	2.46	2.73	2.81	1.78	2.59	1.38	2.80	3.63	VideoGPT+	2024-06-13
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding	✓ Link	2.29	2.29	2.56	2.66	1.56	2.36	1.33	2.36	3.59	Chat-UniVi	2023-11-14
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	✓ Link	2.20	2.13	2.42	2.51	1.66	2.27	1.26	2.43	3.13	VideoChat2	2023-11-28
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning	✓ Link	2.19	2.20	2.62	2.59	1.29	2.27	1.03	2.35	3.62	BT-Adapter	2023-09-27
VTimeLLM: Empower LLM to Grasp Video Moments	✓ Link	2.17	2.16	2.41	2.48	1.46	2.35	1.13	2.29	3.45	VTimeLLM	2023-11-30
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models	✓ Link	2.08	2.07	2.42	2.46	1.39	2.06	0.89	2.25	3.60	Video-ChatGPT	2023-06-08
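
Note on the mean column: the page does not say how it is computed. Checking the rows above, it appears to be the average of the first five metrics only (Correctness of Information, Detail Orientation, Contextual Understanding, Temporal Understanding, Consistency), rounded to two decimals; Dense Captioning, Spatial Understanding, and Reasoning do not seem to be included. The sketch below is only a consistency check on the listed numbers under that assumption, not the benchmark's official scoring code. The names `LEADERBOARD`, `average`, and `matches_reported` are made up for illustration.

```python
# Scores copied from the leaderboard rows above.
# model -> (reported mean,
#           [Correctness of Information, Detail Orientation,
#            Contextual Understanding, Temporal Understanding, Consistency],
#           [Dense Captioning, Spatial Understanding, Reasoning])
LEADERBOARD = {
    "VideoGPT+":     (2.47, [2.46, 2.73, 2.81, 1.78, 2.59], [1.38, 2.80, 3.63]),
    "Chat-UniVi":    (2.29, [2.29, 2.56, 2.66, 1.56, 2.36], [1.33, 2.36, 3.59]),
    "VideoChat2":    (2.20, [2.13, 2.42, 2.51, 1.66, 2.27], [1.26, 2.43, 3.13]),
    "BT-Adapter":    (2.19, [2.20, 2.62, 2.59, 1.29, 2.27], [1.03, 2.35, 3.62]),
    "VTimeLLM":      (2.17, [2.16, 2.41, 2.48, 1.46, 2.35], [1.13, 2.29, 3.45]),
    "Video-ChatGPT": (2.08, [2.07, 2.42, 2.46, 1.39, 2.06], [0.89, 2.25, 3.60]),
}


def average(scores):
    """Plain arithmetic mean of a list of scores."""
    return sum(scores) / len(scores)


def matches_reported(reported, scores):
    """True if the mean of `scores`, rounded to 2 decimals, equals `reported`."""
    return abs(round(average(scores), 2) - reported) < 1e-9


if __name__ == "__main__":
    for model, (reported, core, extra) in LEADERBOARD.items():
        # Assumption under test: the reported mean covers only the five core metrics.
        five = round(average(core), 2)
        eight = round(average(core + extra), 2)
        print(f"{model}: reported={reported:.2f} "
              f"five-metric={five:.2f} eight-metric={eight:.2f}")
```

For every row listed here, the five-metric average reproduces the reported mean, while the average over all eight metrics does not.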