OpenCodePapers

vcgbench-diverse-on-videoinstruct

VCGBench-Diverse
Results over time
Leaderboard
| Paper | Code | Mean | Correctness of Information | Detail Orientation | Contextual Understanding | Temporal Understanding | Consistency | Dense Captioning | Spatial Understanding | Reasoning | Model | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | ✓ | 2.47 | 2.46 | 2.73 | 2.81 | 1.78 | 2.59 | 1.38 | 2.80 | 3.63 | VideoGPT+ | 2024-06-13 |
| Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | ✓ | 2.29 | 2.29 | 2.56 | 2.66 | 1.56 | 2.36 | 1.33 | 2.36 | 3.59 | Chat-UniVi | 2023-11-14 |
| MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | ✓ | 2.20 | 2.13 | 2.42 | 2.51 | 1.66 | 2.27 | 1.26 | 2.43 | 3.13 | VideoChat2 | 2023-11-28 |
| BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning | ✓ | 2.19 | 2.20 | 2.62 | 2.59 | 1.29 | 2.27 | 1.03 | 2.35 | 3.62 | BT-Adapter | 2023-09-27 |
| VTimeLLM: Empower LLM to Grasp Video Moments | ✓ | 2.17 | 2.16 | 2.41 | 2.48 | 1.46 | 2.35 | 1.13 | 2.29 | 3.45 | VTimeLLM | 2023-11-30 |
| Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | ✓ | 2.08 | 2.07 | 2.42 | 2.46 | 1.39 | 2.06 | 0.89 | 2.25 | 3.60 | Video-ChatGPT | 2023-06-08 |
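
The page does not define how the mean column is computed. The sketch below tests one assumption against the listed scores: that the mean is the average of the first five metrics (Correctness of Information, Detail Orientation, Contextual Understanding, Temporal Understanding, Consistency), leaving out Dense Captioning, Spatial Understanding, and Reasoning. This is an inference from the numbers, not a stated definition, and the names `rows` and `mean_matches` are illustrative only.

```python
# Scores copied from the leaderboard rows above, in the order:
# (reported mean, Correctness of Information, Detail Orientation,
#  Contextual Understanding, Temporal Understanding, Consistency)
rows = {
    "VideoGPT+":     (2.47, 2.46, 2.73, 2.81, 1.78, 2.59),
    "Chat-UniVi":    (2.29, 2.29, 2.56, 2.66, 1.56, 2.36),
    "VideoChat2":    (2.20, 2.13, 2.42, 2.51, 1.66, 2.27),
    "BT-Adapter":    (2.19, 2.20, 2.62, 2.59, 1.29, 2.27),
    "VTimeLLM":      (2.17, 2.16, 2.41, 2.48, 1.46, 2.35),
    "Video-ChatGPT": (2.08, 2.07, 2.42, 2.46, 1.39, 2.06),
}


def mean_matches(reported, scores, tol=0.005 + 1e-9):
    """Return True if `reported` equals the average of `scores`
    up to two-decimal rounding (assumed rounding, not documented)."""
    average = sum(scores) / len(scores)
    return abs(average - reported) <= tol


# Check the assumption for every model on the leaderboard.
results = {name: mean_matches(vals[0], vals[1:]) for name, vals in rows.items()}

for name, ok in results.items():
    print(f"{name}: {'consistent' if ok else 'mismatch'}")
```

If every row reports as consistent, the listed means agree with a five-metric average to two decimals. That supports the assumption but does not confirm how the benchmark authors define the score.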