Video-based Generative Performance Benchmarking
Video-based Generative Performance Benchmarking (Correctness of Information)
Leaderboard
| Paper | Code | GPT score | Model | Release date |
|---|---|---|---|---|
| PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | ✓ | 3.85 | PPLLaVA-7B | 2024-11-04 |
| PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | ✓ | 3.60 | PLLaVA-34B | 2024-04-25 |
| TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models | ✓ | 3.55 | TS-LLaVA-34B | 2024-11-17 |
| SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | ✓ | 3.48 | SlowFast-LLaVA-34B | 2024-07-22 |
| MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | ✓ | 3.40 | VideoChat2_HD_mistral | 2023-11-28 |
| VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | ✓ | 3.27 | VideoGPT+ | 2024-06-13 |
| ST-LLM: Large Language Models Are Effective Temporal Learners | ✓ | 3.23 | ST-LLM | 2024-03-30 |
| MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | ✓ | 3.08 | MiniGPT4-video-7B | 2024-04-04 |
| MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | ✓ | 3.02 | VideoChat2 | 2023-11-28 |
| Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | ✓ | 2.89 | Chat-UniVi | 2023-11-14 |
| VTimeLLM: Empower LLM to Grasp Video Moments | ✓ | 2.78 | VTimeLLM | 2023-11-30 |
| MovieChat: From Dense Token to Sparse Memory for Long Video Understanding | ✓ | 2.76 | MovieChat | 2023-07-31 |
| BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning | ✓ | 2.68 | BT-Adapter | 2023-09-27 |
| Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | ✓ | 2.40 | Video-ChatGPT | 2023-06-08 |
| VideoChat: Chat-Centric Video Understanding | ✓ | 2.32 | Video Chat | 2023-05-10 |
| BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning | ✓ | 2.16 | BT-Adapter (zero-shot) | 2023-09-27 |
| LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model | ✓ | 2.03 | LLaMA Adapter | 2023-04-28 |
| Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding | ✓ | 1.96 | Video LLaMA | 2023-06-05 |