Paper | Code | Accuracy (%) | Model Name | Release Date |
---|---|---|---|---|
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | ✓ Link | 59.0 | VideoChat2 | 2023-11-28 |
VidCtx: Context-aware Video Question Answering with Image Models | ✓ Link | 51.1 | VidCtx (7B) | 2024-12-23 |
Flamingo: a Visual Language Model for Few-Shot Learning | ✓ Link | 41.8 | Flamingo-9B | 2022-04-29 |
InternVideo: General Video Foundation Models via Generative and Discriminative Learning | ✓ Link | 41.6 | InternVideo | 2022-12-06 |