Paper | Code | Accuracy | CW | CH | TP&TN | ModelName | ReleaseDate |
---|---|---|---|---|---|---|---|
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | ✓ Link | 83.4 | 84.0 | 90.0 | 77.3 | VideoChat2_HD_mistral | 2023-11-28 |
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | ✓ Link | 81.9 | 82.6 | 86.9 | 77.0 | VideoChat2_mistral | 2023-11-28 |
IntentQA: Context-aware Video Intent Reasoning | ✓ Link | 78.5 | 77.8 | 80.2 | 79.1 | Human | 2023-01-01 |
IntentQA: Context-aware Video Intent Reasoning | ✓ Link | 57.6 | 58.4 | 65.5 | 50.5 | IntentQA | 2023-01-01 |
Video Graph Transformer for Video Question Answering | ✓ Link | 51.3 | 51.4 | 56.0 | 47.6 | VGT | 2022-07-12 |
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering | ✓ Link | 47.7 | 48.2 | 54.3 | 41.7 | HQGA | 2021-12-12 |