Paper | Code | Accuracy | Confidence Score | ModelName | ReleaseDate |
---|---|---|---|---|---|
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams | ✓ Link | 61.6 | 3.4 | Flash-VStream | 2024-06-12 |
Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens | | 60.7 | 3.4 | Vista-LLaMA | 2023-12-12 |
VideoChat: Chat-Centric Video Understanding | ✓ Link | 56.6 | 3.2 | VideoChat | 2023-05-10 |
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering | ✓ Link | 54.8 | 3.0 | MovieChat+ | 2024-04-26 |
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | ✓ Link | 54.6 | 3.2 | Video-ChatGPT | 2023-06-08 |
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding | ✓ Link | 49.9 | 2.7 | MovieChat | 2023-07-31 |