Paper | Code | 2-Class Accuracy | Model Name | Release Date |
---|---|---|---|---|
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding | ✓ Link | 88.33 | Video-LLaMA | 2023-06-05 |
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | ✓ Link | 76.67 | TimeChat | 2023-12-04 |
Test of Time: Instilling Video-Language Models with a Sense of Time | ✓ Link | 64.4 | TACT | 2023-01-05 |
Videoprompter: an ensemble of foundational models for zero-shot video understanding | | 60.0 | VideoPrompter | 2023-10-23 |