Paper | Code | Accuracy (%) | Model Name | Release Date |
---|---|---|---|---|
Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval | ✓ | 40.2 | Text + Text (no Multimodal Pretext Training) | 2022-06-05 |
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | ✓ | 39.6 | FrozenBiLM | 2022-06-16 |
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners |  | 39.0 | VideoCoCa | 2022-12-09 |
Video Question Answering with Iterative Video-Text Co-Tokenization |  | 38.2 | Co-Tokenization | 2022-08-01 |
Just Ask: Learning to Answer Questions from Millions of Narrated Videos | ✓ | 35.4 | Just Ask (fine-tune) | 2020-12-01 |
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | ✓ | 26.8 | FrozenBiLM (0-shot) | 2022-06-16 |
Just Ask: Learning to Answer Questions from Millions of Narrated Videos | ✓ | 12.2 | Just Ask (0-shot) | 2020-12-01 |