Paper | Code | Accuracy (%) | Model Name | Release Date |
---|---|---|---|---|
Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval | ✓ Link | 93.2 | Text + Text (no Multimodal Pretext Training) | 2022-06-05 |
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | ✓ Link | 86.7 | FrozenBiLM | 2022-06-16 |
Just Ask: Learning to Answer Questions from Millions of Narrated Videos | ✓ Link | 84.4 | Just Ask | 2020-12-01 |
(not listed) | (not listed) | 83.7 | SeViLA | (not listed) |
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training | ✓ Link | 77.75 | Hero w/ pre-training | 2020-05-01 |
Revisiting the "Video" in Video-Language Understanding | ✓ Link | 65.1 | ATP | 2022-06-03 |
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | ✓ Link | 58.4 | FrozenBiLM (0-shot) | 2022-06-16 |
Just Ask: Learning to Answer Questions from Millions of Narrated Videos | ✓ Link | 51.1 | Just Ask (0-shot) | 2020-12-01 |