Paper | Code | Frame accuracy (%) | Model name | Release date |
---|---|---|---|---|
UnLoc: A Unified Framework for Video Localization Tasks | ✓ Link | 72.8 | UnLoc-L | 2023-08-21 |
UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation | ✓ Link | 70.0 | UniVL | 2020-02-15 |
Multi-granularity Correspondence Learning from Long-term Noisy Videos | ✓ Link | 69.8 | Norton | 2024-01-30 |
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding | ✓ Link | 68.7 | VideoCLIP | 2021-09-28 |
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding | ✓ Link | 68.4 | VLM | 2021-05-20 |
TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment |  | 68.4 | TACo | 2021-08-23 |
End-to-End Learning of Visual Representations from Uncurated Instructional Videos | ✓ Link | 61.0 | MIL-NCE | 2019-12-13 |
ActBERT: Learning Global-Local Video-Text Representations | ✓ Link | 57.0 | ActBERT | 2020-11-14 |
End-to-End Learning of Visual Representations from Uncurated Instructional Videos | ✓ Link | 53.9 | CBT | 2019-12-13 |
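
The scores above are frame accuracies. The sketch below illustrates the metric under the usual per-frame definition: the percentage of frames whose predicted label matches the ground-truth label. It assumes labels are integers pooled over all test frames. How a given benchmark treats background frames, or whether it averages per video instead, is not stated here and may differ. The function name `frame_accuracy` is hypothetical.

```python
from typing import Sequence


def frame_accuracy(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Return the percentage of frames whose predicted label equals the ground truth.

    Assumption: `pred` and `gt` are per-frame integer labels pooled over the
    whole test set (not averaged per video).
    """
    if len(pred) != len(gt):
        raise ValueError("prediction and ground truth must cover the same frames")
    if len(gt) == 0:
        raise ValueError("cannot compute accuracy over zero frames")
    # Count frames where the predicted label matches the annotated one.
    correct = sum(p == g for p, g in zip(pred, gt))
    return 100.0 * correct / len(gt)


if __name__ == "__main__":
    # 4 of 5 frames match, so this prints 80.0
    print(frame_accuracy([0, 0, 1, 1, 2], [0, 1, 1, 1, 2]))
```

Because the metric is computed per frame, long segments dominate the score. A model can reach a high frame accuracy while still mislabeling short steps.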