Paper | Code | Cap. Avg. R@1 | Cap. Avg. R@5 | Cap. Avg. R@10 | DTW R@1 | DTW R@5 | DTW R@10 | OTAM R@1 | OTAM R@5 | OTAM R@10 | ModelName | ReleaseDate |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Multi-granularity Correspondence Learning from Long-term Noisy Videos | ✓ Link | 75.5 | 95.0 | 97.7 | 88.7 | 98.8 | 99.5 | 88.9 | 98.4 | 99.5 | Norton | 2024-01-30 |
TempCLR: Temporal Alignment Representation with Contrastive Learning | ✓ Link | 74.5 | 94.6 | 97.0 | 83.5 | 97.2 | 99.3 | 84.9 | 97.9 | 99.5 | TempCLR | 2022-12-28 |
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding | ✓ Link | 74.5 | 94.5 | 97.9 | 56.0 | 89.9 | 96.3 | 52.8 | 89.2 | 95.0 | VideoCLIP | 2021-09-28 |
Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos | ✓ Link | 53.4 | 75.0 | 81.4 | | | | | | | MCN | 2021-04-26 |
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips | ✓ Link | 46.6 | 74.3 | 83.7 | | | | | | | Text-Video Embedding | 2019-06-07 |
End-to-End Learning of Visual Representations from Uncurated Instructional Videos | ✓ Link | 43.1 | 68.6 | 79.1 | | | | | | | MIL-NCE | 2019-12-13 |