Paper | Code | R@1, IoU=0.5 | R@1, IoU=0.7 | R@5, IoU=0.5 | R@5, IoU=0.7 | Model Name | Release Date |
---|---|---|---|---|---|---|---|
Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos | ✓ Link | 60.67 | 38.55 | | | GVL (paragraph-level) | 2023-03-11 |
LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval | ✓ Link | 55.16 | 35.68 | | | LLaVA-MR | 2024-11-21 |
Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos | ✓ Link | 49.18 | 29.69 | | | GVL | 2023-03-11 |
UnLoc: A Unified Framework for Video Localization Tasks | ✓ Link | 48.3 | 30.2 | 79.2 | 61.3 | UnLoc-L | 2023-08-21 |
UnLoc: A Unified Framework for Video Localization Tasks | ✓ Link | 48.0 | 29.7 | 81.5 | 61.4 | UnLoc-B | 2023-08-21 |
VLG-Net: Video-Language Graph Matching Network for Video Grounding | ✓ Link | 46.32 | 29.82 | 77.15 | 63.33 | VLG-Net | 2020-11-19 |
Dense Regression Network for Video Grounding | ✓ Link | 45.45 | 24.36 | 77.97 | 50.30 | DRN | 2020-04-07 |
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection | ✓ Link | | | 80.54 | 57.04 | UniMD+Sync. | 2024-04-07 |