Enhancing Temporal Action Localization: Advanced S6 Modeling with Recurrent Mechanism | ✓ Link | 42.9 | 64.1 | 44.0 | 10.6 | RDFA-S6 (InternVideo2-6B) | 2024-07-18 |
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding | ✓ Link | 42.02 | 62.43 | 43.49 | 10.23 | ActionMamba (InternVideo2-6B) | 2024-03-14 |
Proposal Relation Network for Temporal Action Detection | ✓ Link | 42.0 | 59.7 | | | PRN+BMN (ensemble) | 2021-06-20 |
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames | ✓ Link | 41.93 | 61.72 | 43.35 | 10.85 | AdaTAD (VideoMAEv2-giant) | 2023-11-28 |
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ Link | 41.2 | | | | InternVideo2-6B | 2024-03-22 |
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ Link | 40.4 | | | | InternVideo2-1B | 2024-03-22 |
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection | ✓ Link | 39.83 | 60.29 | | | UniMD+Sync. | 2024-04-07 |
Proposal Relation Network for Temporal Action Detection | ✓ Link | 39.4 | 57.9 | | | PRN (CSN) | 2021-06-20 |
InternVideo: General Video Foundation Models via Generative and Discriminative Learning | ✓ Link | 39.00 | | | | InternVideo | 2022-12-06 |
Temporal Context Aggregation Network for Temporal Action Proposal Refinement | ✓ Link | 37.56 | 54.33 | 39.13 | 8.41 | TCANet (SlowFast R101) | 2021-03-24 |
Proposal Relation Network for Temporal Action Detection | ✓ Link | 37.5 | 55.5 | | | PRN (ViViT) | 2021-06-20 |
Hear Me Out: Fusional Approaches for Audio Augmented Temporal Action Localization | ✓ Link | 36.82 | 54.34 | 37.66 | 8.93 | AVFusion | 2021-06-27 |
TriDet: Temporal Action Detection with Relative Boundary Modeling | ✓ Link | 36.8 | 54.7 | 38.0 | 8.4 | TriDet (TSP features) | 2023-03-13 |
End-to-end Temporal Action Detection with Transformer | ✓ Link | 36.75 | 53.62 | 37.52 | 10.56 | TadTR (TSP features) | 2021-06-18 |
ActionFormer: Localizing Moments of Actions with Transformers | ✓ Link | 36.6 | 54.7 | 37.8 | 8.4 | ActionFormer (TSP features) | 2022-02-16 |
Proposal-Free Temporal Action Detection via Global Segmentation Mask Learning | ✓ Link | 36.5 | | | | TAGS (I3D) | 2022-07-14 |
Video Self-Stitching Graph Network for Temporal Action Localization | ✓ Link | 35.94 | 53.26 | 36.76 | 8.12 | VSGN (TSP features) | 2020-11-30 |
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks | ✓ Link | 35.81 | 51.26 | 37.12 | 9.29 | TSP | 2020-11-23 |
Improve Temporal Action Proposals using Hierarchical Context | | 35.61 | 52.51 | 36.10 | 7.12 | HCN (I3D features) | 2023-04-03 |
DCAN: Improving Temporal Action Detection via Dual Context Aggregation | ✓ Link | 35.39 | 51.78 | 35.98 | 9.45 | DCAN (TSN features) | 2021-12-07 |
An Empirical Study of End-to-End Temporal Action Detection | ✓ Link | 35.10 | 50.47 | 35.99 | 10.83 | E2E-TAD (SlowFast R50+TadTR) | 2022-04-06 |
Low-Fidelity Video Encoder Optimization for Temporal Action Localization | | 34.96 | 50.91 | 35.86 | 8.79 | LoFi+G-TAD | 2021-12-01 |
BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation | ✓ Link | 34.88 | 51.27 | 35.70 | 8.33 | BSN++ | 2020-09-15 |
Boundary-sensitive Pre-training for Temporal Localization in Videos | ✓ Link | 34.75 | 50.94 | 35.61 | 7.98 | G-TAD+BSP | 2020-11-21 |
Self-Supervised Learning for Semi-Supervised Temporal Action Proposal | ✓ Link | 34.48 | 50.72 | 35.28 | 7.87 | SSTAP@100%+ | 2021-04-07 |
Boundary Content Graph Neural Network for Temporal Action Proposal Generation | | 34.26 | 50.56 | 34.75 | 9.37 | BC-GNN | 2020-08-04 |
Graph Convolutional Module for Temporal Action Localization in Videos | | 34.24 | 51.03 | 35.17 | 7.44 | GCM | 2021-12-01 |
G-TAD: Sub-Graph Localization for Temporal Action Detection | ✓ Link | 34.09 | 50.36 | 34.60 | 9.02 | G-TAD | 2019-11-26 |
BMN: Boundary-Matching Network for Temporal Action Proposal Generation | ✓ Link | 33.85 | 50.07 | 34.78 | 8.29 | BMN | 2019-07-23 |
A Pursuit of Temporal Accuracy in General Activity Detection | ✓ Link | 32.26 | 39.12 | | | SSN | 2017-03-08 |
Graph Convolutional Networks for Temporal Action Localization | ✓ Link | 31.11 | 48.26 | 33.16 | 3.27 | P-GCN | 2019-09-07 |
BSN: Boundary Sensitive Network for Temporal Action Proposal Generation | ✓ Link | 30.03 | 46.45 | 29.96 | 8.02 | BSN | 2018-06-08 |
UnLoc: A Unified Framework for Video Localization Tasks | ✓ Link | | 59.3 | | | UnLoc-L | 2023-08-21 |