End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames | ✓ Link | 76.9 | | | 89.7 | 86.7 | 80.9 | 71.0 | 56.1 | AdaTAD (VideoMAEv2-giant) | 2023-11-28 |
Enhancing Temporal Action Localization: Advanced S6 Modeling with Recurrent Mechanism | ✓ Link | 74.2 | | | 88.7 | 84.6 | 78.2 | 66.6 | 51.9 | RDFA-S6 (InternVideo2-6B) | 2024-07-18 |
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding | ✓ Link | 72.72 | | | 86.89 | 83.09 | 76.90 | 65.91 | 50.82 | ActionMamba (InternVideo2-6B) | 2024-03-14 |
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ Link | 72.0 | | | | | | | | InternVideo2-6B | 2024-03-22 |
InternVideo: General Video Foundation Models via Generative and Discriminative Learning | ✓ Link | 71.58 | | | | | | | | ActionFormer (InternVideo features) | 2022-12-06 |
Temporal Action Localization with Enhanced Instant Discriminability | ✓ Link | 70.1 | | | 84.8 | 80.0 | 73.3 | 63.8 | 48.8 | TriDet (VideoMAE V2-g features) | 2023-09-11 |
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ Link | 69.8 | | | | | | | | InternVideo2-1B | 2024-03-22 |
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | ✓ Link | 69.6 | | | 84.0 | 79.6 | 73.0 | 63.5 | 47.7 | ActionFormer (VideoMAE V2-g features) | 2023-03-29 |
TriDet: Temporal Action Detection with Relative Boundary Modeling | ✓ Link | 69.3 | | | 83.6 | 80.1 | 72.9 | 62.4 | 47.4 | TriDet (I3D features) | 2023-03-13 |
Action Sensitivity Learning for Temporal Action Localization | | 67.9 | | | 83.1 | 79.0 | 71.7 | 59.7 | 45.8 | ASL (I3D features) | 2023-05-25 |
TemporalMaxer: Maximize Temporal Context with only Max Pooling for Temporal Action Localization | ✓ Link | 67.7 | | | 82.8 | 78.9 | 71.8 | 60.5 | 44.7 | TemporalMaxer (I3D features) | 2023-03-16 |
Dual DETRs for Multi-Label Temporal Action Detection | | 66.8 | | | 82.9 | 78.0 | 70.4 | 58.5 | 44.4 | DualDETR (I3D features) | 2024-03-31 |
ActionFormer: Localizing Moments of Actions with Transformers | ✓ Link | 66.8 | | | 82.1 | 77.8 | 71.0 | 59.4 | 43.9 | ActionFormer (I3D features) | 2022-02-16 |
TadML: A fast temporal action detection with Mechanics-MLP | ✓ Link | 59.70 | | | 73.29 | 69.73 | 62.53 | 53.36 | 39.60 | TadML (two-stream) | 2022-06-07 |
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection | ✓ Link | 59.6 | | | 75.5 | 70.8 | 63.5 | 50.9 | 37.4 | BasicTAD (160,6,192,R50-SlowOnly) | 2022-05-05 |
End-to-end Temporal Action Detection with Transformer | ✓ Link | 56.7 | | | 74.8 | 69.1 | 60.1 | 46.6 | 32.8 | TadTR | 2021-06-18 |
ReAct: Temporal Action Detection with Relational Queries | ✓ Link | 55.0 | | | 69.2 | 65.0 | 57.1 | 47.8 | 35.6 | ReAct (TSN features) | 2022-07-14 |
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection | ✓ Link | 54.9 | | | 68.4 | 65.0 | 58.6 | 49.2 | 33.5 | BasicTAD (112,3,96,R50-SlowOnly) | 2022-05-05 |
An Empirical Study of End-to-End Temporal Action Detection | ✓ Link | 54.2 | | | 69.4 | 64.3 | 56.0 | 46.4 | 34.9 | E2E-TAD (SlowFast R50+TadTR) | 2022-04-06 |
TadML: A fast temporal action detection with Mechanics-MLP | ✓ Link | 53.46 | | | 68.78 | 64.66 | 56.61 | 45.40 | 31.88 | TadML (RGB-only) | 2022-06-07 |
Multi-shot Temporal Event Localization: a Benchmark | ✓ Link | 53.4 | | | 68.9 | 64.0 | 56.9 | 46.3 | 31.0 | MUSES | 2020-12-17 |
Hear Me Out: Fusional Approaches for Audio Augmented Temporal Action Localization | ✓ Link | 53.3 | | | 70.1 | 64.9 | 57.1 | 45.4 | 28.8 | AVFusion | 2021-06-27 |
Proposal-Free Temporal Action Detection via Global Segmentation Mask Learning | ✓ Link | 52.8 | | | 68.6 | 63.8 | 57.0 | 46.3 | 31.8 | TAGS (I3D) | 2022-07-14 |
DCAN: Improving Temporal Action Detection via Dual Context Aggregation | ✓ Link | 52.3 | | | 68.2 | 62.7 | 54.1 | 43.9 | 32.6 | DCAN (TSN features) | 2021-12-07 |
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks | ✓ Link | 50.46 | 74.02 | 72.29 | 69.1 | 63.3 | 53.5 | 40.4 | 26 | TSP | 2020-11-23 |
Video Self-Stitching Graph Network for Temporal Action Localization | ✓ Link | 50.2 | | | 66.7 | 60.4 | 52.4 | 41.0 | 30.4 | VSGN | 2020-11-30 |
RGB Stream Is Enough for Temporal Action Detection | ✓ Link | 50.0 | | | 62.8 | 59.5 | 53.8 | 43.6 | 30.1 | DaoTAD | 2021-07-09 |
Decoupling Localization and Classification in Single Shot Temporal Action Detection | ✓ Link | 42.0 | | | 60.2 | 54.1 | 44.2 | 32.3 | 19.1 | Decouple-SSAD | 2019-04-16 |
Rethinking the Faster R-CNN Architecture for Temporal Action Localization | | 39.8 | 59.8 | 57.1 | 53.2 | 48.5 | 42.8 | 33.8 | 20.8 | TAL-Net | 2018-04-20 |
Graph Convolutional Module for Temporal Action Localization in Videos | | | 72.5 | 70.9 | 66.5 | 60.8 | 51.9 | | | GCM | 2021-12-01 |
Activity Graph Transformer for Temporal Action Localization | | | 72.1 | 69.8 | 65 | 58.1 | 50.2 | | | AGT | 2021-01-21 |
Graph Convolutional Networks for Temporal Action Localization | ✓ Link | | 69.5 | 67.8 | 63.6 | 57.8 | 49.1 | | | P-GCN | 2019-09-07 |
Weakly Supervised Temporal Action Localization Using Deep Metric Learning | ✓ Link | | 62.3 | | 46.8 | | 29.6 | | 9.7 | DeepMetricLearner | 2020-01-21 |
Cascaded Boundary Regression for Temporal Action Detection | | | 60.1 | 56.7 | 50.1 | 41.3 | 31 | 19.1 | 9.9 | CBR-TS | 2017-05-02 |
R-C3D: Region Convolutional 3D Network for Temporal Activity Detection | ✓ Link | | 54.5 | 51.5 | 44.8 | 35.6 | 28.9 | | | R-C3D | 2017-03-22 |
TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals | ✓ Link | | 54 | 50.9 | 44.1 | 34.9 | 25.6 | | | TURN-FL-16 + S-CNN | 2017-03-17 |
End-to-end Learning of Action Detection from Frame Glimpses in Videos | ✓ Link | | 48.9 | 44.0 | 36.0 | 26.4 | 17.1 | | | Yeung et al. | 2015-11-22 |
Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs | ✓ Link | | 47.7 | 43.5 | 36.3 | 28.7 | 19 | | | S-CNN | 2016-01-09 |
BSN: Boundary Sensitive Network for Temporal Action Proposal Generation | ✓ Link | | | | 53.5 | 45 | 36.9 | 28.4 | 20 | BSN UNet | 2018-06-08 |
CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos | ✓ Link | | | | 40.1 | 29.4 | 23.3 | 13.1 | 7.9 | CDC | 2017-03-04 |
G-TAD: Sub-Graph Localization for Temporal Action Detection | ✓ Link | | | | | | 40.2 | | | G-TAD | 2019-11-26 |
BMN: Boundary-Matching Network for Temporal Action Proposal Generation | ✓ Link | | | | | | 32.2 | | | BMN | 2019-07-23 |