Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | | 76.2 | 67.5 | 79.0 | 82.1 | 78.3 | 78.0 | | | AdaFocus (newly extracted I3D features, LTContext model) | 2023-11-28 |
FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Action Segmentation | ✓ Link | 74.7 | 66.2 | 76.5 | 81.4 | 79.7 | 76.2 | | | FACT (efficient hybrid of convolution and transformer model) | 2024-01-01 |
ASQuery: A Query-based Model for Action Segmentation | ✓ Link | 74.6 | 66.5 | 76.5 | 80.7 | 78.4 | 77.9 | | | ASQuery | 2024-09-30 |
BIT: Bi-Level Temporal Modeling for Efficient Supervised Action Segmentation | | 73.7 | 64.7 | 75.8 | 80.6 | 79.0 | 75.5 | | | BIT | 2023-08-28 |
Diffusion Action Segmentation | ✓ Link | 73.6 | 64.6 | 75.9 | 80.3 | 78.4 | 76.4 | | | DiffAct | 2023-03-31 |
Efficient Temporal Action Segmentation via Boundary-aware Query Voting | ✓ Link | 72.4 | 63.2 | 74.9 | 79.2 | 77.3 | 76.6 | | | BaFormer | 2024-05-25 |
Cross-Enhancement Transformer for Action Segmentation | ✓ Link | 71.8 | 61.9 | 74.3 | 79.3 | 77.8 | 74.9 | | | CETNet | 2022-05-19 |
SF-TMN: SlowFast Temporal Modeling Network for Surgical Phase Recognition | | 71.6 | 62.2 | 74.0 | 78.7 | 77.0 | 77.0 | | | SF-TMN (ASFormer) | 2023-06-15 |
ASPnet: Action Segmentation With Shared-Private Representation of Multiple Data Sources | | 70.6 | 60.8 | 72.9 | 78.1 | 76.3 | 75.9 | | | ASPnet | 2023-01-01 |
How Much Temporal Long-Term Context is Needed for Action Segmentation? | ✓ Link | 70.1 | 60.1 | 72.6 | 77.6 | 77.0 | 74.2 | | | LTContext | 2023-08-22 |
Do we really need temporal convolutions in action segmentation? | ✓ Link | 69.3 | 59.8 | 71.8 | 76.2 | 74.6 | 75.0 | | | EUT | 2022-05-26 |
Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation | ✓ Link | 68.8 | 58.0 | 71.5 | 76.9 | 77.1 | 69.7 | | | UVAST | 2022-09-01 |
ASFormer: Transformer for Action Segmentation | ✓ Link | 68.0 | 57.4 | 70.6 | 76.0 | 75.0 | 73.5 | | | ASFormer | 2021-10-16 |
Maximization and restoration: Action segmentation through dilation passing and temporal reconstruction | | 67.9 | 57.6 | 70.5 | 75.6 | 75.1 | 71.7 | | | DPRN | 2022-05-02 |
Refining Action Segmentation With Hierarchical Video Representations | ✓ Link | 67.1 | 57.0 | 69.5 | 74.7 | 71.9 | 69.4 | | | ASRF + HASR | 2021-01-01 |
Global2Local: Efficient Structure Search for Video Action Segmentation | ✓ Link | 66.9 | 54.6 | 69.9 | 76.3 | 74.5 | 70.8 | | | G2L (SSTDA) | 2021-01-04 |
FIFA: Fast Inference Approximation for Action Segmentation | | 66.8 | 54.8 | 70.2 | 75.5 | 78.5 | 68.6 | | | FIFA + MS-TCN | 2021-08-09 |
Action Segmentation with Mixed Temporal Domain Adaptation | | 66.4 | 56.5 | 68.6 | 74.2 | 73.6 | 71.0 | | | DA | 2021-04-15 |
Efficient Two-Step Networks for Temporal Action Segmentation | ✓ Link | 66.4 | 56.2 | 69.0 | 74.0 | 70.3 | 67.8 | | | ETSN | 2021-04-30 |
Alleviating Over-segmentation Errors by Detecting Action Boundaries | ✓ Link | 66.4 | 56.1 | 68.9 | 74.3 | 72.4 | 67.6 | | | ASRF | 2020-07-14 |
Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation | ✓ Link | 66.4 | 55.2 | 69.1 | 75.0 | 73.7 | 70.2 | | | SSTDA | 2020-03-05 |
Coarse to Fine Multi-Resolution Temporal Convolutional Network | ✓ Link | 66.2 | 57.6 | 68.7 | 72.2 | 69.6 | 76.0 | | | C2F-TCN | 2021-05-23 |
Boundary-Aware Cascade Networks for Temporal Action Segmentation | ✓ Link | 63.1 | 55.0 | 65.5 | 68.7 | 66.2 | 70.4 | | | BCN | |
Fast Weakly Supervised Action Segmentation Using Mutual Consistency | ✓ Link | 62.6 | 48.4 | 66.1 | 73.2 | 76.3 | 62.8 | | | MuCon | 2019-04-05 |
Depthwise Separable Temporal Convolutional Network for Action Segmentation | | 59.6 | 49.18 | 62.05 | 67.70 | 69.02 | 70.75 | | | DS-TCN | 2021-01-19 |
Temporal Relational Modeling with Self-Supervision for Action Segmentation | ✓ Link | 59.1 | 46.6 | 61.9 | 68.7 | 68.9 | 68.3 | | | DTGRM | 2020-12-14 |
MS-TCN++: Multi-Stage Temporal Convolutional Network for Action Segmentation | ✓ Link | 56.2 | 45.9 | 58.6 | 64.1 | 65.6 | 67.6 | | | MS-TCN++ (I3D) | 2020-06-16 |
MS-TCN++: Multi-Stage Temporal Convolutional Network for Action Segmentation | ✓ Link | 55.2 | 44.5 | 57.7 | 63.3 | 64.9 | 67.3 | | | MS-TCN++ (I3D) (sh) | 2020-06-16 |
Improving Action Segmentation via Graph-Based Temporal Reasoning | | 51.6 | 43.3 | 54.0 | 57.5 | 58.7 | 65.0 | | | GTRM | 2020-06-01 |
MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation | ✓ Link | 50.6 | 40.8 | 52.9 | 58.2 | 61.4 | 65.1 | | | MS-TCN (IDT) | 2019-03-05 |
MS-TCN: Multi-Stage Temporal Convolutional Network for Action Segmentation | ✓ Link | 46.2 | 37.9 | 48.1 | 52.6 | 61.7 | 66.3 | | | MS-TCN (I3D) | 2019-03-05 |
Unsupervised Discriminative Embedding for Sub-Action Learning in Complex Activities | | | 31.9 | | | | 47.4 | | | UDE | 2021-04-30 |
RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks | ✓ Link | | | | | | 70.8 | | | RF++-SSTDA | 2022-06-14 |
Leveraging triplet loss for unsupervised action segmentation | ✓ Link | | | | | | 65.1 | 52.1 | | TSA (FINCH) | 2023-04-13 |
Leveraging triplet loss for unsupervised action segmentation | ✓ Link | | | | | | 63.7 | 53.3 | 58.0 | TSA (Kmeans) | 2023-04-13 |
Leveraging triplet loss for unsupervised action segmentation | ✓ Link | | | | | | 63.2 | 52.7 | 57.8 | TSA (Spectral) | 2023-04-13 |
Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation | ✓ Link | | | | | | 62.7 | 42.3 | | TW-FINCH (K=avg/activity) | 2021-03-20 |
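In every fully populated row above, the first numeric column equals the mean of the next three, rounded to one decimal (for example, MS-TCN (I3D): (37.9 + 48.1 + 52.6) / 3 = 46.2). Assuming, as in the usual protocol for this benchmark, that those three columns are segmental F1 scores at IoU thresholds 0.50, 0.25 and 0.10, the Python sketch below shows how such scores are typically computed. The helper names `get_segments` and `segmental_f1` are hypothetical, and the matching rule (each predicted segment is compared with same-class ground-truth segments, and each ground-truth segment can be matched at most once) is modeled on the widely used MS-TCN-style evaluation; it is an illustration, not the evaluation code of any paper listed here.

```python
from statistics import mean


def get_segments(labels, background=()):
    """Split frame-wise labels into (label, start, end) runs; end is exclusive."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        # A run ends at the end of the sequence or where the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] not in background:
                segments.append((labels[start], start, i))
            start = i
    return segments


def segmental_f1(pred, gt, threshold, background=()):
    """Segmental F1 at one IoU threshold (hypothetical helper, MS-TCN-style matching)."""
    pred_segs = get_segments(pred, background)
    gt_segs = get_segments(gt, background)
    hit = [False] * len(gt_segs)
    tp = fp = 0
    for label, p_start, p_end in pred_segs:
        # Find the same-class ground-truth segment with the highest IoU.
        best_iou, best_idx = 0.0, None
        for idx, (g_label, g_start, g_end) in enumerate(gt_segs):
            if g_label != label:
                continue
            inter = min(p_end, g_end) - max(p_start, g_start)
            union = max(p_end, g_end) - min(p_start, g_start)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        # True positive only if the best match clears the threshold and
        # that ground-truth segment has not already been claimed.
        if best_idx is not None and best_iou >= threshold and not hit[best_idx]:
            tp += 1
            hit[best_idx] = True
        else:
            fp += 1
    fn = len(gt_segs) - sum(hit)  # unmatched ground-truth segments
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: the predicted boundary is one frame early.
gt = ["a"] * 4 + ["b"] * 4
pred = ["a"] * 3 + ["b"] * 5
print([segmental_f1(pred, gt, t) for t in (0.10, 0.25, 0.50)])  # [1.0, 1.0, 1.0]
print(segmental_f1(pred, gt, 0.80))  # 0.5

# Recomputing the first column of the MS-TCN (I3D) row from its three F1 values.
print(round(mean([37.9, 48.1, 52.6]), 1))  # 46.2
```

Because F1 is computed per segment rather than per frame, a model that fragments one long action into many short pieces is penalised through extra false positives even when its frame-wise accuracy stays high. This is why rows with similar accuracy can differ noticeably in the F1 columns.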