Paper | Code | Recall@5 | Top-5 Verb | Top-5 Noun | Model Name | Release Date |
---|---|---|---|---|---|---|
Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models | | 27.60 | 55.62 | 54.23 | PlausiVL | 2024-05-30 |
Interaction Region Visual Transformer for Egocentric Action Anticipation | ✓ Link | 25.89 | | | InAViT | 2022-11-25 |
Uncertainty-aware Action Decoupling Transformer for Action Anticipation | | 23.0 | 43.5 | 46.6 | UADT | 2024-01-01 |
Semantically Guided Representation Learning For Action Anticipation | ✓ Link | 19.9 | | | S-GEAR | 2024-07-02 |
Anticipative Feature Fusion Transformer for Multi-Modal Action Anticipation | ✓ Link | 18.5 | | | AFFT | 2022-10-23 |
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition | ✓ Link | 17.7 | | | MeMViT-24 | 2022-01-20 |
Anticipative Video Transformer | ✓ Link | 15.9 | | | AVT+ | 2021-06-03 |
Technical Report: Temporal Aggregate Representations | ✓ Link | 14.73 | | | TempAgg | 2021-06-06 |
Rescaling Egocentric Vision | ✓ Link | 13.94 | | | RU-LSTM | 2020-06-23 |