OpenCodePapers

Action Recognition in Videos on ActivityNet

Action Recognition
Leaderboard
| Paper | Code | mAP | Model Name | Release Date |
|---|---|---|---|---|
| Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | ✓ | 96.9 | Text4Vis (w/ ViT-L) | 2022-07-04 |
| Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | ✓ | 96.1 | BIKE | 2022-12-31 |
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ | 95.9 | InternVideo2-6B | 2024-03-22 |
| NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition | | 94.3 | NSNet (w/ Swin-L) | 2022-07-21 |
| Temporal Saliency Query Network for Efficient Video Recognition | | 93.7 | TSQNet (w/ Swin-L) | 2022-07-21 |
| DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning | ✓ | 90.5 | DSANet (w/ 3D ResNet50) | 2021-05-25 |
| Multi-Agent Reinforcement Learning Based Frame Sampling for Effective Untrimmed Video Recognition | | 90.05 | MARL (w/ SEResNeXt-152) | 2019-07-31 |
| Listen to Look: Action Recognition by Previewing Audio | ✓ | 89.9 | ListenToLook | 2019-12-10 |
| Dynamic Sampling Networks for Efficient Action Recognition in Videos | | 87.9 | DSN | 2020-06-28 |
| SMART Frame Selection for Action Recognition | | 84.4 | SMART | 2020-12-19 |
| 2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition | | 84.0 | Ada3D | 2020-12-29 |
| Fine-grained Video Categorization with Redundancy Reduction Attention | | 83.4 | RRA | 2018-10-26 |
| Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks | ✓ | 78.9 | P3D | 2017-11-28 |
| Do Less and Achieve More: Training CNNs for Action Recognition Utilizing Action Images from the Web | | 53.8 | VGG19 + 393K webcam images | 2015-12-22 |
| Towards Universal Representation for Unseen Action Recognition | | 53.8 | CD-UAR | 2018-03-22 |
| Do Less and Achieve More: Training CNNs for Action Recognition Utilizing Action Images from the Web | | 52.3 | VGG19 | 2015-12-22 |