OpenCodePapers

Action Recognition in Videos on ActivityNet

Leaderboard

Paper	Code	mAP (%)	Model Name	Release Date
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition	✓ Link	96.9	Text4Vis (w/ ViT-L)	2022-07-04
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models	✓ Link	96.1	BIKE	2022-12-31
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding	✓ Link	95.9	InternVideo2-6B	2024-03-22
NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition		94.3	NSNet (w/ Swin-L)	2022-07-21
Temporal Saliency Query Network for Efficient Video Recognition		93.7	TSQNet (w/ Swin-L)	2022-07-21
DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning	✓ Link	90.5	DSANet (w/ 3D ResNet50)	2021-05-25
Multi-Agent Reinforcement Learning Based Frame Sampling for Effective Untrimmed Video Recognition		90.05	MARL (w/ SEResNeXt-152)	2019-07-31
Listen to Look: Action Recognition by Previewing Audio	✓ Link	89.9	ListenToLook	2019-12-10
Dynamic Sampling Networks for Efficient Action Recognition in Videos		87.9	DSN	2020-06-28
SMART Frame Selection for Action Recognition		84.4	SMART	2020-12-19
2D or not 2D? Adaptive 3D Convolution Selection for Efficient Video Recognition		84.0	Ada3D	2020-12-29
Fine-grained Video Categorization with Redundancy Reduction Attention		83.4	RRA	2018-10-26
Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks	✓ Link	78.9	P3D	2017-11-28
Do Less and Achieve More: Training CNNs for Action Recognition Utilizing Action Images from the Web		53.8	VGG19 + 393K web images	2015-12-22
Towards Universal Representation for Unseen Action Recognition		53.8	CD-UAR	2018-03-22
Do Less and Achieve More: Training CNNs for Action Recognition Utilizing Action Images from the Web		52.3	VGG19	2015-12-22