Video Object Detection on ImageNet VID

Leaderboard

Paper	Code	mAP	Model Name	Release Date
Practical Video Object Detection via Feature Selection and Aggregation	✓ Link	93.2	YOLOV++	2024-07-29
DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection	✓ Link	92.5	DiffusionVID (Swin-B)	2023-10-30
Objects do not disappear: Video object detection by single-frame object location anticipation	✓ Link	91.3	Ours (Def. DETR + SwinB)	2023-08-09
Video Sparse Transformer With Attention-Guided Memory for Video Object Detection	✓ Link	91.1	VSTAM	2022-06-17
TGBFormer: Transformer-GraphFormer Blender Network for Video Object Detection		90.3	TGBFormer (Swin B)	2025-03-18
TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers	✓ Link	90.1	TransVOD (Swin Base)	2022-01-13
PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection	✓ Link	88.1	PTSEFormer (ResNet-101)	2022-09-06
Objects do not disappear: Video object detection by single-frame object location anticipation	✓ Link	87.9	Ours (Def. DETR + R101)	2023-08-09
YOLOV: Making Still Image Object Detectors Great at Video Object Detection	✓ Link	87.5	YOLOV	2022-08-20
Objects do not disappear: Video object detection by single-frame object location anticipation	✓ Link	87.2	Ours (Faster RCNN + R101)	2023-08-09
DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection	✓ Link	87.1	DiffusionVID (ResNet-101)	2023-10-30
DAFA: Diversity-Aware Feature Aggregation for Attention-Based Video Object Detection		85.9	DAFA-F (ResNeXt-101)	2022-09-01
Identity-Consistent Aggregation for Video Object Detection	✓ Link	85.8	ClipVID	2023-08-15
Mining Inter-Video Proposal Relations for Video Object Detection	✓ Link	85.5	HVRNet (ResNeXt101-32x4d)
Memory Enhanced Global-Local Aggregation for Video Object Detection	✓ Link	85.4	MEGA (ResNeXt101)	2020-03-26
BoxMask: Revisiting Bounding Box Supervision for Video Object Detection		84.8	BoxMask (ResNeXt101)	2022-10-12
DAFA: Diversity-Aware Feature Aggregation for Attention-Based Video Object Detection		84.5	DAFA-F (ResNet-101)	2022-09-01
Sequence Level Semantics Aggregation for Video Object Detection	✓ Link	84.3	SELSA (ResNeXt-101)	2019-07-15
Temporal RoI Align for Video Object Recognition	✓ Link	84.3	Temporal ROI Align (ResNeXt101)	2021-09-08
Robust and Efficient Post-Processing for Video Object Detection (REPP)	✓ Link	84.2	REPP + SELSA (ResNet-101)	2020-10-01
Mining Inter-Video Proposal Relations for Video Object Detection	✓ Link	83.8	HVRNet (ResNest101)
Integrated Object Detection and Tracking with Tracklet-Conditioned Detection		83.5	Tracklet-Conditioned Detection+DCNv2+FGFA	2018-11-27
Sequence Level Semantics Aggregation for Video Object Detection	✓ Link	82.69	SELSA (ResNet-101)	2019-07-15
Short-term anchor linking and long-term self-guided attention for video object detection	✓ Link	82.4	SLTnet FPN-X101	2021-04-18
Learning Where to Focus for Efficient Video Object Detection	✓ Link	81.7	LSTS (ResNet-101)	2019-11-13
BoxMask: Revisiting Bounding Box Supervision for Video Object Detection		80.7	BoxMask (ResNet-50)	2022-10-12
Spatio-Temporal Learnable Proposals for End-to-End Video Object Detection		80.3	SparseVOD (ResNet-50)	2022-10-05
Flow-Guided Feature Aggregation for Video Object Detection	✓ Link	80.1	FGFA + Seq-NMS	2017-03-29
Robust and Efficient Post-Processing for Video Object Detection (REPP)	✓ Link	80.1	REPP + FGFA	2020-10-01
TSM: Temporal Shift Module for Efficient Video Understanding	✓ Link	76.3	Online TSM	2018-11-20
Robust and Efficient Post-Processing for Video Object Detection (REPP)	✓ Link	75.1	REPP + YOLOv3	2020-10-01
Robust and Efficient Post-Processing for Video Object Detection (REPP)	✓ Link	68.6	YOLOv3	2020-10-01
Looking Fast and Slow: Memory-Guided Mobile Video Object Detection	✓ Link	63.9	Looking Fast and Slow	2019-03-25

