OpenCodePapers

Video Object Detection on ImageNet VID

Leaderboard
| Paper | Code | mAP (%) | Model Name | Release Date |
|---|---|---|---|---|
| Practical Video Object Detection via Feature Selection and Aggregation | ✓ | 93.2 | YOLOV++ | 2024-07-29 |
| DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection | ✓ | 92.5 | DiffusionVID (Swin-B) | 2023-10-30 |
| Objects do not disappear: Video object detection by single-frame object location anticipation | ✓ | 91.3 | Ours (Def. DETR + SwinB) | 2023-08-09 |
| Video Sparse Transformer With Attention-Guided Memory for Video Object Detection | ✓ | 91.1 | VSTAM | 2022-06-17 |
| TGBFormer: Transformer-GraphFormer Blender Network for Video Object Detection | | 90.3 | TGBFormer (Swin B) | 2025-03-18 |
| TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers | ✓ | 90.1 | TransVOD (Swin Base) | 2022-01-13 |
| PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection | ✓ | 88.1 | PTSEFormer (ResNet-101) | 2022-09-06 |
| Objects do not disappear: Video object detection by single-frame object location anticipation | ✓ | 87.9 | Ours (Def. DETR + R101) | 2023-08-09 |
| YOLOV: Making Still Image Object Detectors Great at Video Object Detection | ✓ | 87.5 | YOLOV | 2022-08-20 |
| Objects do not disappear: Video object detection by single-frame object location anticipation | ✓ | 87.2 | Ours (Faster RCNN + R101) | 2023-08-09 |
| DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection | ✓ | 87.1 | DiffusionVID (ResNet-101) | 2023-10-30 |
| DAFA: Diversity-Aware Feature Aggregation for Attention-Based Video Object Detection | | 85.9 | DAFA-F (ResNeXt-101) | 2022-09-01 |
| Identity-Consistent Aggregation for Video Object Detection | ✓ | 85.8 | ClipVID | 2023-08-15 |
| Mining Inter-Video Proposal Relations for Video Object Detection | ✓ | 85.5 | HVRNet (ResNeXt101-32x4d) | |
| Memory Enhanced Global-Local Aggregation for Video Object Detection | ✓ | 85.4 | MEGA (ResNeXt101) | 2020-03-26 |
| BoxMask: Revisiting Bounding Box Supervision for Video Object Detection | | 84.8 | BoxMask(ResNeXt101) | 2022-10-12 |
| DAFA: Diversity-Aware Feature Aggregation for Attention-Based Video Object Detection | | 84.5 | DAFA-F (ResNet-101) | 2022-09-01 |
| Temporal RoI Align for Video Object Recognition | ✓ | 84.3 | Temporal ROI Align (ResNeXt101) | 2021-09-08 |
| Sequence Level Semantics Aggregation for Video Object Detection | ✓ | 84.3 | SELSA (ResNeXt-101) | 2019-07-15 |
| Robust and Efficient Post-Processing for Video Object Detection (REPP) | ✓ | 84.2 | REPP + SELSA (ResNet-101) | 2020-10-01 |
| Mining Inter-Video Proposal Relations for Video Object Detection | ✓ | 83.8 | HVRNet (ResNest101) | |
| Integrated Object Detection and Tracking with Tracklet-Conditioned Detection | | 83.5 | Tracklet-Conditioned Detection+DCNv2+FGFA | 2018-11-27 |
| Sequence Level Semantics Aggregation for Video Object Detection | ✓ | 82.69 | SELSA (ResNet-101) | 2019-07-15 |
| Short-term anchor linking and long-term self-guided attention for video object detection | ✓ | 82.4 | SLTnet FPN-X101 | 2021-04-18 |
| Learning Where to Focus for Efficient Video Object Detection | ✓ | 81.7 | LSTS (ResNet-101) | 2019-11-13 |
| BoxMask: Revisiting Bounding Box Supervision for Video Object Detection | | 80.7 | BoxMask (ResNet-50) | 2022-10-12 |
| Spatio-Temporal Learnable Proposals for End-to-End Video Object Detection | | 80.3 | SparseVOD (ResNet-50) | 2022-10-05 |
| Robust and Efficient Post-Processing for Video Object Detection (REPP) | ✓ | 80.1 | REPP + FGFA | 2020-10-01 |
| Flow-Guided Feature Aggregation for Video Object Detection | ✓ | 80.1 | FGFA + Seq-NMS | 2017-03-29 |
| TSM: Temporal Shift Module for Efficient Video Understanding | ✓ | 76.3 | Online TSM | 2018-11-20 |
| Robust and Efficient Post-Processing for Video Object Detection (REPP) | ✓ | 75.1 | REPP + YOLOv3 | 2020-10-01 |
| Robust and Efficient Post-Processing for Video Object Detection (REPP) | ✓ | 68.6 | YOLOv3 | 2020-10-01 |
| Looking Fast and Slow: Memory-Guided Mobile Video Object Detection | ✓ | 63.9 | Looking Fast and Slow | 2019-03-25 |