Practical Video Object Detection via Feature Selection and Aggregation | ✓ Link | 93.2 | YOLOV++ | 2024-07-29 |
DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection | ✓ Link | 92.5 | DiffusionVID (Swin-B) | 2023-10-30 |
Objects do not disappear: Video object detection by single-frame object location anticipation | ✓ Link | 91.3 | Ours (Def. DETR + SwinB) | 2023-08-09 |
Video Sparse Transformer With Attention-Guided Memory for Video Object Detection | ✓ Link | 91.1 | VSTAM | 2022-06-17 |
TGBFormer: Transformer-GraphFormer Blender Network for Video Object Detection | | 90.3 | TGBFormer (Swin B) | 2025-03-18 |
TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers | ✓ Link | 90.1 | TransVOD (Swin Base) | 2022-01-13 |
PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection | ✓ Link | 88.1 | PTSEFormer (ResNet-101) | 2022-09-06 |
Objects do not disappear: Video object detection by single-frame object location anticipation | ✓ Link | 87.9 | Ours (Def. DETR + R101) | 2023-08-09 |
YOLOV: Making Still Image Object Detectors Great at Video Object Detection | ✓ Link | 87.5 | YOLOV | 2022-08-20 |
Objects do not disappear: Video object detection by single-frame object location anticipation | ✓ Link | 87.2 | Ours (Faster RCNN + R101) | 2023-08-09 |
DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection | ✓ Link | 87.1 | DiffusionVID (ResNet-101) | 2023-10-30 |
DAFA: Diversity-Aware Feature Aggregation for Attention-Based Video Object Detection | | 85.9 | DAFA-F (ResNeXt-101) | 2022-09-01 |
Identity-Consistent Aggregation for Video Object Detection | ✓ Link | 85.8 | ClipVID | 2023-08-15 |
Mining Inter-Video Proposal Relations for Video Object Detection | ✓ Link | 85.5 | HVRNet (ResNeXt101-32x4d) | |
Memory Enhanced Global-Local Aggregation for Video Object Detection | ✓ Link | 85.4 | MEGA (ResNeXt101) | 2020-03-26 |
BoxMask: Revisiting Bounding Box Supervision for Video Object Detection | | 84.8 | BoxMask (ResNeXt101) | 2022-10-12 |
DAFA: Diversity-Aware Feature Aggregation for Attention-Based Video Object Detection | | 84.5 | DAFA-F (ResNet-101) | 2022-09-01 |
Temporal RoI Align for Video Object Recognition | ✓ Link | 84.3 | Temporal RoI Align (ResNeXt101) | 2021-09-08 |
Sequence Level Semantics Aggregation for Video Object Detection | ✓ Link | 84.3 | SELSA (ResNeXt-101) | 2019-07-15 |
Robust and Efficient Post-Processing for Video Object Detection (REPP) | ✓ Link | 84.2 | REPP + SELSA (ResNet-101) | 2020-10-01 |
Mining Inter-Video Proposal Relations for Video Object Detection | ✓ Link | 83.8 | HVRNet (ResNest101) | |
Integrated Object Detection and Tracking with Tracklet-Conditioned Detection | | 83.5 | Tracklet-Conditioned Detection+DCNv2+FGFA | 2018-11-27 |
Sequence Level Semantics Aggregation for Video Object Detection | ✓ Link | 82.69 | SELSA (ResNet-101) | 2019-07-15 |
Short-term anchor linking and long-term self-guided attention for video object detection | ✓ Link | 82.4 | SLTnet FPN-X101 | 2021-04-18 |
Learning Where to Focus for Efficient Video Object Detection | ✓ Link | 81.7 | LSTS (ResNet-101) | 2019-11-13 |
BoxMask: Revisiting Bounding Box Supervision for Video Object Detection | | 80.7 | BoxMask (ResNet-50) | 2022-10-12 |
Spatio-Temporal Learnable Proposals for End-to-End Video Object Detection | | 80.3 | SparseVOD (ResNet-50) | 2022-10-05 |
Robust and Efficient Post-Processing for Video Object Detection (REPP) | ✓ Link | 80.1 | REPP + FGFA | 2020-10-01 |
Flow-Guided Feature Aggregation for Video Object Detection | ✓ Link | 80.1 | FGFA + Seq-NMS | 2017-03-29 |
TSM: Temporal Shift Module for Efficient Video Understanding | ✓ Link | 76.3 | Online TSM | 2018-11-20 |
Robust and Efficient Post-Processing for Video Object Detection (REPP) | ✓ Link | 75.1 | REPP + YOLOv3 | 2020-10-01 |
Robust and Efficient Post-Processing for Video Object Detection (REPP) | ✓ Link | 68.6 | YOLOv3 | 2020-10-01 |
Looking Fast and Slow: Memory-Guided Mobile Video Object Detection | ✓ Link | 63.9 | Looking Fast and Slow | 2019-03-25 |