OpenCodePapers

Object Detection on COCO
Leaderboard
| Paper | Code | Box mAP | AP50 | AP75 | APS | APM | APL | Hardware Burden | Params (M) | Operations per network pass | GFLOPs | Model Name | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DETRs with Collaborative Hybrid Assignments Training | ✓ | 66.0 | | | | | | | 304 | | | Co-DETR | 2022-11-22 |
| InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | ✓ | 65.5 | | | | | | | 2180 | | | InternImage-H (M3I Pre-training) | 2022-11-10 |
| Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information | ✓ | 65.4 | | | | | | | | | | M3I Pre-training (InternImage-H) | 2022-11-17 |
| MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection | ✓ | 65.1 | | | | | | | | | | MoCaE | 2023-09-26 |
| A Strong and Reproducible Object Detector with Only Public Datasets | ✓ | 64.8 | 81.7 | 71.5 | 48.6 | 67.6 | 78.0 | | 689 | | | Focal-Stable-DINO (Focal-Huge, no TTA) | 2023-04-25 |
| DETRs with Collaborative Hybrid Assignments Training | ✓ | 64.8 | | | | | | | 218 | | | Co-DETR (Swin-L) | 2022-11-22 |
| EVA: Exploring the Limits of Masked Visual Representation Learning at Scale | ✓ | 64.7 | 81.9 | 71.7 | 48.5 | 67.7 | 77.9 | | | | | EVA | 2022-11-14 |
| Group DETR v2: Strong Object Detector with Encoder-Decoder Pretraining | | 64.5 | 81.8 | 71.1 | 48.4 | 67.2 | 77.1 | | | | | Group DETR v2 | 2022-11-07 |
| Focal Modulation Networks | ✓ | 64.4 | | | | | | | | | | FocalNet-H (DINO) | 2022-03-22 |
| InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | ✓ | 64.3 | | | | | | | 602 | | | InternImage-XL | 2022-11-10 |
| Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation | ✓ | 64.2 | | | | | | | | | | FD-SwinV2-G | 2022-05-27 |
| DETR Does Not Need Multi-Scale or Locality Design | ✓ | 63.9 | 82.1 | 70.7 | 48.2 | 66.8 | 76.7 | | 228 | | | Plain-DETR (Swin-L) | 2023-01-01 |
| Reversible Column Networks | ✓ | 63.8 | | | | | | | | | | RevCol-H (DINO) | 2022-12-22 |
| Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks | ✓ | 63.7 | | | | | | | | | | BEiT-3 | 2022-08-22 |
| Relation DETR: Exploring Explicit Position Relation Prior for Object Detection | ✓ | 63.5 | 80.8 | 69.1 | 47.2 | 66.9 | 77.0 | | 214 | | | Relation-DETR (Focal-L) | 2024-07-16 |
| NMS Strikes Back | ✓ | 63.5 | 80.4 | 70.2 | 46.1 | 66.9 | 76.9 | | | | | DETA (Swin-L) | 2022-12-12 |
| DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection | ✓ | 63.3 | | | | | | | | | | DINO (Swin-L, multi-scale, TTA) | 2022-03-07 |
| Swin Transformer V2: Scaling Up Capacity and Resolution | ✓ | 63.1 | | | | | | | 3000 | | | SwinV2-G (HTC++) | 2021-11-18 |
| Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection | ✓ | 63.0 | | | | | | | | | | Grounding DINO | 2023-03-09 |
| Florence: A New Foundation Model for Computer Vision | ✓ | 62.4 | | | | | | | | | | Florence-CoSwin-H | 2021-11-22 |
| GLIPv2: Unifying Localization and Vision-Language Understanding | ✓ | 62.4 | | | | | | | | | | GLIPv2 (CoSwin-H, multi-scale) | 2022-06-12 |
| General Object Foundation Model for Images and Videos at Scale | ✓ | 62.3 | | | | | | | | | | GLEE-Pro | 2023-12-14 |
| Grounded Language-Image Pre-training | ✓ | 61.5 | 79.5 | 67.7 | 45.3 | 64.9 | 75.0 | | | | | GLIP (Swin-L, multi-scale) | 2021-12-07 |
| End-to-End Semi-Supervised Object Detection with Soft Teacher | ✓ | 61.3 | | | | | | | | | | Soft Teacher + Swin-L (HTC++, multi-scale) | 2021-06-16 |
| Vision Transformer Adapter for Dense Predictions | ✓ | 60.9 | | | | | | | | | | ViT-Adapter-L (HTC++, BEiTv2 pretrain, multi-scale) | 2022-05-17 |
| Dynamic Head: Unifying Object Detection Heads with Attentions | ✓ | 60.6 | 78.5 | 66.6 | | 64.0 | 74.2 | | | | | DyHead (Swin-L, multi-scale, self-training) | 2021-06-15 |
| General Object Foundation Model for Images and Videos at Scale | ✓ | 60.6 | | | | | | | | | | GLEE-Plus | 2023-12-14 |
| Vision Transformer Adapter for Dense Predictions | ✓ | 60.4 | | | | | | | | | | ViT-Adapter-L (HTC++, BEiT pretrain, multi-scale) | 2022-05-17 |
| GRiT: A Generative Region-to-text Transformer for Object Understanding | ✓ | 60.4 | | | | | | | | | | GRiT (ViT-H, single-scale testing) | 2022-12-01 |
| CBNet: A Composite Backbone Network Architecture for Object Detection | ✓ | 60.1 | | | | | | | | | | CBNetV2 (Dual-Swin-L HTC, multi-scale) | 2021-07-01 |
| Parameter-Inverted Image Pyramid Networks | ✓ | 60.0 | 79.0 | 65.4 | | | | | | | | PIIP-H6B (DINO) | 2024-06-06 |
| CBNet: A Composite Backbone Network Architecture for Object Detection | ✓ | 59.4 | | | | | | | | | | CBNetV2 (Dual-Swin-L HTC, single-scale) | 2021-07-01 |
| Focal Self-attention for Local-Global Interactions in Vision Transformers | ✓ | 58.9 | | | | | | | | | | Focal-L (DyHead, multi-scale) | 2021-07-01 |
| Dynamic Head: Unifying Object Detection Heads with Attentions | ✓ | 58.7 | 77.1 | 64.5 | | 62.0 | 72.8 | | | | | DyHead (Swin-L, multi-scale) | 2021-06-15 |
| Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | ✓ | 58.7 | | | | | | | | | | Swin-L (HTC++, multi-scale) | 2021-03-25 |
| Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | ✓ | 57.7 | | | | | | | | | | Swin-L (HTC++, single-scale) | 2021-03-25 |
| Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation | ✓ | 57.3 | | | | | | | | | | Cascade Eff-B7 NAS-FPN (1280, self-training Copy-Paste, single-scale) | 2020-12-13 |
| CenterNet++ for Object Detection | ✓ | 57.1 | 73.7 | 62.4 | 38.7 | 59.2 | 71.3 | | | | | PyCenterNet (Swin-L, multi-scale) | 2022-04-18 |
| Exploring Target Representations for Masked Autoencoders | ✓ | 56.8 | | | | | | | | | | dBOT ViT-L (CLIP) | 2022-09-08 |
| YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors | ✓ | 56.6 | | | | | | | | | | YOLOv7-D6 (44 fps) | 2022-07-06 |
| SOLQ: Segmenting Objects by Learning Queries | ✓ | 56.5 | 74.6 | 60.5 | 37.6 | 60.0 | 70.6 | | | | | SOLQ (Swin-L, single-scale) | 2021-06-04 |
| Probabilistic two-stage detection | ✓ | 56.4 | 74.0 | 61.6 | 38.7 | 59.7 | 68.6 | | | | | CenterNet2 (Res2Net-101-DCN-BiFPN, self-training, 1560 single-scale) | 2021-03-12 |
| ISTR: End-to-End Instance Segmentation with Transformers | ✓ | 56.4 | | | 27.8 | 48.7 | 59.9 | | | | | ISTR (ResNet50-FPN-3x, single-scale) | 2021-05-03 |
| Instances as Queries | ✓ | 56.1 | 75.9 | 61.9 | 37.4 | 58.9 | 70.3 | 17G | | | | QueryInst (single-scale) | 2021-05-05 |
| Exploring Target Representations for Masked Autoencoders | ✓ | 56.1 | | | | | | | | | | dBOT ViT-L | 2022-09-08 |
| YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors | ✓ | 56.0 | | | | | | | | | | YOLOv7-E6 (56 fps) | 2022-07-06 |
| Scaled-YOLOv4: Scaling Cross Stage Partial Network | ✓ | 55.8 | 73.2 | 61.2 | | | | | | | | YOLOv4-P7 with TTA | 2020-11-16 |
| DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution | ✓ | 55.7 | 74.2 | 61.1 | 37.7 | 58.4 | 68.1 | | | | | DetectoRS (ResNeXt-101-64x4d, multi-scale) | 2020-06-03 |
| You Only Learn One Representation: Unified Network for Multiple Tasks | ✓ | 55.4 | 73.3 | 60.6 | | | | | | | | YOLOR-D6 (1280, single-scale, 30 fps) | 2021-05-10 |
| Scaled-YOLOv4: Scaling Cross Stage Partial Network | ✓ | 54.9 | 72.6 | 60.2 | | | | | | | | YOLOv4-P6 with TTA | 2020-11-16 |
| YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors | ✓ | 54.9 | | | | | | | | | | YOLOv7-W6 (84 fps) | 2022-07-06 |
| Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation | ✓ | 54.8 | | | | | | | | | | Cascade Eff-B7 NAS-FPN (1280) | 2020-12-13 |
| DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution | ✓ | 54.7 | 73.5 | 60.1 | 37.4 | 57.3 | 66.4 | | | | | DetectoRS (ResNeXt-101-32x4d, multi-scale) | 2020-06-03 |
| General Object Foundation Model for Images and Videos at Scale | ✓ | 54.7 | | | | | | | | | | GLEE-Lite | 2023-12-14 |
| Scaled-YOLOv4: Scaling Cross Stage Partial Network | ✓ | 54.3 | 72.3 | 59.5 | 36.6 | 58.2 | 65.5 | | | | | YOLOv4-P6 CSP-P6 (single-scale, 32 fps) | 2020-11-16 |
| Rethinking Pre-training and Self-training | ✓ | 54.3 | | | | | | | | | | SpineNet-190 (1280, with Self-training on OpenImages, single-scale) | 2020-06-11 |
| USB: Universal-Scale Object Detection Benchmark | ✓ | 54.1 | 71.6 | 59.9 | 35.8 | 57.2 | 67.4 | | | | | UniverseNet-20.08d (Res2Net-101, DCN, multi-scale) | 2021-03-25 |
| Dynamic Head: Unifying Object Detection Heads with Attentions | ✓ | 54.0 | 72.1 | 59.3 | | | | | | | | DyHead (ResNeXt-64x4d-101-DCN, multi-scale) | 2021-06-15 |
| Exploring Target Representations for Masked Autoencoders | ✓ | 53.6 | | | | | | | | | | dBOT ViT-B (CLIP) | 2022-09-08 |
| Probabilistic Anchor Assignment with IoU Prediction for Object Detection | ✓ | 53.5 | 71.6 | 59.1 | 36.0 | 56.3 | 66.9 | | | | | PAA (ResNeXt-152-32x8d + DCN, multi-scale) | 2020-07-16 |
| Location-Sensitive Visual Recognition with Cross-IOU Loss | ✓ | 53.5 | 71.1 | 59.2 | 35.2 | 56.4 | 65.8 | | | | | LSNet (Res2Net-101 + DCN, multi-scale) | 2021-04-11 |
| Exploring Target Representations for Masked Autoencoders | ✓ | 53.5 | | | | | | | | | | dBOT ViT-B | 2022-09-08 |
| ResNeSt: Split-Attention Networks | ✓ | 53.3 | 72.0 | 58.0 | 35.1 | 56.2 | 66.8 | | | | | ResNeSt-200 (multi-scale) | 2020-04-19 |
| CBNet: A Novel Composite Backbone Network Architecture for Object Detection | ✓ | 53.3 | 71.9 | 58.5 | 35.5 | 55.8 | 66.7 | | | | | Cascade Mask R-CNN (Triple-ResNeXt152, multi-scale) | 2019-09-09 |
| DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution | ✓ | 53.3 | 71.6 | 58.5 | 33.9 | 56.5 | 66.9 | | | | | DetectoRS (ResNeXt-101-32x4d, single-scale) | 2020-06-03 |
| Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | ✓ | 53.3 | 70.9 | 59.2 | 35.7 | 56.1 | 65.6 | | | | | GFLV2 (Res2Net-101, DCN, multi-scale) | 2020-11-25 |
| YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors | ✓ | 53.1 | | | | | | | | | | YOLOv7-X (114 fps) | 2022-07-06 |
| RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder | ✓ | 52.7 | | | | | | | | | | RelationNet++ (ResNeXt-64x4d-101-DCN) | 2020-10-29 |
| EfficientDet: Scalable and Efficient Object Detection | ✓ | 52.6 | 71.6 | 56.9 | | | | | | | | EfficientDet-D7 (1536) | 2019-11-20 |
| Scaled-YOLOv4: Scaling Cross Stage Partial Network | ✓ | 52.5 | 70.3 | 58.0 | | | | | | | | YOLOv4-P5 with TTA | 2020-11-16 |
| Deformable DETR: Deformable Transformers for End-to-End Object Detection | ✓ | 52.3 | 71.9 | 58.1 | 34.4 | 54.4 | 65.6 | 17G | | 17.3G | | Deformable DETR (ResNeXt-101+DCN) | 2020-10-08 |
| Global Context Networks | ✓ | 52.3 | 70.9 | 56.9 | | | | | | | | GCNet (ResNeXt-101 + DCN + cascade + GC r4) | 2020-12-24 |
| PP-YOLOE: An evolved version of YOLO | ✓ | 52.2 | 69.9 | 56.5 | 33.3 | 56.3 | 66.4 | | | | | PP-YOLOE-x (CSPRepResNet-x, 640x640, single-scale) | 2022-03-30 |
| SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | ✓ | 52.1 | 71.8 | 56.5 | 35.4 | 55.0 | 63.6 | | | | | RetinaNet (SpineNet-190, 1280x1280) | 2019-12-10 |
| RepPoints V2: Verification Meets Regression for Object Detection | ✓ | 52.1 | 70.1 | 57.5 | 34.5 | 54.6 | 63.6 | | | | | RepPoints v2 (ResNeXt-101, DCN, multi-scale) | 2020-07-16 |
| Attention-guided Context Feature Pyramid Network for Object Detection | ✓ | 51.9 | 70.4 | 57.0 | 34.2 | 54.8 | 64.7 | | | | | AC-FPN Cascade R-CNN (X-152-32x8d-FPN-IN5k, multi-scale, only CEM) | 2020-05-23 |
| OTA: Optimal Transport Assignment for Object Detection | ✓ | 51.5 | 68.6 | 57.1 | 34.1 | 53.7 | 64.1 | | | | | OTA (ResNeXt-101+DCN, multi-scale) | 2021-03-26 |
| YOLOX: Exceeding YOLO Series in 2021 | ✓ | 51.5 | | | | | | | | | | YOLOX-x (Modified CSP v5, 640x640, single-scale) | 2021-07-18 |
| PP-YOLOE: An evolved version of YOLO | ✓ | 51.4 | 68.9 | 55.6 | 31.4 | 55.3 | 66.1 | | | | | PP-YOLOE-l (CSPRepResNet-l, 640x640, single-scale) | 2022-03-30 |
| YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors | ✓ | 51.4 | | | | | | | | | | YOLOv7 (161 fps) | 2022-07-06 |
| USB: Universal-Scale Object Detection Benchmark | ✓ | 51.3 | 70.0 | 55.8 | 31.7 | 55.3 | 64.9 | | | | | UniverseNet-20.08d (Res2Net-101, DCN, single-scale) | 2021-03-25 |
| Revisiting the Sibling Head in Object Detector | ✓ | 51.2 | 71.9 | 56.0 | 33.8 | 54.8 | 64.2 | | | | | TSD (SENet154-DCN, multi-scale) | 2020-03-17 |
| YOLOX: Exceeding YOLO Series in 2021 | ✓ | 51.2 | 69.6 | 55.7 | 31.2 | 56.1 | 66.1 | | 99.1 | | | YOLOX-X (Modified CSP v5) | 2021-07-18 |
| iBOT: Image BERT Pre-Training with Online Tokenizer | ✓ | 51.2 | | | | | | | | | | iBOT (ViT-B/16) | 2021-11-15 |
| SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | ✓ | 50.7 | 70.4 | 54.9 | 33.6 | 53.9 | 62.1 | | | | | RetinaNet (SpineNet-143, 1280x1280) | 2019-12-10 |
| Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection | ✓ | 50.7 | 68.9 | 56.3 | 33.2 | 52.9 | 62.4 | | | | | ATSS (ResNeXt-64x4d-101+DCN, multi-scale) | 2019-12-05 |
| Learning Data Augmentation Strategies for Object Detection | ✓ | 50.7 | | | 34.2 | 55.5 | 64.5 | | | | | NAS-FPN (AmoebaNet-D, learned aug) | 2019-06-26 |
| Boosting R-CNN: Reweighting R-CNN Samples by RPN's Error for Underwater Object Detection | ✓ | 50.7 | | | | | | | | | | Boosting R-CNN* | 2022-06-28 |
| Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | ✓ | 50.6 | 69.0 | 55.3 | 31.3 | 54.3 | 63.5 | | | | | GFLV2 (Res2Net-101, DCN) | 2020-11-25 |
| A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection | ✓ | 50.2 | 70.3 | 53.9 | 32.0 | 53.1 | 63.0 | | | | | aLRP Loss (ResNeXt-101-64x4d, DCN, multi-scale test) | 2020-09-28 |
| Scale-Equalizing Pyramid Convolution for Object Detection | ✓ | 50.1 | 69.8 | 54.3 | 31.3 | 53.3 | 63.7 | | | | | FreeAnchor + SEPC (DCN, ResNeXt-101-64x4d) | 2020-05-06 |
| D2Det: Towards High Quality Object Detection and Instance Segmentation | ✓ | 50.1 | 69.4 | 54.9 | 32.7 | 52.7 | 62.1 | | | | | D2Det (ResNet-101-DCN, multi-scale test) | 2020-06-01 |
| Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training | ✓ | 50.1 | 68.3 | 55.6 | 32.8 | 53.0 | 61.2 | | | | | Dynamic R-CNN (ResNet-101-DCN, multi-scale) | 2020-04-13 |
| Revisiting the Sibling Head in Object Detector | ✓ | 49.4 | 69.6 | 54.4 | 32.7 | 52.5 | 61.0 | | | | | TSD (ResNet-101-Deformable, Image Pyramid) | 2020-03-17 |
| RepPoints V2: Verification Meets Regression for Object Detection | ✓ | 49.4 | 68.9 | 53.4 | 30.3 | 52.1 | 62.3 | | | | | RepPoints v2 (ResNeXt-101, DCN) | 2020-07-16 |
| Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN | ✓ | 49.4 | | | | | | | | | | A2MIM (ViT-B) | 2022-05-27 |
| iBOT: Image BERT Pre-Training with Online Tokenizer | ✓ | 49.4 | | | | | | | | | | iBOT (ViT-S/16) | 2021-11-15 |
| Corner Proposal Network for Anchor-free, Two-stage Object Detection | ✓ | 49.2 | 67.3 | 53.7 | 31.0 | 51.9 | 62.4 | | | | | CPNDet (Hourglass-104, multi-scale) | 2020-07-27 |
| Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | ✓ | 49.0 | 67.6 | 53.5 | 29.7 | 52.4 | 61.4 | 3G | | | | GFLV2 (ResNeXt-101, 32x4d, DCN) | 2020-11-25 |
| A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection | ✓ | 48.9 | 69.3 | 52.5 | 30.8 | 51.5 | 62.1 | | | | | aLRP Loss (ResNeXt-101-64x4d, DCN, single-scale) | 2020-09-28 |
| PP-YOLOE: An evolved version of YOLO | ✓ | 48.9 | 66.5 | 53.0 | 28.6 | 52.9 | 63.8 | | | | | PP-YOLOE-m (CSPRepResNet-m, 640x640, single-scale) | 2022-03-30 |
| USB: Universal-Scale Object Detection Benchmark | ✓ | 48.8 | 67.5 | 53.0 | 30.1 | 52.3 | 61.1 | | | | | UniverseNet-20.08 (Res2Net-50, DCN, single-scale) | 2021-03-25 |
| SOLQ: Segmenting Objects by Learning Queries | ✓ | 48.7 | | | | | | | | | | SOLQ (ResNet101, single-scale) | 2021-06-04 |
| SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | ✓ | 48.6 | 68.4 | 52.5 | 32.0 | 52.3 | 62.0 | | | | | RetinaNet (SpineNet-96, 1024x1024) | 2019-12-10 |
| Scale-Aware Trident Networks for Object Detection | ✓ | 48.4 | 69.7 | 53.5 | 31.8 | 51.3 | 60.3 | | | | | TridentNet (ResNet-101-Deformable, Image Pyramid) | 2019-01-07 |
| GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond | ✓ | 48.4 | 67.6 | 52.7 | | | | 54.8G | | | | GCNet (ResNeXt-101 + DCN + cascade + GC r4) | 2019-04-25 |
| Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | ✓ | 48.3 | 66.5 | 52.8 | 28.8 | 51.9 | 60.7 | 3G | | | | GFLV2 (ResNet-101-DCN) | 2020-11-25 |
| Understanding Gaussian Attention Bias of Vision Transformers Using Effective Receptive Fields | ✓ | 48.23 | | | | | | | | | | Swin-S (RPE w/ GAB) | 2023-05-08 |
| Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection | ✓ | 48.2 | 67.4 | 52.6 | 29.2 | 51.7 | 60.2 | | | | | GFL (X-101-32x4d-DCN, single-scale) | 2020-06-08 |
| ISTR: End-to-End Instance Segmentation with Transformers | ✓ | 48.1 | | | 28.7 | 50.4 | 61.5 | | | | | ISTR (ResNet101-FPN-3x, single-scale) | 2021-05-03 |
| YOLOX: Exceeding YOLO Series in 2021 | ✓ | 48.0 | | | | | | | | | | YOLOX-Darknet53 (Darknet53, 640x640, single-scale) | 2021-07-18 |
| Vision Transformer with Deformable Attention | ✓ | 47.9 | 69.6 | 51.2 | 32.3 | 51.8 | 63.4 | | | | | DAT-S (RetinaNet) | 2022-01-03 |
| A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection | ✓ | 47.8 | 68.4 | 51.1 | 30.2 | 50.8 | 59.1 | | | | | aLRP Loss (ResNeXt-101-64x4d, single-scale) | 2020-09-28 |
| Matrix Nets: A New Deep Architecture for Object Detection | ✓ | 47.8 | 66.2 | 52.3 | 29.7 | 50.4 | 60.7 | | | | | MatrixNet Corners (ResNet-152, multi-scale) | 2019-08-13 |
| SOLQ: Segmenting Objects by Learning Queries | ✓ | 47.8 | | | | | | | | | | SOLQ (ResNet50, single-scale) | 2021-06-04 |
| Dynamic Head: Unifying Object Detection Heads with Attentions | ✓ | 47.7 | 65.7 | 51.9 | | | | | | | | DyHead (ResNeXt-64x4d-101) | 2021-06-15 |
| Soft Anchor-Point Object Detection | ✓ | 47.4 | 67.4 | 51.1 | 28.1 | 50.3 | 61.5 | | | | | SAPD (ResNeXt-101, single-scale) | 2019-11-27 |
| Path Aggregation Network for Instance Segmentation | ✓ | 47.4 | 67.2 | 51.8 | 30.1 | 51.7 | 60.0 | | | | | PANet (ResNeXt-101, multi-scale) | 2018-03-05 |
| Deep High-Resolution Representation Learning for Visual Recognition | ✓ | 47.3 | 65.9 | 51.2 | 28.0 | 49.7 | 59.8 | 15G | | 71.7G | | HTC (HRNetV2p-W48) | 2019-08-20 |
| Hybrid Task Cascade for Instance Segmentation | ✓ | 47.1 | 63.9 | 44.7 | 22.8 | 43.9 | 54.6 | | | | | HTC (ResNeXt-101-FPN) | 2019-01-22 |
| CenterNet: Keypoint Triplets for Object Detection | ✓ | 47.0 | 64.5 | 50.7 | 28.9 | 49.9 | 58.9 | | | | | CenterNet511 (Hourglass-104, multi-scale) | 2019-04-17 |
| Multiple Anchor Learning for Visual Object Detection | ✓ | 47.0 | | | | | | | | | | MAL (ResNeXt101, multi-scale) | 2019-12-04 |
| ISTR: End-to-End Instance Segmentation with Transformers | ✓ | 46.8 | | | | | | | | | | ISTR (ResNet50-FPN-3x) | 2021-05-03 |
| SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | ✓ | 46.7 | 66.3 | 50.6 | 29.1 | 50.1 | 61.7 | | | | | RetinaNet (SpineNet-49, 896x896) | 2019-12-10 |
| RepPoints: Point Set Representation for Object Detection | ✓ | 46.5 | 67.4 | 50.9 | 30.3 | 49.7 | 57.1 | | | | | RPDet (ResNet-101-DCN, multi-scale) | 2019-04-25 |
| HoughNet: Integrating near and long-range evidence for bottom-up object detection | ✓ | 46.4 | 65.1 | 50.7 | 29.1 | 48.5 | 58.1 | | | | | HoughNet (MS) | 2020-07-05 |
| Reducing Label Noise in Anchor-Free Object Detection | ✓ | 46.3 | 64.8 | 51.6 | 31.4 | 49.9 | 56.4 | | | | | PPDet (ResNeXt-101-FPN, multi-scale) | 2020-08-03 |
| Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | ✓ | 46.2 | 64.3 | 50.5 | 27.8 | 49.9 | 57.0 | | | | | GFLV2 (ResNet-101) | 2020-11-25 |
| SNIPER: Efficient Multi-Scale Training | ✓ | 46.1 | 67.0 | 51.6 | 29.6 | 48.9 | 58.1 | 29G | | | | SNIPER (ResNet-101) | 2018-05-23 |
| Deep High-Resolution Representation Learning for Visual Recognition | ✓ | 46.1 | 64.0 | 50.3 | 27.1 | 48.6 | 58.3 | 15G | | 61.8G | | Mask R-CNN (HRNetV2p-W48 + cascade) | 2019-08-20 |
| NAS-FCOS: Fast Neural Architecture Search for Object Detection | ✓ | 46.1 | | | | | | | | | | ResNeXt-64x4d-101 NAS-FCOS @128-256 w/ improvements | 2019-06-11 |
| Deformable ConvNets v2: More Deformable, Better Results | ✓ | 46.0 | 67.9 | 50.8 | 27.8 | 49.1 | 59.5 | | | | | DCNv2 (ResNet-101, multi-scale) | 2018-11-27 |
| Localization Uncertainty Estimation for Anchor-Free Object Detection | | 46.0 | | | | | | | | | | Gaussian-FCOS | 2020-06-28 |
| InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting | ✓ | 45.9 | 64.2 | 50.0 | 26.3 | 49.0 | 58.6 | | | | | Cascade R-CNN-FPN (ResNet-101, map-guided) | 2019-08-21 |
| Multiple Anchor Learning for Visual Object Detection | ✓ | 45.9 | | | | | | | | | | MAL (ResNeXt101, single-scale) | 2019-12-04 |
| CenterMask : Real-Time Anchor-Free Instance Segmentation | ✓ | 45.8 | 64.5 | | 27.8 | 48.3 | 57.6 | | | | | CenterMask+VoVNetV2-99 (single-scale) | 2019-11-15 |
| An Analysis of Scale Invariance in Object Detection - SNIP | | 45.7 | 67.3 | 51.1 | 29.3 | 48.8 | 57.1 | | | | | D-RFCN + SNIP (DPN-98 with flip, multi-scale) | 2017-11-22 |
| Scaled-YOLOv4: Scaling Cross Stage Partial Network | ✓ | 45.5 | 64.1 | 49.5 | 27.0 | 49.0 | 56.7 | | | | | YOLOv4 (CD53) | 2020-11-16 |
| Attention-guided Context Feature Pyramid Network for Object Detection | ✓ | 45.0 | 64.4 | 49.0 | 26.9 | 47.7 | 56.6 | | | | | AC-FPN Cascade R-CNN (ResNet-101, single-scale) | 2020-05-23 |
| FreeAnchor: Learning to Match Anchors for Visual Object Detection | ✓ | 44.8 | 64.3 | 48.4 | 27.0 | 47.9 | 56.0 | | | | | FreeAnchor (ResNeXt-101) | 2019-09-05 |
| FCOS: Fully Convolutional One-Stage Object Detection | ✓ | 44.7 | 64.1 | 48.4 | 27.6 | 47.5 | 55.6 | | | | | FCOS (ResNeXt-64x4d-101-FPN 4 + improvements) | 2019-04-02 |
| CenterMask : Real-Time Anchor-Free Instance Segmentation | ✓ | 44.7 | 63.1 | 48.6 | 27.1 | | 55.9 | | | | | CenterMask+VoVNet2-57 (single-scale) | 2019-11-15 |
| Feature Selective Anchor-Free Module for Single-Shot Object Detection | ✓ | 44.6 | 65.2 | 48.6 | 29.7 | 47.1 | 54.6 | | | | | FSAF (ResNeXt-101, multi-scale) | 2019-03-02 |
| A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection | ✓ | 44.6 | 65.0 | 47.5 | 24.6 | 48.1 | 58.3 | | | | | aLRP Loss (ResNeXt-101, DCN, 500 scale) | 2020-09-28 |
| CenterMask : Real-Time Anchor-Free Instance Segmentation | ✓ | 44.6 | 63.4 | 48.4 | | 47.2 | | | | | | CenterMask + X-101-32x8d (single-scale) | 2019-11-15 |
| SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | ✓ | 44.3 | 63.8 | 47.6 | 25.9 | 47.7 | 61.1 | | | | | RetinaNet (SpineNet-49, 640x640) | 2019-12-10 |
| You Only Look One-level Feature | ✓ | 44.3 | 62.9 | 47.5 | 24.0 | 48.5 | 60.4 | | | | | YOLOF-DC5 | 2021-03-17 |
| Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection | ✓ | 44.3 | 62.3 | 48.5 | 26.8 | 47.7 | 54.1 | | | | | GFLV2 (ResNet-50) | 2020-11-25 |
| Feature Intertwiner for Object Detection | ✓ | 44.2 | 67.5 | 51.1 | 27.2 | 50.3 | 57.7 | | | | | InterNet (ResNet-101-FPN, multi-scale) | 2019-03-28 |
| M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | ✓ | 44.2 | 64.6 | 49.3 | 29.2 | 47.9 | 55.1 | 34G | | | | M2Det (VGG-16, multi-scale) | 2018-11-12 |
| LIP: Local Importance-based Pooling | ✓ | 43.9 | 65.7 | 48.1 | 25.4 | 46.7 | 56.3 | | | | | Faster R-CNN (LIP-ResNet-101-MD w FPN) | 2019-08-12 |
| M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | ✓ | 43.9 | 64.4 | 48.0 | 29.6 | 49.6 | 54.3 | 27G | | | | M2Det (ResNet-101, multi-scale) | 2018-11-12 |
| Learning Spatial Fusion for Single-Shot Object Detection | ✓ | 43.9 | 64.1 | 49.2 | 27.0 | 46.6 | 53.4 | | | | | YOLOv3 @800 + ASFF* (Darknet-53) | 2019-11-21 |
| FoveaBox: Beyond Anchor-based Object Detector | ✓ | 43.9 | 63.5 | 47.7 | 26.8 | 46.9 | 55.6 | | | | | FoveaBox (ResNeXt-101) | 2019-04-08 |
| Bottom-up Object Detection by Grouping Extreme and Center Points | ✓ | 43.7 | 60.5 | 47.0 | 24.1 | 46.9 | 57.6 | 180G | | | | ExtremeNet (Hourglass-104, multi-scale) | 2019-01-23 |
| YOLOv4: Optimal Speed and Accuracy of Object Detection | ✓ | 43.5 | 65.7 | 47.3 | 26.7 | 46.7 | 53.3 | | | | | YOLOv4-608 | 2020-04-23 |
| SNIPER: Efficient Multi-Scale Training | ✓ | 43.5 | 65.0 | 48.6 | 26.1 | 46.3 | 56.0 | 29G | | | | SNIPER (ResNet-50) | 2018-05-23 |
| Deep High-Resolution Representation Learning for Visual Recognition | ✓ | 43.5 | | 46.5 | 22.2 | | 57.8 | 16G | | 21.7G | | CenterNet (HRNetV2-W48) | 2019-08-20 |
| An Analysis of Scale Invariance in Object Detection - SNIP | | 43.4 | 65.5 | 48.4 | 27.2 | 46.5 | 54.9 | | | | | D-RFCN + SNIP (ResNet-101, multi-scale) | 2017-11-22 |
| Grid R-CNN | ✓ | 43.2 | 63.0 | 46.6 | 25.1 | 46.5 | 55.2 | | | | | Grid R-CNN (ResNeXt-101-FPN) | 2018-11-29 |
| FCOS: Fully Convolutional One-Stage Object Detection | ✓ | 43.2 | 62.8 | 46.6 | 26.5 | 46.2 | 53.3 | | | | | FCOS (ResNeXt-101-64x4d-FPN) | 2019-04-02 |
| CornerNet-Lite: Efficient Keypoint Based Object Detection | ✓ | 43.2 | | | 24.4 | 44.6 | 57.3 | | | | | CornerNet-Saccade (Hourglass-104, multi-scale) | 2019-04-18 |
| PP-YOLOE: An evolved version of YOLO | ✓ | 43.1 | 60.5 | 46.6 | 23.2 | 46.4 | 56.9 | | | | | PP-YOLOE-s (CSPRepResNet-s, 640x640, single-scale) | 2022-03-30 |
| Libra R-CNN: Towards Balanced Learning for Object Detection | ✓ | 43.0 | 64.0 | 47.0 | 25.3 | 45.6 | 54.6 | | | | | Libra R-CNN (ResNeXt-101-FPN) | 2019-04-04 |
| Dynamic Head: Unifying Object Detection Heads with Attentions | ✓ | 43.0 | 60.7 | 46.8 | | | | | | | | DyHead (ResNet-50) | 2021-06-15 |
| RepPoints: Point Set Representation for Object Detection | ✓ | 42.8 | 65.0 | 46.3 | 24.9 | 46.2 | 54.7 | | | | | RPDet (ResNet-101-DCN) | 2019-04-25 |
| SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | ✓ | 42.8 | 62.3 | 46.1 | 23.7 | 45.2 | 57.3 | | | | | SpineNet-49 (640, RetinaNet, single-scale) | 2019-12-10 |
| Cascade R-CNN: Delving into High Quality Object Detection | ✓ | 42.8 | 62.1 | 46.3 | 23.7 | 45.5 | 55.2 | | | | | Cascade R-CNN (ResNet-101-FPN+, cascade) | 2017-12-03 |
| Cascade R-CNN: High Quality Object Detection and Instance Segmentation | ✓ | 42.8 | 62.1 | 46.3 | 23.7 | 45.5 | 55.2 | 15G | | | | Cascade R-CNN | 2019-06-24 |
| Scale-Aware Trident Networks for Object Detection | ✓ | 42.7 | 63.6 | 46.5 | 23.9 | 46.6 | 56.6 | | | | | TridentNet (ResNet-101) | 2019-01-07 |
| FCOS: Fully Convolutional One-Stage Object Detection | ✓ | 42.7 | 62.2 | 46.1 | 26.0 | 45.6 | 52.6 | | | | | FCOS (ResNeXt-32x8d-101-FPN) | 2019-04-02 |
| RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free | ✓ | 42.6 | 62.5 | 46.0 | 24.8 | 45.6 | 53.8 | 12G | | | | RetinaMask (ResNeXt-101-FPN-GN) | 2019-01-10 |
| TOOD: Task-aligned One-stage Object Detection | ✓ | 42.5 | 60.3 | 46.4 | | | | | | | | TAL + TAP | 2021-08-17 |
| Deep High-Resolution Representation Learning for Visual Recognition | ✓ | 42.4 | 63.6 | 46.4 | 24.9 | 44.6 | 53.0 | 16G | | 20.8G | | Faster R-CNN (HRNetV2p-W48) | 2019-08-20 |
| Hierarchical Shot Detector | ✓ | 42.3 | 61.2 | 46.9 | 22.8 | 47.3 | 55.9 | | | | | HSD (Rest101, 768x768, single-scale test) | 2019-10-01 |
| CornerNet: Detecting Objects as Paired Keypoints | ✓ | 42.1 | 57.8 | 45.3 | 20.8 | 44.8 | 56.7 | | | | | CornerNet511 (Hourglass-104, multi-scale) | 2018-08-03 |
| FoveaBox: Beyond Anchor-based Object Detector | ✓ | 42.1 | | | | | | | | | | FoveaBox (ResNeXt-101) | 2019-04-08 |
| FCOS: Fully Convolutional One-Stage Object Detection | ✓ | 42.0 | 60.4 | 45.3 | 25.4 | 45.0 | 51.0 | | | | | FCOS (HRNet-W32-5l) | 2019-04-02 |
| FoveaBox: Beyond Anchor-based Object Detector | ✓ | 41.9 | | | | | | | | | | FoveaBox (ResNeXt-101) | 2019-04-08 |
| Single-Shot Refinement Neural Network for Object Detection | ✓ | 41.8 | 62.9 | 45.7 | 25.6 | 45.1 | 54.1 | | | | | RefineDet512+ (ResNet-101) | 2017-11-18 |
| Gradient Harmonized Single-stage Detector | ✓ | 41.6 | 62.8 | 44.2 | 22.3 | 45.1 | 55.3 | | | | | GHM-C + GHM-R (RetinaNet-FPN-ResNeXt-101) | 2018-11-13 |
| Objects as Points | ✓ | 41.6 | | | 21.5 | 43.9 | 56.0 | 26G | | | | CenterNet-DLA (DLA-34, multi-scale) | 2019-04-16 |
| SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | ✓ | 41.5 | 60.5 | 44.6 | 23.3 | 45.0 | 58.0 | | | | | RetinaNet (SpineNet-49S, 640x640) | 2019-12-10 |
| RepPoints: Point Set Representation for Object Detection | ✓ | 41.0 | 62.9 | 44.3 | 23.6 | 44.1 | 51.7 | | | | | RPDet (ResNet-101) | 2019-04-25 |
| M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | ✓ | 41.0 | 59.7 | 45.0 | 22.1 | 46.5 | 53.8 | 34G | | | | M2Det (VGG-16, single-scale) | 2018-11-12 |
| LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection | ✓ | 41.0 | | | | | | | 2.4 | | 8.4 | LeYOLO (Large@768) | 2024-06-20 |
| Feature Selective Anchor-Free Module for Single-Shot Object Detection | ✓ | 40.9 | 61.5 | 44.0 | 24.0 | 44.2 | 51.3 | 38G | | | | FSAF (ResNet-101, single-scale) | 2019-03-02 |
| Focal Loss for Dense Object Detection | ✓ | 40.8 | 61.1 | 44.1 | 24.1 | 44.2 | 51.2 | 4G | | | | RetinaNet (ResNeXt-101-FPN) | 2017-08-07 |
| Cascade R-CNN: Delving into High Quality Object Detection | ✓ | 40.6 | 59.9 | 44.0 | 22.6 | 42.7 | 52.1 | 12G | | | | Cascade R-CNN (ResNet-50-FPN+, cascade) | 2017-12-03 |
| Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution | ✓ | 40.6 | 58.9 | 44.5 | 22.0 | 42.8 | 52.6 | 5G | | | | Faster R-CNN (Cascade RPN) | 2019-09-15 |
| Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation | ✓ | 40.6 | | | 24.6 | 43.9 | 53.3 | | | | | ResNet-50-DW-DPN (Deformable Kernels) | 2019-10-07 |
| Acquisition of Localization Confidence for Accurate Object Detection | ✓ | 40.6 | | | | | | | | | | IoU-Net | 2018-07-30 |
| Deep High-Resolution Representation Learning for Visual Recognition | ✓ | 40.5 | 59.3 | | 23.4 | 42.6 | 51.0 | 16G | | 27.3G | | FCOS (HRNetV2p-W48) | 2019-08-20 |
| Bounding Box Regression with Uncertainty for Accurate Object Detection | ✓ | 40.4 | | | | | | | | | | ResNet-50-FPN Mask R-CNN + KL Loss + var voting + soft-NMS | 2018-09-23 |
| RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation | ✓ | 40.3 | 60.1 | 43.0 | 22.1 | 43.5 | 51.5 | | | | | RDSNet (ResNet-101, RetinaNet, mask, MBRM) | 2019-12-11 |
| Bottom-up Object Detection by Grouping Extreme and Center Points | ✓ | 40.2 | 55.5 | 43.2 | 20.4 | 43.2 | 53.1 | 180G | | | | ExtremeNet (Hourglass-104, single-scale) | 2019-01-23 |
| Cross-Iteration Batch Normalization | ✓ | 40.1 | 60.5 | 44.1 | 35.8 | 57.3 | 38.5 | | | | | Mask R-CNN (ResNet-101-FPN, CBN) | 2020-02-13 |
| Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution | ✓ | 40.1 | 59.4 | 43.8 | 22.1 | 42.4 | 51.6 | 5G | | | | Fast R-CNN (Cascade RPN) | 2019-09-15 |
| Mask R-CNN | ✓ | 39.8 | 62.3 | 43.4 | 22.1 | 43.2 | 51.2 | 9G | | | | Mask R-CNN (ResNeXt-101-FPN) | 2017-03-20 |
| Region Proposal by Guided Anchoring | ✓ | 39.8 | 59.2 | 43.5 | 21.8 | 42.6 | 50.7 | | | | | GA-Faster-RCNN | 2019-01-10 |
| NAS-FCOS: Fast Neural Architecture Search for Object Detection | ✓ | 39.8 | | | | | | | | | | ResNet-50 NAS-FCOS @256 | 2019-06-11 |
| Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN | ✓ | 39.8 | | | | | | | | | | A2MIM (ResNet-50 2x) | 2022-05-27 |
| ChainerCV: a Library for Deep Learning in Computer Vision | ✓ | 39.5 | | | | | | | | | | FPN (ResNet101 backbone) | 2017-08-28 |
| RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free | ✓ | 39.4 | 58.6 | 42.3 | 21.9 | 42.0 | 51.0 | 9G | | | | RetinaMask (ResNet-50-FPN) | 2019-01-10 |
| LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection | ✓ | 39.3 | | | | | | | | | 5.8 | LeYOLO (Medium@640) | 2024-06-20 |
| Attention Augmented Convolutional Networks | ✓ | 39.2 | | | | | | 24.5G | | | | AA-ResNet-10 + RetinaNet | 2019-04-22 |
| Multiple Anchor Learning for Visual Object Detection | ✓ | 39.2 | | | | | | | | | | MAL (ResNet50, single-scale) | 2019-12-04 |
| Focal Loss for Dense Object Detection | ✓ | 39.1 | 59.1 | 42.3 | 21.8 | 42.7 | 50.2 | 4G | | | | RetinaNet (ResNet-101-FPN) | 2017-08-07 |
| Cascade R-CNN: Delving into High Quality Object Detection | ✓ | 38.8 | 61.1 | 41.9 | 21.3 | 41.8 | 49.8 | 3G | | | | Cascade R-CNN (ResNet-101-FPN+) | 2017-12-03 |
| M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | ✓ | 38.8 | 59.4 | 41.7 | 20.5 | 43.9 | 53.4 | 27G | | | | M2Det (ResNet-101, single-scale) | 2018-11-12 |
| SaccadeNet: A Fast and Accurate Object Detector | ✓ | 38.5 | 55.6 | 41.4 | 19.2 | 42.1 | 50.6 | 46G | | | | SaccadeNet (DLA-34-DCN) | 2020-03-26 |
| Mask R-CNN | ✓ | 38.2 | 60.3 | 41.7 | 20.1 | 41.1 | 50.2 | 9G | | | | Mask R-CNN (ResNet-101-FPN) | 2017-03-20 |
| LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection | ✓ | 38.2 | | | | | | | 1.9 | | 4.51 | LeYOLO (Small@640) | 2024-06-20 |
| Segmentation is All You Need | | 38.1 | | | | | | | | | | WSMA-Seg | 2019-04-30 |
| Compact Global Descriptor for Neural Networks | ✓ | 37.9 | | | | | | | | | | Faster R-CNN + FPN + CGD | 2019-07-23 |
| CornerNet: Detecting Objects as Paired Keypoints | ✓ | 37.8 | 53.7 | 40.1 | 17.0 | 39.0 | 50.5 | | | | | CornerNet511 (Hourglass-52, single-scale) | 2018-08-03 |
| Single-Shot Refinement Neural Network for Object Detection | ✓ | 37.6 | 58.7 | 40.8 | 22.7 | 40.3 | 48.3 | | | | | RefineDet512+ (VGG-16) | 2017-11-18 |
| Deformable Convolutional Networks | ✓ | 37.5 | 58.0 | | 19.4 | 40.1 | 52.5 | | | | | DeformConv-R-FCN (Aligned-Inception-ResNet) | 2017-03-17 |
| Revisiting Unreasonable Effectiveness of Data in Deep Learning Era | ✓ | 37.4 | 58.0 | 40.1 | 17.5 | 41.1 | 51.2 | | | | | Faster R-CNN (ImageNet+300M) | 2017-07-10 |
| torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation | ✓ | 36.9 | | | | | | | | | | Mask R-CNN (Bottleneck-injected ResNet-50, FPN) | 2020-11-25 |
| Beyond Skip Connections: Top-Down Modulation for Object Detection | ✓ | 36.8 | | | | | | | | | | Faster R-CNN + TDM | 2016-12-20 |
| Cascade R-CNN: Delving into High Quality Object Detection | ✓ | 36.5 | 59.0 | 39.2 | 20.3 | 38.8 | 46.4 | 3G | | | | Cascade R-CNN (ResNet-50-FPN+) | 2017-12-03 |
| Single-Shot Refinement Neural Network for Object Detection | ✓ | 36.4 | 57.5 | 39.5 | 16.6 | 39.9 | 51.4 | | | | | RefineDet512 (ResNet-101) | 2017-11-18 |
| Feature Pyramid Networks for Object Detection | ✓ | 36.2 | | | | | | 2G | | | | Faster R-CNN + FPN | 2016-12-09 |
| torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation | ✓ | 35.9 | | | | | | | | | | Faster R-CNN (Bottleneck-injected ResNet-50 and FPN) | 2020-11-25 |
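The accuracy columns follow the standard COCO detection metrics:

- **Box mAP** averages AP over the ten IoU thresholds 0.50, 0.55, ..., 0.95.
- **AP50 and AP75** are AP at a single IoU threshold of 0.50 and 0.75.
- **APS, APM and APL** restrict evaluation to small (area below 32² pixels), medium (32² to 96²) and large (above 96²) objects. COCO reads this area from each annotation's `area` field rather than recomputing it from the box.

The following Python sketch illustrates only the box-level ingredients: IoU between two boxes, which thresholds a match survives, the size bucket, and the final averaging. Several parts of the official pycocotools `COCOeval` are not shown, including score-sorted greedy matching and 101-point interpolated precision. The helper names (`box_iou`, `matched_thresholds`, `size_bucket`, `box_map`) are hypothetical. The size buckets also use simplified half-open ranges, whereas the official evaluator treats the boundaries at exactly 32² and 96² slightly differently.

```python
from typing import List, Sequence, Tuple

# COCO stores boxes as (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]

# The ten IoU thresholds 0.50, 0.55, ..., 0.95; box mAP averages AP over all of them.
IOU_THRESHOLDS: List[float] = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes in (x, y, w, h) form."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap along each axis, clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def matched_thresholds(pred: Box, gt: Box) -> List[float]:
    """IoU thresholds at which `pred` would count as a true positive for `gt` (IoU >= t)."""
    iou = box_iou(pred, gt)
    return [t for t in IOU_THRESHOLDS if iou >= t]


def size_bucket(area: float) -> str:
    """Simplified size bucket behind the APS / APM / APL columns (half-open ranges)."""
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


def box_map(ap_per_threshold: Sequence[float]) -> float:
    """Box mAP: mean of per-threshold AP values (each already averaged over categories)."""
    if len(ap_per_threshold) != len(IOU_THRESHOLDS):
        raise ValueError("expected one AP value per IoU threshold")
    return sum(ap_per_threshold) / len(ap_per_threshold)


if __name__ == "__main__":
    gt = (0.0, 0.0, 100.0, 100.0)
    pred = (10.0, 0.0, 100.0, 100.0)  # same size, shifted 10 px to the right
    print(round(box_iou(pred, gt), 3))   # 0.818
    print(matched_thresholds(pred, gt))  # [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8]
    print(size_bucket(gt[2] * gt[3]))    # large
```

In the usage example, a prediction shifted 10 px against a 100×100 ground-truth box has IoU 9000/11000 ≈ 0.82. It therefore counts as a match for both AP50 and AP75, but not at the thresholds from 0.85 to 0.95 that also enter box mAP. This is why box mAP rewards tighter localization than AP50 alone.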