object-detection-on-coco

Object Detection

Leaderboard

Paper	Code	box mAP	AP50	AP75	APS	APM	APL	Hardware Burden	Params (M)	Operations per network pass	GFLOPs	ModelName	ReleaseDate
DETRs with Collaborative Hybrid Assignments Training	✓ Link	66.0							304			Co-DETR	2022-11-22
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions	✓ Link	65.5							2180			InternImage-H (M3I Pre-training)	2022-11-10
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information	✓ Link	65.4										M3I Pre-training (InternImage-H)	2022-11-17
MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection	✓ Link	65.1										MoCaE	2023-09-26
DETRs with Collaborative Hybrid Assignments Training	✓ Link	64.8							218			Co-DETR (Swin-L)	2022-11-22
A Strong and Reproducible Object Detector with Only Public Datasets	✓ Link	64.8	81.7	71.5	48.6	67.6	78		689			Focal-Stable-DINO (Focal-Huge, no TTA)	2023-04-25
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale	✓ Link	64.7	81.9	71.7	48.5	67.7	77.9					EVA	2022-11-14
Group DETR v2: Strong Object Detector with Encoder-Decoder Pretraining		64.5	81.8	71.1	48.4	67.2	77.1					Group DETR v2	2022-11-07
Focal Modulation Networks	✓ Link	64.4										FocalNet-H (DINO)	2022-03-22
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions	✓ Link	64.3							602			InternImage-XL	2022-11-10
Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation	✓ Link	64.2										FD-SwinV2-G	2022-05-27
DETR Does Not Need Multi-Scale or Locality Design	✓ Link	63.9	82.1	70.7	48.2	66.8	76.7		228			Plain-DETR (Swin-L)	2023-01-01
Reversible Column Networks	✓ Link	63.8										RevCol-H(DINO)	2022-12-22
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks	✓ Link	63.7										BEiT-3	2022-08-22
NMS Strikes Back	✓ Link	63.5	80.4	70.2	46.1	66.9	76.9					DETA (Swin-L)	2022-12-12
Relation DETR: Exploring Explicit Position Relation Prior for Object Detection	✓ Link	63.5	80.8	69.1	47.2	66.9	77.0		214			Relation-DETR (Focal-L)	2024-07-16
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection	✓ Link	63.3										DINO (Swin-L,multi-scale, TTA)	2022-03-07
Swin Transformer V2: Scaling Up Capacity and Resolution	✓ Link	63.1							3000			SwinV2-G (HTC++)	2021-11-18
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection	✓ Link	63.0										Grounding DINO	2023-03-09
Florence: A New Foundation Model for Computer Vision	✓ Link	62.4										Florence-CoSwin-H	2021-11-22
GLIPv2: Unifying Localization and Vision-Language Understanding	✓ Link	62.4										GLIPv2 (CoSwin-H, multi-scale)	2022-06-12
General Object Foundation Model for Images and Videos at Scale	✓ Link	62.3										GLEE-Pro	2023-12-14
Grounded Language-Image Pre-training	✓ Link	61.5	79.5	67.7	45.3	64.9	75.0					GLIP (Swin-L, multi-scale)	2021-12-07
End-to-End Semi-Supervised Object Detection with Soft Teacher	✓ Link	61.3										Soft Teacher + Swin-L (HTC++, multi-scale)	2021-06-16
Vision Transformer Adapter for Dense Predictions	✓ Link	60.9										ViT-Adapter-L (HTC++, BEiTv2 pretrain, multi-scale)	2022-05-17
Dynamic Head: Unifying Object Detection Heads with Attentions	✓ Link	60.6	78.5	66.6		64.0	74.2					DyHead (Swin-L, multi scale, self-training)	2021-06-15
General Object Foundation Model for Images and Videos at Scale	✓ Link	60.6										GLEE-Plus	2023-12-14
Vision Transformer Adapter for Dense Predictions	✓ Link	60.4										ViT-Adapter-L (HTC++, BEiT pretrain, multi-scale)	2022-05-17
GRiT: A Generative Region-to-text Transformer for Object Understanding	✓ Link	60.4										GRiT (ViT-H, single-scale testing)	2022-12-01
CBNet: A Composite Backbone Network Architecture for Object Detection	✓ Link	60.1										CBNetV2 (Dual-Swin-L HTC, multi-scale)	2021-07-01
Parameter-Inverted Image Pyramid Networks	✓ Link	60.0	79.0	65.4								PIIP-H6B (DINO)	2024-06-06
CBNet: A Composite Backbone Network Architecture for Object Detection	✓ Link	59.4										CBNetV2 (Dual-Swin-L HTC, single-scale)	2021-07-01
Focal Self-attention for Local-Global Interactions in Vision Transformers	✓ Link	58.9										Focal-L (DyHead, multi-scale)	2021-07-01
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows	✓ Link	58.7										Swin-L (HTC++, multi scale)	2021-03-25
Dynamic Head: Unifying Object Detection Heads with Attentions	✓ Link	58.7	77.1	64.5		62.0	72.8					DyHead (Swin-L, multi scale)	2021-06-15
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows	✓ Link	57.7										Swin-L (HTC++, single scale)	2021-03-25
Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation	✓ Link	57.3										Cascade Eff-B7 NAS-FPN (1280, self-training Copy Paste, single-scale)	2020-12-13
CenterNet++ for Object Detection	✓ Link	57.1	73.7	62.4	38.7	59.2	71.3					PyCenterNet (Swin-L, multi-scale)	2022-04-18
Exploring Target Representations for Masked Autoencoders	✓ Link	56.8										dBOT ViT-L (CLIP)	2022-09-08
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors	✓ Link	56.6										YOLOv7-D6 (44 fps)	2022-07-06
SOLQ: Segmenting Objects by Learning Queries	✓ Link	56.5	74.6	60.5	37.6	60	70.6					SOLQ (Swin-L, single scale)	2021-06-04
Probabilistic two-stage detection	✓ Link	56.4	74.0	61.6	38.7	59.7	68.6					CenterNet2 (Res2Net-101-DCN-BiFPN, self-training, 1560 single-scale)	2021-03-12
ISTR: End-to-End Instance Segmentation with Transformers	✓ Link	56.4			27.8	48.7	59.9					ISTR (ResNet50-FPN-3x, single-scale)	2021-05-03
Instances as Queries	✓ Link	56.1	75.9	61.9	37.4	58.9	70.3	17G				QueryInst (single-scale)	2021-05-05
Exploring Target Representations for Masked Autoencoders	✓ Link	56.1										dBOT ViT-L	2022-09-08
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors	✓ Link	56										YOLOv7-E6 (56 fps)	2022-07-06
Scaled-YOLOv4: Scaling Cross Stage Partial Network	✓ Link	55.8	73.2	61.2								YOLOv4-P7 with TTA	2020-11-16
DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution	✓ Link	55.7	74.2	61.1	37.7	58.4	68.1					DetectoRS (ResNeXt-101-64x4d, multi-scale)	2020-06-03
You Only Learn One Representation: Unified Network for Multiple Tasks	✓ Link	55.4	73.3	60.6								YOLOR-D6 (1280, single-scale, 30 fps)	2021-05-10
Scaled-YOLOv4: Scaling Cross Stage Partial Network	✓ Link	54.9	72.6	60.2								YOLOv4-P6 with TTA	2020-11-16
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors	✓ Link	54.9										YOLOv7-W6 (84 fps)	2022-07-06
Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation	✓ Link	54.8										Cascade Eff-B7 NAS-FPN (1280)	2020-12-13
DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution	✓ Link	54.7	73.5	60.1	37.4	57.3	66.4					DetectoRS (ResNeXt-101-32x4d, multi-scale)	2020-06-03
General Object Foundation Model for Images and Videos at Scale	✓ Link	54.7										GLEE-Lite	2023-12-14
Rethinking Pre-training and Self-training	✓ Link	54.3										SpineNet-190 (1280, with Self-training on OpenImages, single-scale)	2020-06-11
Scaled-YOLOv4: Scaling Cross Stage Partial Network	✓ Link	54.3	72.3	59.5	36.6	58.2	65.5					YOLOv4-P6 CSP-P6 (single-scale, 32 fps)	2020-11-16
USB: Universal-Scale Object Detection Benchmark	✓ Link	54.1	71.6	59.9	35.8	57.2	67.4					UniverseNet-20.08d (Res2Net-101, DCN, multi-scale)	2021-03-25
Dynamic Head: Unifying Object Detection Heads with Attentions	✓ Link	54	72.1	59.3								DyHead (ResNeXt-64x4d-101-DCN, multi scale)	2021-06-15
Exploring Target Representations for Masked Autoencoders	✓ Link	53.6										dBOT ViT-B (CLIP)	2022-09-08
Probabilistic Anchor Assignment with IoU Prediction for Object Detection	✓ Link	53.5	71.6	59.1	36.0	56.3	66.9					PAA (ResNext-152-32x8d + DCN, multi-scale)	2020-07-16
Location-Sensitive Visual Recognition with Cross-IOU Loss	✓ Link	53.5	71.1	59.2	35.2	56.4	65.8					LSNet (Res2Net-101+ DCN, multi-scale)	2021-04-11
Exploring Target Representations for Masked Autoencoders	✓ Link	53.5										dBOT ViT-B	2022-09-08
CBNet: A Novel Composite Backbone Network Architecture for Object Detection	✓ Link	53.3	71.9	58.5	35.5	55.8	66.7					Cascade Mask R-CNN (Triple-ResNeXt152, multi-scale)	2019-09-09
ResNeSt: Split-Attention Networks	✓ Link	53.3	72.0	58.0	35.1	56.2	66.8					ResNeSt-200 (multi-scale)	2020-04-19
DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution	✓ Link	53.3	71.6	58.5	33.9	56.5	66.9					DetectoRS (ResNeXt-101-32x4d, single-scale)	2020-06-03
Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection	✓ Link	53.3	70.9	59.2	35.7	56.1	65.6					GFLV2 (Res2Net-101, DCN, multiscale)	2020-11-25
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors	✓ Link	53.1										YOLOv7-X (114 fps)	2022-07-06
RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder	✓ Link	52.7										RelationNet++ (ResNeXt-64x4d-101-DCN)	2020-10-29
EfficientDet: Scalable and Efficient Object Detection	✓ Link	52.6	71.6	56.9								EfficientDet-D7 (1536)	2019-11-20
Scaled-YOLOv4: Scaling Cross Stage Partial Network	✓ Link	52.5	70.3	58								YOLOv4-P5 with TTA	2020-11-16
Deformable DETR: Deformable Transformers for End-to-End Object Detection	✓ Link	52.3	71.9	58.1	34.4	54.4	65.6	17G		17.3G		Deformable DETR (ResNeXt-101+DCN)	2020-10-08
Global Context Networks	✓ Link	52.3	70.9	56.9								GCNet (ResNeXt-101 + DCN + cascade + GC r4)	2020-12-24
PP-YOLOE: An evolved version of YOLO	✓ Link	52.2	69.9	56.5	33.3	56.3	66.4					PP-YOLOE-x(CSPRepResNet-x, 640x640, single-scale)	2022-03-30
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization	✓ Link	52.1	71.8	56.5	35.4	55	63.6					RetinaNet (SpineNet-190, 1280x1280)	2019-12-10
RepPoints V2: Verification Meets Regression for Object Detection	✓ Link	52.1	70.1	57.5	34.5	54.6	63.6					RepPoints v2 (ResNeXt-101, DCN, multi-scale)	2020-07-16
Attention-guided Context Feature Pyramid Network for Object Detection	✓ Link	51.9	70.4	57	34.2	54.8	64.7					AC-FPN Cascade R-CNN (X-152-32x8d-FPN-IN5k, multi scale, only CEM)	2020-05-23
OTA: Optimal Transport Assignment for Object Detection	✓ Link	51.5	68.6	57.1	34.1	53.7	64.1					OTA (ResNeXt-101+DCN, multiscale)	2021-03-26
YOLOX: Exceeding YOLO Series in 2021	✓ Link	51.5										YOLOX-x(Modified CSP v5, 640x640, single-scale)	2021-07-18
PP-YOLOE: An evolved version of YOLO	✓ Link	51.4	68.9	55.6	31.4	55.3	66.1					PP-YOLOE-l(CSPRepResNet-l, 640x640, single-scale)	2022-03-30
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors	✓ Link	51.4										YOLOv7 (161 fps)	2022-07-06
USB: Universal-Scale Object Detection Benchmark	✓ Link	51.3	70.0	55.8	31.7	55.3	64.9					UniverseNet-20.08d (Res2Net-101, DCN, single-scale)	2021-03-25
Revisiting the Sibling Head in Object Detector	✓ Link	51.2	71.9	56.0	33.8	54.8	64.2					TSD(SENet154-DCN,multi-scale)	2020-03-17
YOLOX: Exceeding YOLO Series in 2021	✓ Link	51.2	69.6	55.7	31.2	56.1	66.1		99.1			YOLOX-X (Modified CSP v5)	2021-07-18
iBOT: Image BERT Pre-Training with Online Tokenizer	✓ Link	51.2										iBOT (ViT-B/16)	2021-11-15
Learning Data Augmentation Strategies for Object Detection	✓ Link	50.7			34.2	55.5	64.5					NAS-FPN (AmoebaNet-D, learned aug)	2019-06-26
Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection	✓ Link	50.7	68.9	56.3	33.2	52.9	62.4					ATSS (ResNeXt-64x4d-101+DCN,multi-scale)	2019-12-05
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization	✓ Link	50.7	70.4	54.9	33.6	53.9	62.1					RetinaNet (SpineNet-143, 1280x1280)	2019-12-10
Boosting R-CNN: Reweighting R-CNN Samples by RPN's Error for Underwater Object Detection	✓ Link	50.7										Boosting R-CNN*	2022-06-28
Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection	✓ Link	50.6	69	55.3	31.3	54.3	63.5					GFLV2 (Res2Net-101, DCN)	2020-11-25
A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection	✓ Link	50.2	70.3	53.9	32.0	53.1	63.0					aLRP Loss (ResNext-101-64x4d, DCN, multiscale test)	2020-09-28
Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training	✓ Link	50.1	68.3	55.6	32.8	53.0	61.2					Dynamic R-CNN (ResNet-101-DCN, multi-scale)	2020-04-13
Scale-Equalizing Pyramid Convolution for Object Detection	✓ Link	50.1	69.8	54.3	31.3	53.3	63.7					FreeAnchor + SEPC (DCN, ResNext-101-64x4d)	2020-05-06
D2Det: Towards High Quality Object Detection and Instance Segmentation	✓ Link	50.1	69.4	54.9	32.7	52.7	62.1					D2Det (ResNet-101-DCN, multi-scale test)	2020-06-01
Revisiting the Sibling Head in Object Detector	✓ Link	49.4	69.6	54.4	32.7	52.5	61.0					TSD(ResNet-101-Deformable, Image Pyramid)	2020-03-17
RepPoints V2: Verification Meets Regression for Object Detection	✓ Link	49.4	68.9	53.4	30.3	52.1	62.3					RepPoints v2 (ResNeXt-101, DCN)	2020-07-16
iBOT: Image BERT Pre-Training with Online Tokenizer	✓ Link	49.4										iBOT (ViT-S/16)	2021-11-15
Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN	✓ Link	49.4										A2MIM (ViT-B)	2022-05-27
Corner Proposal Network for Anchor-free, Two-stage Object Detection	✓ Link	49.2	67.3	53.7	31.0	51.9	62.4					CPNDet (Hourglass-104, multi-scale)	2020-07-27
Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection	✓ Link	49	67.6	53.5	29.7	52.4	61.4	3G				GFLV2 (ResNeXt-101, 32x4d, DCN)	2020-11-25
A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection	✓ Link	48.9	69.3	52.5	30.8	51.5	62.1					aLRP Loss (ResNext-101-64x4d, DCN, single scale)	2020-09-28
PP-YOLOE: An evolved version of YOLO	✓ Link	48.9	66.5	53.0	28.6	52.9	63.8					PP-YOLOE-m(CSPRepResNet-m, 640x640, single-scale)	2022-03-30
USB: Universal-Scale Object Detection Benchmark	✓ Link	48.8	67.5	53.0	30.1	52.3	61.1					UniverseNet-20.08 (Res2Net-50, DCN, single-scale)	2021-03-25
SOLQ: Segmenting Objects by Learning Queries	✓ Link	48.7										SOLQ (ResNet101, single scale)	2021-06-04
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization	✓ Link	48.6	68.4	52.5	32	52.3	62					RetinaNet (SpineNet-96, 1024x1024)	2019-12-10
Scale-Aware Trident Networks for Object Detection	✓ Link	48.4	69.7	53.5	31.8	51.3	60.3					TridentNet (ResNet-101-Deformable, Image Pyramid)	2019-01-07
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond	✓ Link	48.4	67.6	52.7						54.8G		GCNet (ResNeXt-101 + DCN + cascade + GC r4)	2019-04-25
Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection	✓ Link	48.3	66.5	52.8	28.8	51.9	60.7	3G				GFLV2 (ResNet-101-DCN)	2020-11-25
Understanding Gaussian Attention Bias of Vision Transformers Using Effective Receptive Fields	✓ Link	48.23										Swin-S (RPE w/ GAB)	2023-05-08
Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection	✓ Link	48.2	67.4	52.6	29.2	51.7	60.2					GFL (X-101-32x4d-DCN, single-scale)	2020-06-08
ISTR: End-to-End Instance Segmentation with Transformers	✓ Link	48.1			28.7	50.4	61.5					ISTR (ResNet101-FPN-3x, single-scale)	2021-05-03
YOLOX: Exceeding YOLO Series in 2021	✓ Link	48.0										YOLOX-Darknet53(Darknet53, 640x640, single-scale)	2021-07-18
Vision Transformer with Deformable Attention	✓ Link	47.9	69.6	51.2	32.3	51.8	63.4					DAT-S (RetinaNet)	2022-01-03
Matrix Nets: A New Deep Architecture for Object Detection	✓ Link	47.8	66.2	52.3	29.7	50.4	60.7					MatrixNet Corners (ResNet-152, multi-scale)	2019-08-13
A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection	✓ Link	47.8	68.4	51.1	30.2	50.8	59.1					aLRP Loss (ResNext-101-64x4d, single scale)	2020-09-28
SOLQ: Segmenting Objects by Learning Queries	✓ Link	47.8										SOLQ (ResNet50, single scale)	2021-06-04
Dynamic Head: Unifying Object Detection Heads with Attentions	✓ Link	47.7	65.7	51.9								DyHead (ResNeXt-64x4d-101)	2021-06-15
Path Aggregation Network for Instance Segmentation	✓ Link	47.4	67.2	51.8	30.1	51.7	60.0					PANet (ResNeXt-101, multi-scale)	2018-03-05
Soft Anchor-Point Object Detection	✓ Link	47.4	67.4	51.1	28.1	50.3	61.5					SAPD (ResNeXt-101, single-scale)	2019-11-27
Deep High-Resolution Representation Learning for Visual Recognition	✓ Link	47.3	65.9	51.2	28.0	49.7	59.8	15G		71.7G		HTC (HRNetV2p-W48)	2019-08-20
Hybrid Task Cascade for Instance Segmentation	✓ Link	47.1	63.9	44.7	22.8	43.9	54.6					HTC (ResNeXt-101-FPN)	2019-01-22
CenterNet: Keypoint Triplets for Object Detection	✓ Link	47.0	64.5	50.7	28.9	49.9	58.9					CenterNet511 (Hourglass-104, multi-scale)	2019-04-17
Multiple Anchor Learning for Visual Object Detection	✓ Link	47.0										MAL (ResNeXt101, multi-scale)	2019-12-04
ISTR: End-to-End Instance Segmentation with Transformers	✓ Link	46.8										ISTR (ResNet50-FPN-3x)	2021-05-03
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization	✓ Link	46.7	66.3	50.6	29.1	50.1	61.7					RetinaNet (SpineNet-49, 896x896)	2019-12-10
RepPoints: Point Set Representation for Object Detection	✓ Link	46.5	67.4	50.9	30.3	49.7	57.1					RPDet (ResNet-101-DCN, multi-scale)	2019-04-25
HoughNet: Integrating near and long-range evidence for bottom-up object detection	✓ Link	46.4	65.1	50.7	29.1	48.5	58.1					HoughNet (MS)	2020-07-05
Reducing Label Noise in Anchor-Free Object Detection	✓ Link	46.3	64.8	51.6	31.4	49.9	56.4					PPDet (ResNeXt-101-FPN, multiscale)	2020-08-03
Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection	✓ Link	46.2	64.3	50.5	27.8	49.9	57					GFLV2 (ResNet-101)	2020-11-25
SNIPER: Efficient Multi-Scale Training	✓ Link	46.1	67.0	51.6	29.6	48.9	58.1	29G				SNIPER (ResNet-101)	2018-05-23
NAS-FCOS: Fast Neural Architecture Search for Object Detection	✓ Link	46.1										ResNeXt-64x4d-101 NAS-FCOS @128-256 w/improvements	2019-06-11
Deep High-Resolution Representation Learning for Visual Recognition	✓ Link	46.1	64.0	50.3	27.1	48.6	58.3	15G		61.8G		Mask R-CNN (HRNetV2p-W48 + cascade)	2019-08-20
Deformable ConvNets v2: More Deformable, Better Results	✓ Link	46.0	67.9	50.8	27.8	49.1	59.5					DCNv2 (ResNet-101, multi-scale)	2018-11-27
Localization Uncertainty Estimation for Anchor-Free Object Detection		46										Gaussian-FCOS	2020-06-28
InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting	✓ Link	45.9	64.2	50	26.3	49	58.6					Cascade R-CNN-FPN (ResNet-101, map-guided)	2019-08-21
Multiple Anchor Learning for Visual Object Detection	✓ Link	45.9										MAL (ResNeXt101, single-scale)	2019-12-04
CenterMask : Real-Time Anchor-Free Instance Segmentation	✓ Link	45.8	64.5		27.8	48.3	57.6					CenterMask+VoVNetV2-99 (single-scale)	2019-11-15
An Analysis of Scale Invariance in Object Detection - SNIP		45.7	67.3	51.1	29.3	48.8	57.1					D-RFCN + SNIP (DPN-98 with flip, multi-scale)	2017-11-22
Scaled-YOLOv4: Scaling Cross Stage Partial Network	✓ Link	45.5	64.1	49.5	27	49	56.7					YOLOv4 (CD53)	2020-11-16
Attention-guided Context Feature Pyramid Network for Object Detection	✓ Link	45	64.4	49	26.9	47.7	56.6					AC-FPN Cascade R-CNN(ResNet-101, single scale)	2020-05-23
FreeAnchor: Learning to Match Anchors for Visual Object Detection	✓ Link	44.8	64.3	48.4	27	47.9	56					FreeAnchor (ResNeXt-101)	2019-09-05
FCOS: Fully Convolutional One-Stage Object Detection	✓ Link	44.7	64.1	48.4	27.6	47.5	55.6					FCOS (ResNeXt-64x4d-101-FPN 4 + improvements)	2019-04-02
CenterMask : Real-Time Anchor-Free Instance Segmentation	✓ Link	44.7	63.1	48.6	27.1		55.9					CenterMask+VoVNet2-57 (single-scale)	2019-11-15
Feature Selective Anchor-Free Module for Single-Shot Object Detection	✓ Link	44.6	65.2	48.6	29.7	47.1	54.6					FSAF (ResNeXt-101, multi-scale)	2019-03-02
CenterMask : Real-Time Anchor-Free Instance Segmentation	✓ Link	44.6	63.4	48.4		47.2						CenterMask + X-101-32x8d (single-scale)	2019-11-15
A Ranking-based, Balanced Loss Function Unifying Classification and Localisation in Object Detection	✓ Link	44.6	65.0	47.5	24.6	48.1	58.3					aLRP Loss (ResNext-101, DCN, 500 scale)	2020-09-28
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization	✓ Link	44.3	63.8	47.6	25.9	47.7	61.1					RetinaNet (SpineNet-49, 640x640)	2019-12-10
Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection	✓ Link	44.3	62.3	48.5	26.8	47.7	54.1					GFLV2 (ResNet-50)	2020-11-25
You Only Look One-level Feature	✓ Link	44.3	62.9	47.5	24.0	48.5	60.4					YOLOF-DC5	2021-03-17
M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network	✓ Link	44.2	64.6	49.3	29.2	47.9	55.1	34G				M2Det (VGG-16, multi-scale)	2018-11-12
Feature Intertwiner for Object Detection	✓ Link	44.2	67.5	51.1	27.2	50.3	57.7					InterNet (ResNet-101-FPN, multi-scale)	2019-03-28
M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network	✓ Link	43.9	64.4	48	29.6	49.6	54.3	27G				M2Det (ResNet-101, multi-scale)	2018-11-12
FoveaBox: Beyond Anchor-based Object Detector	✓ Link	43.9	63.5	47.7	26.8	46.9	55.6					FoveaBox (ResNeXt-101)	2019-04-08
LIP: Local Importance-based Pooling	✓ Link	43.9	65.7	48.1	25.4	46.7	56.3					Faster R-CNN (LIP-ResNet-101-MD w FPN)	2019-08-12
Learning Spatial Fusion for Single-Shot Object Detection	✓ Link	43.9	64.1	49.2	27.0	46.6	53.4					YOLOv3 @800 + ASFF* (Darknet-53)	2019-11-21
Bottom-up Object Detection by Grouping Extreme and Center Points	✓ Link	43.7	60.5	47.0	24.1	46.9	57.6	180G				ExtremeNet (Hourglass-104, multi-scale)	2019-01-23
SNIPER: Efficient Multi-Scale Training	✓ Link	43.5	65.0	48.6	26.1	46.3	56.0	29G				SNIPER (ResNet-50)	2018-05-23
Deep High-Resolution Representation Learning for Visual Recognition	✓ Link	43.5		46.5	22.2		57.8	16G		21.7G		CenterNet (HRNetV2-W48)	2019-08-20
YOLOv4: Optimal Speed and Accuracy of Object Detection	✓ Link	43.5	65.7	47.3	26.7	46.7	53.3					YOLOv4-608	2020-04-23
An Analysis of Scale Invariance in Object Detection - SNIP		43.4	65.5	48.4	27.2	46.5	54.9					D-RFCN + SNIP (ResNet-101, multi-scale)	2017-11-22
Grid R-CNN	✓ Link	43.2	63.0	46.6	25.1	46.5	55.2					Grid R-CNN (ResNeXt-101-FPN)	2018-11-29
FCOS: Fully Convolutional One-Stage Object Detection	✓ Link	43.2	62.8	46.6	26.5	46.2	53.3					FCOS (ResNeXt-101-64x4d-FPN)	2019-04-02
CornerNet-Lite: Efficient Keypoint Based Object Detection	✓ Link	43.2			24.4	44.6	57.3					CornerNet-Saccade (Hourglass-104, multi-scale)	2019-04-18
PP-YOLOE: An evolved version of YOLO	✓ Link	43.1	60.5	46.6	23.2	46.4	56.9					PP-YOLOE-s(CSPRepResNet-s, 640x640, single-scale)	2022-03-30
Libra R-CNN: Towards Balanced Learning for Object Detection	✓ Link	43.0	64	47	25.3	45.6	54.6					Libra R-CNN (ResNeXt-101-FPN)	2019-04-04
Dynamic Head: Unifying Object Detection Heads with Attentions	✓ Link	43	60.7	46.8								DyHead (ResNet-50)	2021-06-15
Cascade R-CNN: Delving into High Quality Object Detection	✓ Link	42.8	62.1	46.3	23.7	45.5	55.2					Cascade R-CNN (ResNet-101-FPN+, cascade)	2017-12-03
RepPoints: Point Set Representation for Object Detection	✓ Link	42.8	65.0	46.3	24.9	46.2	54.7					RPDet (ResNet-101-DCN)	2019-04-25
Cascade R-CNN: High Quality Object Detection and Instance Segmentation	✓ Link	42.8	62.1	46.3	23.7	45.5	55.2	15G				Cascade R-CNN	2019-06-24
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization	✓ Link	42.8	62.3	46.1	23.7	45.2	57.3					SpineNet-49 (640, RetinaNet, single-scale)	2019-12-10
Scale-Aware Trident Networks for Object Detection	✓ Link	42.7	63.6	46.5	23.9	46.6	56.6					TridentNet (ResNet-101)	2019-01-07
FCOS: Fully Convolutional One-Stage Object Detection	✓ Link	42.7	62.2	46.1	26.0	45.6	52.6					FCOS (ResNeXt-32x8d-101-FPN)	2019-04-02
RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free	✓ Link	42.6	62.5	46.0	24.8	45.6	53.8	12G				RetinaMask (ResNeXt-101-FPN-GN)	2019-01-10
TOOD: Task-aligned One-stage Object Detection	✓ Link	42.5	60.3	46.4								TAL + TAP	2021-08-17
Deep High-Resolution Representation Learning for Visual Recognition	✓ Link	42.4	63.6	46.4	24.9	44.6	53.0	16G		20.8G		Faster R-CNN (HRNetV2p-W48)	2019-08-20
Hierarchical Shot Detector	✓ Link	42.3	61.2	46.9	22.8	47.3	55.9					HSD (ResNet101, 768x768, single-scale test)	2019-10-01
CornerNet: Detecting Objects as Paired Keypoints	✓ Link	42.1	57.8	45.3	20.8	44.8	56.7					CornerNet511 (Hourglass-104, multi-scale)	2018-08-03
FoveaBox: Beyond Anchor-based Object Detector	✓ Link	42.1										FoveaBox (ResNeXt-101)	2019-04-08
FCOS: Fully Convolutional One-Stage Object Detection	✓ Link	42.0	60.4	45.3	25.4	45.0	51.0					FCOS (HRNet-W32-5l)	2019-04-02
FoveaBox: Beyond Anchor-based Object Detector	✓ Link	41.9										FoveaBox (ResNeXt-101)	2019-04-08
Single-Shot Refinement Neural Network for Object Detection	✓ Link	41.8	62.9	45.7	25.6	45.1	54.1					RefineDet512+ (ResNet-101)	2017-11-18
Gradient Harmonized Single-stage Detector	✓ Link	41.6	62.8	44.2	22.3	45.1	55.3					GHM-C + GHM-R (RetinaNet-FPN-ResNeXt-101)	2018-11-13
Objects as Points	✓ Link	41.6			21.5	43.9	56.0	26G				CenterNet-DLA (DLA-34, multi-scale)	2019-04-16
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization	✓ Link	41.5	60.5	44.6	23.3	45	58					RetinaNet (SpineNet-49S, 640x640)	2019-12-10
M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network	✓ Link	41.0	59.7	45	22.1	46.5	53.8	34G				M2Det (VGG-16, single-scale)	2018-11-12
RepPoints: Point Set Representation for Object Detection	✓ Link	41	62.9	44.3	23.6	44.1	51.7					RPDet (ResNet-101)	2019-04-25
LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection	✓ Link	41.0							2.4		8.4	LeYOLO (Large@768)	2024-06-20
Feature Selective Anchor-Free Module for Single-Shot Object Detection	✓ Link	40.9	61.5	44	24	44.2	51.3	38G				FSAF (ResNet-101, single-scale)	2019-03-02
Focal Loss for Dense Object Detection	✓ Link	40.8	61.1	44.1	24.1	44.2	51.2	4G				RetinaNet (ResNeXt-101-FPN)	2017-08-07
Cascade R-CNN: Delving into High Quality Object Detection	✓ Link	40.6	59.9	44	22.6	42.7	52.1	12G				Cascade R-CNN (ResNet-50-FPN+, cascade)	2017-12-03
Acquisition of Localization Confidence for Accurate Object Detection	✓ Link	40.6										IoU-Net	2018-07-30
Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution	✓ Link	40.6	58.9	44.5	22.0	42.8	52.6	5G				Faster R-CNN (Cascade RPN)	2019-09-15
Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation	✓ Link	40.6			24.6	43.9	53.3					ResNet-50-DW-DPN (Deformable Kernels)	2019-10-07
Deep High-Resolution Representation Learning for Visual Recognition	✓ Link	40.5	59.3		23.4	42.6	51.0	16G		27.3G		FCOS (HRNetV2p-W48)	2019-08-20
Bounding Box Regression with Uncertainty for Accurate Object Detection	✓ Link	40.4										ResNet-50-FPN Mask R-CNN + KL Loss + var voting + soft-NMS	2018-09-23
RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation	✓ Link	40.3	60.1	43	22.1	43.5	51.5					RDSNet (ResNet-101, RetinaNet, mask, MBRM)	2019-12-11
Bottom-up Object Detection by Grouping Extreme and Center Points	✓ Link	40.2	55.5	43.2	20.4	43.2	53.1	180G				ExtremeNet (Hourglass-104, single-scale)	2019-01-23
Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution	✓ Link	40.1	59.4	43.8	22.1	42.4	51.6	5G				Fast R-CNN (Cascade RPN)	2019-09-15
Cross-Iteration Batch Normalization	✓ Link	40.1	60.5	44.1	35.8	57.3	38.5					Mask R-CNN (ResNet-101-FPN, CBN)	2020-02-13
Mask R-CNN	✓ Link	39.8	62.3	43.4	22.1	43.2	51.2	9G				Mask R-CNN (ResNeXt-101-FPN)	2017-03-20
Region Proposal by Guided Anchoring	✓ Link	39.8	59.2	43.5	21.8	42.6	50.7					GA-Faster-RCNN	2019-01-10
NAS-FCOS: Fast Neural Architecture Search for Object Detection	✓ Link	39.8										ResNet-50 NAS-FCOS @256	2019-06-11
Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN	✓ Link	39.8										A2MIM (ResNet-50 2x)	2022-05-27
ChainerCV: a Library for Deep Learning in Computer Vision	✓ Link	39.5										FPN (ResNet101 backbone)	2017-08-28
RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free	✓ Link	39.4	58.6	42.3	21.9	42.0	51.0	9G				RetinaMask (ResNet-50-FPN)	2019-01-10
LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection	✓ Link	39.3									5.8	LeYOLO (Medium@640)	2024-06-20
Attention Augmented Convolutional Networks	✓ Link	39.2								24.5G		AA-ResNet-10 + RetinaNet	2019-04-22
Multiple Anchor Learning for Visual Object Detection	✓ Link	39.2										MAL (ResNet50, single-scale)	2019-12-04
Focal Loss for Dense Object Detection	✓ Link	39.1	59.1	42.3	21.8	42.7	50.2	4G				RetinaNet (ResNet-101-FPN)	2017-08-07
Cascade R-CNN: Delving into High Quality Object Detection	✓ Link	38.8	61.1	41.9	21.3	41.8	49.8	3G				Cascade R-CNN (ResNet-101-FPN+)	2017-12-03
M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network	✓ Link	38.8	59.4	41.7	20.5	43.9	53.4	27G				M2Det (ResNet-101, single-scale)	2018-11-12
SaccadeNet: A Fast and Accurate Object Detector	✓ Link	38.5	55.6	41.4	19.2	42.1	50.6	46G				SaccadeNet (DLA-34-DCN)	2020-03-26
Mask R-CNN	✓ Link	38.2	60.3	41.7	20.1	41.1	50.2	9G				Mask R-CNN (ResNet-101-FPN)	2017-03-20
LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection	✓ Link	38.2							1.9		4.51	LeYOLO (Small@640)	2024-06-20
Segmentation is All You Need		38.1										WSMA-Seg	2019-04-30
Compact Global Descriptor for Neural Networks	✓ Link	37.9										Faster R-CNN + FPN + CGD	2019-07-23
CornerNet: Detecting Objects as Paired Keypoints	✓ Link	37.8	53.7	40.1	17.0	39.0	50.5					CornerNet511 (Hourglass-52, single-scale)	2018-08-03
Single-Shot Refinement Neural Network for Object Detection	✓ Link	37.6	58.7	40.8	22.7	40.3	48.3					RefineDet512+ (VGG-16)	2017-11-18
Deformable Convolutional Networks	✓ Link	37.5	58.0		19.4	40.1	52.5					DeformConv-R-FCN (Aligned-Inception-ResNet)	2017-03-17
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era	✓ Link	37.4	58	40.1	17.5	41.1	51.2					Faster R-CNN (ImageNet+300M)	2017-07-10
torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation	✓ Link	36.9										Mask R-CNN (Bottleneck-injected ResNet-50, FPN)	2020-11-25
Beyond Skip Connections: Top-Down Modulation for Object Detection	✓ Link	36.8										Faster R-CNN + TDM	2016-12-20
Cascade R-CNN: Delving into High Quality Object Detection	✓ Link	36.5	59	39.2	20.3	38.8	46.4	3G				Cascade R-CNN (ResNet-50-FPN+)	2017-12-03
Single-Shot Refinement Neural Network for Object Detection	✓ Link	36.4	57.5	39.5	16.6	39.9	51.4					RefineDet512 (ResNet-101)	2017-11-18
Feature Pyramid Networks for Object Detection	✓ Link	36.2						2G				Faster R-CNN + FPN	2016-12-09
torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation	✓ Link	35.9										Faster R-CNN (Bottleneck-injected ResNet-50 and FPN)	2020-11-25
