Object Detection on COCO 2017

Object Detection

Leaderboard

Paper	Code	AP	mAP	Mean mAP	AP50	AP75	APM	APM50	APM75	ModelName	ReleaseDate
MaxViT: Multi-Axis Vision Transformer	✓ Link	53.4			72.9	58.1	45.7	70.3	50	MaxViT-B	2022-04-04
MaxViT: Multi-Axis Vision Transformer	✓ Link	53.1			72.5	58.1	45.4	69.8	49.5	MaxViT-S	2022-04-04
MaxViT: Multi-Axis Vision Transformer	✓ Link	52.1			71.9	56.8	44.6	69.1	48.4	MaxViT-T	2022-04-04
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention	✓ Link	50.2								DAT-S++	2023-09-04
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention	✓ Link	49.2								DAT-T++	2023-09-04
Stochastic Subsampling With Average Pooling		42.1			59.4	45.9				DyHead (SAP)	2024-09-25
On the Ideal Number of Groups for Isometric Gradient Propagation		40.7			61.2	44.6				Faster R-CNN (ideal number of groups)	2023-02-07
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition	✓ Link		56.4							UniRepLKNet-XL++	2023-11-27
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition	✓ Link		55.8							UniRepLKNet-L++	2023-11-27
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition	✓ Link		54.8							UniRepLKNet-B++	2023-11-27
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition	✓ Link		54.3							UniRepLKNet-S++	2023-11-27
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers	✓ Link		54.1							MixMIM-L	2022-05-26
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition	✓ Link		53							UniRepLKNet-S	2023-11-27
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers	✓ Link		52.2							MixMIM-B	2022-05-26
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition	✓ Link		51.7							UniRepLKNet-T	2023-11-27
BiFormer: Vision Transformer with Bi-Level Routing Attention	✓ Link		48.6							BiFormer-B (IN1k pretrain, MaskRCNN 12ep)	2023-03-15
DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention	✓ Link		48.5							DeBiFormer-B (IN1k pretrain, MaskRCNN 12ep)	2024-10-11
BiFormer: Vision Transformer with Bi-Level Routing Attention	✓ Link		47.8							BiFormer-S (IN1k pretrain, MaskRCNN 12ep)	2023-03-15
DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention	✓ Link		47.5							DeBiFormer-S (IN1k pretrain, MaskRCNN 12ep)	2024-10-11
DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention	✓ Link		47.1							DeBiFormer-B (IN1k pretrain, Retina)	2024-10-11
DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention	✓ Link		45.6							DeBiFormer-S (IN1k pretrain, Retina)	2024-10-11
YOLO-Drone: Airborne real-time detection of dense small objects from high-altitude perspective			35.45							YOLO-Drone	2023-04-14
Benchmark for Generic Product Detection: A Low Data Baseline for Dense Object Detection	✓ Link			3153						retinanet	2019-12-19
Paint Transformer: Feed Forward Neural Painting with Stroke Prediction	✓ Link			4.2						Lpixel	2021-08-09

OpenCodePapers
