OpenCodePapers

Object Detection on COCO 2017

Object Detection
Leaderboard
Metric columns, in order: AP, mAP, Mean mAP, AP50, AP75, APM, APM50, APM75. Each row lists only the values reported for that entry, in that order.

Paper | Code | Metrics | ModelName | ReleaseDate
MaxViT: Multi-Axis Vision Transformer | ✓ | 53.4, 72.9, 58.1, 45.7, 70.3, 50 | MaxViT-B | 2022-04-04
MaxViT: Multi-Axis Vision Transformer | ✓ | 53.1, 72.5, 58.1, 45.4, 69.8, 49.5 | MaxViT-S | 2022-04-04
MaxViT: Multi-Axis Vision Transformer | ✓ | 52.1, 71.9, 56.8, 44.6, 69.1, 48.4 | MaxViT-T | 2022-04-04
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention | ✓ | 50.2 | DAT-S++ | 2023-09-04
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention | ✓ | 49.2 | DAT-T++ | 2023-09-04
Stochastic Subsampling With Average Pooling | | 42.1, 59.4, 45.9 | DyHead (SAP) | 2024-09-25
On the Ideal Number of Groups for Isometric Gradient Propagation | | 40.7, 61.2, 44.6 | Faster R-CNN (ideal number of groups) | 2023-02-07
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition | ✓ | 56.4 | UniRepLKNet-XL++ | 2023-11-27
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition | ✓ | 55.8 | UniRepLKNet-L++ | 2023-11-27
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition | ✓ | 54.8 | UniRepLKNet-B++ | 2023-11-27
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition | ✓ | 54.3 | UniRepLKNet-S++ | 2023-11-27
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers | ✓ | 54.1 | MixMIM-L | 2022-05-26
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition | ✓ | 53 | UniRepLKNet-S | 2023-11-27
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers | ✓ | 52.2 | MixMIM-B | 2022-05-26
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition | ✓ | 51.7 | UniRepLKNet-T | 2023-11-27
BiFormer: Vision Transformer with Bi-Level Routing Attention | ✓ | 48.6 | BiFormer-B (IN1k pretrain, MaskRCNN 12ep) | 2023-03-15
DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention | ✓ | 48.5 | DeBiFormer-B (IN1k pretrain, MaskRCNN 12ep) | 2024-10-11
BiFormer: Vision Transformer with Bi-Level Routing Attention | ✓ | 47.8 | BiFormer-S (IN1k pretrain, MaskRCNN 12ep) | 2023-03-15
DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention | ✓ | 47.5 | DeBiFormer-S (IN1k pretrain, MaskRCNN 12ep) | 2024-10-11
DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention | ✓ | 47.1 | DeBiFormer-B (IN1k pretrain, Retina) | 2024-10-11
DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention | ✓ | 45.6 | DeBiFormer-S (IN1k pretrain, Retina) | 2024-10-11
YOLO-Drone: Airborne real-time detection of dense small objects from high-altitude perspective | | 35.45 | YOLO-Drone | 2023-04-14
Benchmark for Generic Product Detection: A Low Data Baseline for Dense Object Detection | ✓ | 3153 | retinanet | 2019-12-19
Paint Transformer: Feed Forward Neural Painting with Stroke Prediction | ✓ | 4.2 | Lpixel | 2021-08-09