OpenCodePapers

Instance Segmentation on COCO
Leaderboard
All metrics are COCO mask AP values in percent; "-" marks a metric that the entry does not report.

| Paper | Code | mask AP | AP50 | AP75 | APS | APM | APL | Model | Release date |
|---|---|---|---|---|---|---|---|---|---|
| DETRs with Collaborative Hybrid Assignments Training | ✓ | 57.1 | 80.2 | 63.4 | 41.6 | 60.1 | 72.0 | Co-DETR | 2022-11-22 |
| CBNet: A Composite Backbone Network Architecture for Object Detection | ✓ | 56.1 | 80.3 | 62.1 | 39.7 | 59.3 | 70.9 | CBNetV2 (EVA02, single-scale) | 2021-07-01 |
| EVA: Exploring the Limits of Masked Visual Representation Learning at Scale | ✓ | 55.5 | 80.0 | - | 36.3 | 58.0 | 72.4 | EVA | 2022-11-14 |
| Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation | ✓ | 55.4 | - | - | - | - | - | FD-SwinV2-G | 2022-05-27 |
| Mask Frozen-DETR: High Quality Instance Segmentation with One GPU | | 55.3 | 79.3 | 61.4 | 37.8 | 58.4 | 70.4 | Mask Frozen-DETR | 2023-08-07 |
| Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks | ✓ | 54.8 | - | - | - | - | - | BEiT-3 | 2022-08-22 |
| Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation | ✓ | 54.7 | - | - | - | - | - | Mask DINO (SwinL, multi-scale) | 2022-06-06 |
| Vision Transformer Adapter for Dense Predictions | ✓ | 54.5 | - | - | - | - | - | ViT-Adapter-L (HTC++, BEiTv2, O365, multi-scale) | 2022-05-17 |
| General Object Foundation Model for Images and Videos at Scale | ✓ | 54.5 | - | - | - | - | - | GLEE-Pro | 2023-12-14 |
| Swin Transformer V2: Scaling Up Capacity and Resolution | ✓ | 54.4 | - | - | - | - | - | SwinV2-G (HTC++) | 2021-11-18 |
| General Object Foundation Model for Images and Videos at Scale | ✓ | 53.3 | - | - | - | - | - | GLEE-Plus | 2023-12-14 |
| End-to-End Semi-Supervised Object Detection with Soft Teacher | ✓ | 53.0 | - | - | - | - | - | Soft Teacher + Swin-L (HTC++, multi-scale) | 2021-06-16 |
| Vision Transformer Adapter for Dense Predictions | ✓ | 53.0 | - | - | - | - | - | ViT-Adapter-L (HTC++, BEiTv2 pretrain, multi-scale) | 2022-05-17 |
| Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation | ✓ | 52.8 | - | - | - | - | - | Mask DINO (SwinL, single-scale) | 2022-06-06 |
| Vision Transformer Adapter for Dense Predictions | ✓ | 52.5 | - | - | - | - | - | ViT-Adapter-L (HTC++, BEiT pretrain, multi-scale) | 2022-05-17 |
| CBNet: A Composite Backbone Network Architecture for Object Detection | ✓ | 52.3 | - | - | - | - | - | CBNetV2 (Dual-Swin-L HTC, multi-scale) | 2021-07-01 |
| Universal Instance Perception as Object Discovery and Retrieval | ✓ | 51.8 | 76.2 | 56.7 | 33.3 | 55.9 | 67.5 | UNINEXT-H | 2023-03-12 |
| CBNet: A Composite Backbone Network Architecture for Object Detection | ✓ | 51.6 | - | - | - | - | - | CBNetV2 (Dual-Swin-L HTC, single-scale) | 2021-07-01 |
| Focal Self-attention for Local-Global Interactions in Vision Transformers | ✓ | 51.3 | 75.4 | 56.5 | 35.6 | - | 64.2 | Focal-L (HTC++, multi-scale) | 2021-07-01 |
| Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | ✓ | 51.1 | - | - | - | - | - | Swin-L (HTC++, multi-scale) | 2021-03-25 |
| Masked-attention Mask Transformer for Universal Image Segmentation | ✓ | 50.5 | 74.9 | 54.9 | 29.1 | 53.8 | 71.2 | Mask2Former (Swin-L, single-scale) | 2021-12-02 |
| Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | ✓ | 50.2 | - | - | - | - | - | Swin-L (HTC++, single-scale) | 2021-03-25 |
| ISTR: End-to-End Instance Segmentation with Transformers | ✓ | 49.7 | - | - | - | - | - | ISTR-SMT (Swin-L, single-scale) | 2021-05-03 |
| Instances as Queries | ✓ | 49.1 | 74.2 | 53.8 | 31.5 | 51.8 | 63.2 | QueryInst (single-scale) | 2021-05-05 |
| Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation | ✓ | 49.1 | - | - | - | - | - | Cascade Eff-B7 NAS-FPN (1280, self-training Copy Paste, single-scale) | 2020-12-13 |
| Exploring Target Representations for Masked Autoencoders | ✓ | 48.8 | - | - | - | - | - | dBOT ViT-L (CLIP) | 2022-09-08 |
| MogaNet: Multi-order Gated Aggregation Network | ✓ | 48.8 | - | - | - | - | - | MogaNet-XL (Cascade Mask R-CNN) | 2022-11-07 |
| DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution | ✓ | 48.5 | 72.0 | 53.3 | 31.6 | 50.9 | 61.5 | DetectoRS (ResNeXt-101-64x4d, multi-scale) | 2020-06-03 |
| Exploring Target Representations for Masked Autoencoders | ✓ | 48.3 | - | - | - | - | - | dBOT ViT-L | 2022-09-08 |
| DiffusionInst: Diffusion Model for Instance Segmentation | ✓ | 48.3 | - | - | - | - | - | DiffusionInst-SwinL | 2022-12-06 |
| General Object Foundation Model for Images and Videos at Scale | ✓ | 48.3 | - | - | - | - | - | GLEE-Lite | 2023-12-14 |
| DiffusionInst: Diffusion Model for Instance Segmentation | ✓ | 47.6 | - | - | - | - | - | DiffusionInst-SwinB | 2022-12-06 |
| DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution | ✓ | 47.1 | 71.1 | 51.6 | 30.3 | 49.5 | 59.6 | DetectoRS (ResNeXt-101-32x4d, multi-scale) | 2020-06-03 |
| Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation | ✓ | 46.9 | - | - | - | - | - | Cascade Eff-B7 NAS-FPN (1280) | 2020-12-13 |
| SOLQ: Segmenting Objects by Learning Queries | ✓ | 46.7 | - | - | - | - | - | SOLQ (Swin-L, single-scale) | 2021-06-04 |
| Exploring Target Representations for Masked Autoencoders | ✓ | 46.3 | - | - | - | - | - | dBOT ViT-B | 2022-09-08 |
| Exploring Target Representations for Masked Autoencoders | ✓ | 46.2 | - | - | - | - | - | dBOT ViT-B (CLIP) | 2022-09-08 |
| SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | ✓ | 46.1 | - | - | - | - | - | Mask R-CNN (SpineNet-190, 1536x1536) | 2019-12-10 |
| MogaNet: Multi-order Gated Aggregation Network | ✓ | 46.1 | - | - | - | - | - | MogaNet-L (Cascade Mask R-CNN) | 2022-11-07 |
| MogaNet: Multi-order Gated Aggregation Network | ✓ | 46.0 | - | - | - | - | - | MogaNet-B (Cascade Mask R-CNN) | 2022-11-07 |
| A Tri-Layer Plugin to Improve Occluded Detection | ✓ | 45.9 | - | - | - | - | - | Swin-B + Cascade Mask R-CNN (tri-layer modelling) | 2022-10-18 |
| Global Context Networks | ✓ | 45.4 | 68.9 | 49.6 | - | - | - | GCNet (ResNeXt-101 + DCN + cascade + GC r4) | 2020-12-24 |
| MogaNet: Multi-order Gated Aggregation Network | ✓ | 45.1 | - | - | - | - | - | MogaNet-S (Cascade Mask R-CNN) | 2022-11-07 |
| gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window | | 45.03 | - | - | - | - | - | gSwin-S | 2022-08-24 |
| iBOT: Image BERT Pre-Training with Online Tokenizer | ✓ | 44.2 | - | - | - | - | - | iBOT (ViT-B/16) | 2021-11-15 |
| gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window | | 44.16 | - | - | - | - | - | gSwin-T | 2022-08-24 |
| MogaNet: Multi-order Gated Aggregation Network | ✓ | 44.1 | - | - | - | - | - | MogaNet-L (Mask R-CNN 1x) | 2022-11-07 |
| Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN | ✓ | 43.5 | - | - | - | - | - | A2MIM (ViT-B) | 2022-05-27 |
| CBNet: A Novel Composite Backbone Network Architecture for Object Detection | ✓ | 43.3 | - | - | - | - | - | Cascade Mask R-CNN (ResNeXt152, CBNet) | 2019-09-09 |
| MogaNet: Multi-order Gated Aggregation Network | ✓ | 43.2 | - | - | - | - | - | MogaNet-B (Mask R-CNN 1x) | 2022-11-07 |
| ResNeSt: Split-Attention Networks | ✓ | 43.0 | - | - | - | - | - | ResNeSt101 | 2020-04-19 |
| gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window | | 42.87 | - | - | - | - | - | gSwin-VT | 2022-08-24 |
| iBOT: Image BERT Pre-Training with Online Tokenizer | ✓ | 42.6 | - | - | - | - | - | iBOT (ViT-S/16) | 2021-11-15 |
| Mask Transfiner for High-Quality Instance Segmentation | ✓ | 42.2 | - | - | - | - | - | Mask Transfiner (ResNet101-FPN) | 2021-11-26 |
| MogaNet: Multi-order Gated Aggregation Network | ✓ | 42.2 | - | - | - | - | - | MogaNet-S (Mask R-CNN 1x) | 2022-11-07 |
| Path Aggregation Network for Instance Segmentation | ✓ | 42.0 | - | - | - | - | - | PANet | 2018-03-05 |
| CenterMask: Real-Time Anchor-Free Instance Segmentation | ✓ | 41.8 | - | - | 24.4 | 44.4 | 54.3 | CenterMask + VoVNet99 | 2019-11-15 |
| SOLOv2: Dynamic and Fast Instance Segmentation | ✓ | 41.7 | 63.2 | 45.1 | 18.0 | 45.0 | 61.6 | SOLOv2 (Res-DCN-101-FPN) | 2020-03-23 |
| Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers | ✓ | 41.7 | - | - | - | - | - | BCNet (ResNeXt-101 + FPN + FCOS) | 2021-03-23 |
| GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond | ✓ | 41.5 | - | - | - | - | - | GCNet (ResNeXt-101 + DCN + cascade + GC r16) | 2019-04-25 |
| DiffusionInst: Diffusion Model for Instance Segmentation | ✓ | 41.5 | - | - | - | - | - | DiffusionInst-ResNet101 | 2022-12-06 |
| BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation | ✓ | 41.3 | 63.1 | 44.6 | 22.7 | 44.1 | 54.5 | BlendMask (ResNet-101 + DCN interval=3) | 2020-01-02 |
| Hybrid Task Cascade for Instance Segmentation | ✓ | 41.2 | - | - | - | - | - | HTC + ResNeXt-101-FPN + DCN | 2019-01-22 |
| Hybrid Task Cascade for Instance Segmentation | ✓ | 41.2 | - | - | - | - | - | HTC + ResNeXt-101-FPN | 2019-01-22 |
| SOLQ: Segmenting Objects by Learning Queries | ✓ | 40.9 | - | - | - | - | - | SOLQ (ResNet101, single-scale) | 2021-06-04 |
| An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection | ✓ | 40.8 | - | - | - | - | - | VoVNetV1-57 | 2019-04-22 |
| K-Net: Towards Unified Image Segmentation | ✓ | 40.6 | 63.3 | - | 18.8 | 43.3 | 59.0 | K-Net-N256 (ResNet-101) | 2021-06-28 |
| CenterMask: Real-Time Anchor-Free Instance Segmentation | ✓ | 40.6 | 62.3 | 44.1 | 20.1 | 42.8 | 57.0 | CenterMask + VoVNetV2-99 (single-scale) | 2019-11-15 |
| SOLO: Segmenting Objects by Locations | ✓ | 40.4 | 62.7 | 43.3 | 17.6 | 43.3 | 58.9 | SOLO (Res-DCN-101-FPN) | 2019-12-10 |
| SOLO: Segmenting Objects by Locations | ✓ | 40.4 | 62.7 | 43.3 | 17.6 | 43.3 | 58.9 | SOLO (ResNet-DCN-101-FPN) | 2019-12-10 |
| D2Det: Towards High Quality Object Detection and Instance Segmentation | ✓ | 40.2 | 61.5 | 43.7 | 21.7 | 43.0 | 54.0 | D2Det (ResNet-101, single-scale test) | 2020-06-01 |
| K-Net: Towards Unified Image Segmentation | ✓ | 40.1 | 62.8 | - | 18.7 | 42.7 | 58.8 | K-Net (ResNet-101) | 2021-06-28 |
| ISTR: End-to-End Instance Segmentation with Transformers | ✓ | 39.9 | - | - | 22.8 | 41.9 | 52.3 | ISTR (ResNet101-FPN-3x, single-scale) | 2021-05-03 |
| Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers | ✓ | 39.8 | 61.5 | 43.1 | 22.7 | 42.4 | 51.1 | BCNet (ResNet-101-FPN + Faster R-CNN) | 2021-03-23 |
| An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection | ✓ | 39.7 | - | - | - | - | - | VoVNetV1-39 | 2019-04-22 |
| SOLQ: Segmenting Objects by Learning Queries | ✓ | 39.7 | - | - | - | - | - | SOLQ (ResNet50, single-scale) | 2021-06-04 |
| CenterMask: Real-Time Anchor-Free Instance Segmentation | ✓ | 39.6 | 61.2 | 42.9 | 19.7 | - | - | CenterMask + X101-32x8d (single-scale) | 2019-11-15 |
| Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers | ✓ | 39.6 | 61.2 | 42.7 | 22.3 | 42.3 | 51.0 | BCNet (ResNet-101-FPN + FCOS) | 2021-03-23 |
| Mask Scoring R-CNN | ✓ | 39.6 | - | - | - | - | - | MS R-CNN + ResNet-101 DCN + FPN | 2019-03-01 |
| InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting | ✓ | 39.5 | 61.4 | 42.9 | 21.2 | 42.5 | 52.1 | Cascade R-CNN (ResNet-101-FPN, map-guided) | 2019-08-21 |
| Commonality-Parsing Network across Shape and Appearance for Partially Supervised Instance Segmentation | ✓ | 39.2 | 60.8 | 42.2 | 22.2 | 41.8 | 50.1 | CPMask | 2020-07-24 |
| MogaNet: Multi-order Gated Aggregation Network | ✓ | 39.1 | - | - | - | - | - | MogaNet-T (Mask R-CNN 1x) | 2022-11-07 |
| PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond | ✓ | 38.7 | 64.1 | 40.0 | 22.2 | 40.2 | 52.0 | PolarMask++ (ResNeXt-101-DCN) | 2021-05-05 |
| ISDA: Position-Aware Instance Segmentation with Deformable Attention | ✓ | 38.7 | 62.0 | 41.1 | 17.0 | 41.2 | - | ISDA | 2022-02-23 |
| ISTR: End-to-End Instance Segmentation with Transformers | ✓ | 38.6 | - | - | 22.1 | 40.4 | 50.6 | ISTR (ResNet50-FPN-3x, single-scale) | 2021-05-03 |
| CenterMask: Real-Time Anchor-Free Instance Segmentation | ✓ | 38.3 | - | - | - | - | - | CenterMask + ResNet-101-FPN | 2019-11-15 |
| SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation | ✓ | 38.1 | 60.2 | 40.8 | 17.8 | 40.8 | 54.3 | SipMask (ResNet-101, single-scale test) | 2020-07-29 |
| MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features | | 38.1 | - | - | - | - | - | MaskLab+ (ResNet-101, JFT) | 2017-12-13 |
| EmbedMask: Embedding Coupling for One-stage Instance Segmentation | ✓ | 37.7 | 59.1 | 40.3 | 17.9 | 40.4 | 53.0 | EmbedMask (ResNet-101-FPN) | 2019-12-04 |
| EmbedMask: Embedding Coupling for One-stage Instance Segmentation | ✓ | 37.7 | 59.1 | 40.3 | 17.9 | 40.4 | - | EmbedMask (R-101-FPN) | 2019-12-04 |
| MogaNet: Multi-order Gated Aggregation Network | ✓ | 37.6 | - | - | - | - | - | MogaNet-XT | 2022-11-07 |
| TensorMask: A Foundation for Dense Object Segmentation | ✓ | 37.3 | - | - | - | - | - | TensorMask (ResNet-101-FPN) | 2019-03-28 |
| Mask R-CNN | ✓ | 37.1 | 60.0 | 39.4 | 16.9 | 39.9 | 53.5 | Mask R-CNN (ResNeXt-101-FPN) | 2017-03-20 |
| DiffusionInst: Diffusion Model for Instance Segmentation | ✓ | 37.1 | - | - | - | - | - | DiffusionInst-ResNet50 | 2022-12-06 |
| VirTex: Learning Visual Representations from Textual Annotations | ✓ | 36.9 | 58.4 | 39.7 | - | - | - | VirTex Mask R-CNN (ResNet-50-FPN) | 2020-06-11 |
| RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation | ✓ | 36.4 | 57.9 | 39.0 | 16.4 | 39.5 | 51.6 | RDSNet (data aug) | 2019-12-11 |
| MogaNet: Multi-order Gated Aggregation Network | ✓ | 35.8 | - | - | - | - | - | MogaNet-T | 2022-11-07 |
| Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN | ✓ | 34.9 | - | - | - | - | - | A2MIM (ResNet-50 2x) | 2022-05-27 |
| E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation | ✓ | 33.8 | 52.9 | 35.9 | - | - | - | E2EC DLA-34 | 2022-03-08 |
| Fully Convolutional Instance-aware Semantic Segmentation | ✓ | 33.6 | 54.5 | - | - | - | - | FCIS+++ +OHEM | 2016-11-23 |
| torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation | ✓ | 33.6 | - | - | - | - | - | Mask R-CNN (Bottleneck-injected ResNet-50, FPN) | 2020-11-25 |
| PolarMask: Single Shot Instance Segmentation with Polar Representation | ✓ | 32.9 | 55.4 | 33.8 | 15.5 | 35.1 | 46.3 | PolarMask (ResNeXt-101-FPN) | 2019-09-29 |
| PolarMask: Single Shot Instance Segmentation with Polar Representation | ✓ | 30.4 | 51.9 | 31.0 | 13.4 | 32.4 | 42.8 | PolarMask (ResNet-101-FPN) | 2019-09-29 |
| YOLACT: Real-time Instance Segmentation | ✓ | 29.8 | - | - | - | - | - | YOLACT (ResNet-50-FPN) | 2019-04-04 |
| Fully Convolutional Instance-aware Semantic Segmentation | ✓ | 29.2 | 49.5 | - | 7.1 | 31.3 | 50.0 | FCIS +OHEM | 2016-11-23 |
| A MultiPath Network for Object Detection | ✓ | 25.0 | - | - | - | - | - | MultiPath Network | 2016-04-07 |
| InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | ✓ | - | 80.8 | 62.2 | 41.0 | 58.9 | 70.3 | InternImage-H | 2022-11-10 |
| ResNeSt: Split-Attention Networks | ✓ | - | 70.2 | 51.5 | 30.0 | 49.6 | 60.6 | ResNeSt-200 (multi-scale) | 2020-04-19 |
| CenterMask: Real-Time Anchor-Free Instance Segmentation | ✓ | - | 66.2 | 47.4 | 27.2 | - | - | CenterMask + VoVNetV2-99 (multi-scale) | 2019-11-15 |
| CenterMask: Real-Time Anchor-Free Instance Segmentation | ✓ | - | 60.8 | - | 19.4 | 41.7 | - | CenterMask + VoVNetV2-57 (single-scale) | 2019-11-15 |
| Instance-aware Semantic Segmentation via Multi-task Network Cascades | ✓ | - | 44.3 | - | - | - | - | MNC | 2015-12-14 |
| ISDA: Position-Aware Instance Segmentation with Deformable Attention | ✓ | - | - | - | - | - | 55.7 | ISDA (ResNet-50) | 2022-02-23 |
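The metric columns follow the standard COCO evaluation protocol: mask AP is averaged over ten IoU thresholds (0.50 to 0.95 in steps of 0.05) and over all categories, AP50 and AP75 fix the IoU threshold at 0.50 and 0.75, and APS, APM and APL restrict evaluation to small (area below 32² pixels), medium and large (area of 96² pixels or more) objects. The pure-Python sketch below illustrates these building blocks under stated assumptions. It is not pycocotools: masks are plain nested lists of 0/1, the helper names (`mask_iou`, `area_bucket`, `interpolated_ap`) are hypothetical, and COCO details such as crowd regions, detection-to-ground-truth matching and exact bucket boundary handling are omitted.

```python
"""Illustrative sketch of the COCO mask-AP building blocks (not pycocotools).

Assumptions: masks are same-sized 2-D lists of 0/1 values; AP uses COCO-style
101-point interpolated precision; helper names are hypothetical.
"""

from typing import List, Optional, Sequence

Mask = List[List[int]]

# COCO IoU thresholds 0.50, 0.55, ..., 0.95. "mask AP" averages over all ten;
# AP50 and AP75 each use a single threshold.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two binary masks of the same shape."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 0.0


def area_bucket(area: float) -> str:
    """Size bucket behind APS / APM / APL (area in pixels, boundaries simplified)."""
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


def interpolated_ap(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """101-point interpolated AP for one category at one IoU threshold.

    `recalls` and `precisions` are cumulative values over detections sorted by
    descending score, so `recalls` is non-decreasing.
    """
    # Replace each precision by the maximum precision at any higher recall.
    prec = list(precisions)
    for i in range(len(prec) - 2, -1, -1):
        prec[i] = max(prec[i], prec[i + 1])

    total = 0.0
    for k in range(101):
        r = k / 100
        # First operating point whose recall reaches r; contributes 0 if none does.
        idx: Optional[int] = next(
            (i for i, rec in enumerate(recalls) if rec >= r), None
        )
        total += prec[idx] if idx is not None else 0.0
    return total / 101
```

For example, `mask_iou([[1, 1], [0, 0]], [[1, 0], [0, 0]])` returns 0.5, so that prediction would count as a match at the 0.50 threshold (AP50) but not at 0.75 (AP75). Under these assumptions, the table's mask AP corresponds to the mean of `interpolated_ap` over all ten `IOU_THRESHOLDS` and all categories, reported in percent.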