instance-segmentation-on-coco

Instance Segmentation

Leaderboard

Paper	Code	mask AP	AP50	AP75	APS	APM	APL	ModelName	ReleaseDate
DETRs with Collaborative Hybrid Assignments Training	✓ Link	57.1	80.2	63.4	41.6	60.1	72.0	Co-DETR	2022-11-22
CBNet: A Composite Backbone Network Architecture for Object Detection	✓ Link	56.1	80.3	62.1	39.7	59.3	70.9	CBNetV2 (EVA02, single-scale)	2021-07-01
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale	✓ Link	55.5	80.0		36.3	58.0	72.4	EVA	2022-11-14
Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation	✓ Link	55.4						FD-SwinV2-G	2022-05-27
Mask Frozen-DETR: High Quality Instance Segmentation with One GPU		55.3	79.3	61.4	37.8	58.4	70.4	Mask Frozen-DETR	2023-08-07
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks	✓ Link	54.8						BEiT-3	2022-08-22
Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation	✓ Link	54.7						Mask DINO (SwinL, multi-scale)	2022-06-06
Vision Transformer Adapter for Dense Predictions	✓ Link	54.5						ViT-Adapter-L (HTC++, BEiTv2, O365, multi-scale)	2022-05-17
General Object Foundation Model for Images and Videos at Scale	✓ Link	54.5						GLEE-Pro	2023-12-14
Swin Transformer V2: Scaling Up Capacity and Resolution	✓ Link	54.4						SwinV2-G (HTC++)	2021-11-18
General Object Foundation Model for Images and Videos at Scale	✓ Link	53.3						GLEE-Plus	2023-12-14
End-to-End Semi-Supervised Object Detection with Soft Teacher	✓ Link	53.0						Soft Teacher + Swin-L (HTC++, multi-scale)	2021-06-16
Vision Transformer Adapter for Dense Predictions	✓ Link	53.0						ViT-Adapter-L (HTC++, BEiTv2 pretrain, multi-scale)	2022-05-17
Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation	✓ Link	52.8						Mask DINO (SwinL, single-scale)	2022-06-06
Vision Transformer Adapter for Dense Predictions	✓ Link	52.5						ViT-Adapter-L (HTC++, BEiT pretrain, multi-scale)	2022-05-17
CBNet: A Composite Backbone Network Architecture for Object Detection	✓ Link	52.3						CBNetV2 (Dual-Swin-L HTC, multi-scale)	2021-07-01
Universal Instance Perception as Object Discovery and Retrieval	✓ Link	51.8	76.2	56.7	33.3	55.9	67.5	UNINEXT-H	2023-03-12
CBNet: A Composite Backbone Network Architecture for Object Detection	✓ Link	51.6						CBNetV2 (Dual-Swin-L HTC, single-scale)	2021-07-01
Focal Self-attention for Local-Global Interactions in Vision Transformers	✓ Link	51.3	75.4	56.5	35.6		64.2	Focal-L (HTC++, multi-scale)	2021-07-01
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows	✓ Link	51.1						Swin-L (HTC++, multi scale)	2021-03-25
Masked-attention Mask Transformer for Universal Image Segmentation	✓ Link	50.5	74.9	54.9	29.1	53.8	71.2	Mask2Former (Swin-L, single scale)	2021-12-02
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows	✓ Link	50.2						Swin-L (HTC++, single scale)	2021-03-25
ISTR: End-to-End Instance Segmentation with Transformers	✓ Link	49.7						ISTR-SMT (Swin-L, single scale)	2021-05-03
Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation	✓ Link	49.1						Cascade Eff-B7 NAS-FPN (1280, self-training Copy Paste, single-scale)	2020-12-13
Instances as Queries	✓ Link	49.1	74.2	53.8	31.5	51.8	63.2	QueryInst (single scale)	2021-05-05
Exploring Target Representations for Masked Autoencoders	✓ Link	48.8						dBOT ViT-L (CLIP)	2022-09-08
MogaNet: Multi-order Gated Aggregation Network	✓ Link	48.8						MogaNet-XL (Cascade Mask R-CNN)	2022-11-07
DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution	✓ Link	48.5	72.0	53.3	31.6	50.9	61.5	DetectoRS (ResNeXt-101-64x4d, multi-scale)	2020-06-03
Exploring Target Representations for Masked Autoencoders	✓ Link	48.3						dBOT ViT-L	2022-09-08
DiffusionInst: Diffusion Model for Instance Segmentation	✓ Link	48.3						DiffusionInst-SwinL	2022-12-06
General Object Foundation Model for Images and Videos at Scale	✓ Link	48.3						GLEE-Lite	2023-12-14
DiffusionInst: Diffusion Model for Instance Segmentation	✓ Link	47.6						DiffusionInst-SwinB	2022-12-06
DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution	✓ Link	47.1	71.1	51.6	30.3	49.5	59.6	DetectoRS (ResNeXt-101-32x4d, multi-scale)	2020-06-03
Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation	✓ Link	46.9						Cascade Eff-B7 NAS-FPN (1280)	2020-12-13
SOLQ: Segmenting Objects by Learning Queries	✓ Link	46.7						SOLQ (Swin-L, single scale)	2021-06-04
Exploring Target Representations for Masked Autoencoders	✓ Link	46.3						dBOT ViT-B	2022-09-08
Exploring Target Representations for Masked Autoencoders	✓ Link	46.2						dBOT ViT-B (CLIP)	2022-09-08
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization	✓ Link	46.1						Mask R-CNN (SpineNet-190, 1536x1536)	2019-12-10
MogaNet: Multi-order Gated Aggregation Network	✓ Link	46.1						MogaNet-L (Cascade Mask R-CNN)	2022-11-07
MogaNet: Multi-order Gated Aggregation Network	✓ Link	46						MogaNet-B (Cascade Mask R-CNN)	2022-11-07
A Tri-Layer Plugin to Improve Occluded Detection	✓ Link	45.9						Swin-B + Cascade Mask R-CNN (tri-layer modelling)	2022-10-18
Global Context Networks	✓ Link	45.4	68.9	49.6				GCNet (ResNeXt-101 + DCN + cascade + GC r4)	2020-12-24
MogaNet: Multi-order Gated Aggregation Network	✓ Link	45.1						MogaNet-S (Cascade Mask R-CNN)	2022-11-07
gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window		45.03						gSwin-S	2022-08-24
iBOT: Image BERT Pre-Training with Online Tokenizer	✓ Link	44.2						iBOT (ViT-B/16)	2021-11-15
gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window		44.16						gSwin-T	2022-08-24
MogaNet: Multi-order Gated Aggregation Network	✓ Link	44.1						MogaNet-L (Mask R-CNN 1x)	2022-11-07
Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN	✓ Link	43.5						A2MIM (ViT-B)	2022-05-27
CBNet: A Novel Composite Backbone Network Architecture for Object Detection	✓ Link	43.3						Cascade Mask R-CNN (ResNeXt152, CBNet)	2019-09-09
MogaNet: Multi-order Gated Aggregation Network	✓ Link	43.2						MogaNet-B (Mask R-CNN 1x)	2022-11-07
ResNeSt: Split-Attention Networks	✓ Link	43						ResNeSt101	2020-04-19
gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window		42.87						gSwin-VT	2022-08-24
iBOT: Image BERT Pre-Training with Online Tokenizer	✓ Link	42.6						iBOT (ViT-S/16)	2021-11-15
Mask Transfiner for High-Quality Instance Segmentation	✓ Link	42.2						Mask Transfiner(ResNet101-FPN)	2021-11-26
MogaNet: Multi-order Gated Aggregation Network	✓ Link	42.2						MogaNet-S (Mask R-CNN 1x)	2022-11-07
Path Aggregation Network for Instance Segmentation	✓ Link	42.0						PANet	2018-03-05
CenterMask : Real-Time Anchor-Free Instance Segmentation	✓ Link	41.8			24.4	44.4	54.3	CenterMask + VoVNet99	2019-11-15
SOLOv2: Dynamic and Fast Instance Segmentation	✓ Link	41.7	63.2	45.1	18.0	45.0	61.6	SOLOv2(Res-DCN-101-FPN)	2020-03-23
Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers	✓ Link	41.7						BCNet(ResNeXt-101 + FPN+ FCOS)	2021-03-23
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond	✓ Link	41.5						GCNet (ResNeXt-101 + DCN + cascade + GC r16)	2019-04-25
DiffusionInst: Diffusion Model for Instance Segmentation	✓ Link	41.5						DiffusionInst-ResNet101	2022-12-06
BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation	✓ Link	41.3	63.1	44.6	22.7	44.1	54.5	BlendMask (ResNet-101 + DCN interval=3)	2020-01-02
Hybrid Task Cascade for Instance Segmentation	✓ Link	41.2						HTC + ResNeXt-101-FPN + DCN	2019-01-22
Hybrid Task Cascade for Instance Segmentation	✓ Link	41.2						HTC + ResNeXt-101-FPN	2019-01-22
SOLQ: Segmenting Objects by Learning Queries	✓ Link	40.9						SOLQ (ResNet101, single scale)	2021-06-04
An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection	✓ Link	40.8						VoVNetV1-57	2019-04-22
CenterMask : Real-Time Anchor-Free Instance Segmentation	✓ Link	40.6	62.3	44.1	20.1	42.8	57.0	CenterMask + VoVNetV2-99 (single-scale)	2019-11-15
K-Net: Towards Unified Image Segmentation	✓ Link	40.6	63.3		18.8	43.3	59	K-Net-N256 (ResNet-101)	2021-06-28
SOLO: Segmenting Objects by Locations	✓ Link	40.4	62.7	43.3	17.6	43.3	58.9	SOLO(Res-DCN-101-FPN)	2019-12-10
SOLO: Segmenting Objects by Locations	✓ Link	40.4	62.7	43.3	17.6	43.3	58.9	SOLO (ResNet-DCN-101-FPN)	2019-12-10
D2Det: Towards High Quality Object Detection and Instance Segmentation	✓ Link	40.2	61.5	43.7	21.7	43.0	54.0	D2Det (ResNet-101, single-scale test)	2020-06-01
K-Net: Towards Unified Image Segmentation	✓ Link	40.1	62.8		18.7	42.7	58.8	K-Net (ResNet-101)	2021-06-28
ISTR: End-to-End Instance Segmentation with Transformers	✓ Link	39.9			22.8	41.9	52.3	ISTR (ResNet101-FPN-3x, single-scale)	2021-05-03
Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers	✓ Link	39.8	61.5	43.1	22.7	42.4	51.1	BCNet(ResNet-101-FPN + Faster RCNN)	2021-03-23
An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection	✓ Link	39.7						VoVNetV1-39	2019-04-22
SOLQ: Segmenting Objects by Learning Queries	✓ Link	39.7						SOLQ (ResNet50, single scale)	2021-06-04
Mask Scoring R-CNN	✓ Link	39.6						MS R-CNN + ResNet-101 DCN + FPN	2019-03-01
CenterMask : Real-Time Anchor-Free Instance Segmentation	✓ Link	39.6	61.2	42.9	19.7			CenterMask + X101-32x8d (single-scale)	2019-11-15
Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers	✓ Link	39.6	61.2	42.7	22.3	42.3	51.0	BCNet(ResNet-101-FPN + FCOS)	2021-03-23
InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting	✓ Link	39.5	61.4	42.9	21.2	42.5	52.1	Cascade R-CNN (ResNet-101-FPN, map-guided)	2019-08-21
Commonality-Parsing Network across Shape and Appearance for Partially Supervised Instance Segmentation	✓ Link	39.2	60.8	42.2	22.2	41.8	50.1	CPMask	2020-07-24
MogaNet: Multi-order Gated Aggregation Network	✓ Link	39.1						MogaNet-T (Mask R-CNN 1x)	2022-11-07
PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond	✓ Link	38.7	64.1	40.0	22.2	40.2	52.0	PolarMask++ (ResNeXt-101-DCN)	2021-05-05
ISDA: Position-Aware Instance Segmentation with Deformable Attention	✓ Link	38.7	62	41.1	17	41.2		ISDA (ours)	2022-02-23
ISTR: End-to-End Instance Segmentation with Transformers	✓ Link	38.6			22.1	40.4	50.6	ISTR (ResNet50-FPN-3x, single-scale)	2021-05-03
CenterMask : Real-Time Anchor-Free Instance Segmentation	✓ Link	38.3						CenterMask + ResNet-101-FPN	2019-11-15
MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features		38.1						MaskLab+ (ResNet-101, JFT)	2017-12-13
SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation	✓ Link	38.1	60.2	40.8	17.8	40.8	54.3	SipMask (ResNet-101, single-scale test)	2020-07-29
EmbedMask: Embedding Coupling for One-stage Instance Segmentation	✓ Link	37.7	59.1	40.3	17.9	40.4	53	EmbedMask (ResNet-101-FPN)	2019-12-04
EmbedMask: Embedding Coupling for One-stage Instance Segmentation	✓ Link	37.7	59.1	40.3	17.9	40.4		EmbedMask(R-101-FPN)	2019-12-04
MogaNet: Multi-order Gated Aggregation Network	✓ Link	37.6						MogaNet-XT	2022-11-07
TensorMask: A Foundation for Dense Object Segmentation	✓ Link	37.3						TensorMask (ResNet-101-FPN)	2019-03-28
Mask R-CNN	✓ Link	37.1	60.0	39.4	16.9	39.9	53.5	Mask R-CNN (ResNeXt-101-FPN)	2017-03-20
DiffusionInst: Diffusion Model for Instance Segmentation	✓ Link	37.1						DiffusionInst-ResNet50	2022-12-06
VirTex: Learning Visual Representations from Textual Annotations	✓ Link	36.9	58.4	39.7				VirTex Mask R-CNN (ResNet-50-FPN)	2020-06-11
RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation	✓ Link	36.4	57.9	39.0	16.4	39.5	51.6	RDSNet (data aug)	2019-12-11
MogaNet: Multi-order Gated Aggregation Network	✓ Link	35.8						MogaNet-T	2022-11-07
Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN	✓ Link	34.9						A2MIM (ResNet-50 2x)	2022-05-27
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation	✓ Link	33.8	52.9	35.9				E2EC DLA-34	2022-03-08
Fully Convolutional Instance-aware Semantic Segmentation	✓ Link	33.6	54.5					FCIS+++ +OHEM	2016-11-23
torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation	✓ Link	33.6						Mask R-CNN (Bottleneck-injected ResNet-50, FPN)	2020-11-25
PolarMask: Single Shot Instance Segmentation with Polar Representation	✓ Link	32.9	55.4	33.8	15.5	35.1	46.3	PolarMask (ResNeXt-101-FPN)	2019-09-29
PolarMask: Single Shot Instance Segmentation with Polar Representation	✓ Link	30.4	51.9	31	13.4	32.4	42.8	PolarMask (ResNet-101-FPN)	2019-09-29
YOLACT: Real-time Instance Segmentation	✓ Link	29.8						YOLACT (ResNet-50-FPN)	2019-04-04
Fully Convolutional Instance-aware Semantic Segmentation	✓ Link	29.2	49.5		7.1	31.3	50.0	FCIS +OHEM	2016-11-23
A MultiPath Network for Object Detection	✓ Link	25.0						MultiPath Network	2016-04-07
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions	✓ Link		80.8	62.2	41.0	58.9	70.3	InternImage-H	2022-11-10
ResNeSt: Split-Attention Networks	✓ Link		70.2	51.5	30.0	49.6	60.6	ResNeSt-200 (multi-scale)	2020-04-19
CenterMask : Real-Time Anchor-Free Instance Segmentation	✓ Link		66.2	47.4	27.2			CenterMask + VoVNetV2-99 (multi-scale)	2019-11-15
CenterMask : Real-Time Anchor-Free Instance Segmentation	✓ Link		60.8		19.4	41.7		CenterMask + VoVNetV2-57 (single-scale)	2019-11-15
Instance-aware Semantic Segmentation via Multi-task Network Cascades	✓ Link		44.3					MNC	2015-12-14
ISDA: Position-Aware Instance Segmentation with Deformable Attention	✓ Link						55.7	ISDA (ResNet-50)	2022-02-23
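
Many rows in the table above report only some of the six metrics, so any ranking has to cope with empty cells. The sketch below shows one possible way to load the leaderboard and sort it by a chosen metric. It assumes the table is available as tab-separated text with the header row shown above. The names `parse_metric`, `load_leaderboard` and `rank_by` are hypothetical helpers introduced here for illustration, and the two sample rows are copied from the table.

```python
import csv
import io

# Header row plus two rows copied from the leaderboard (tab-separated).
SAMPLE = (
    "Paper\tCode\tmask AP\tAP50\tAP75\tAPS\tAPM\tAPL\tModelName\tReleaseDate\n"
    "Mask R-CNN\t✓ Link\t37.1\t60.0\t39.4\t16.9\t39.9\t53.5\t"
    "Mask R-CNN (ResNeXt-101-FPN)\t2017-03-20\n"
    "YOLACT: Real-time Instance Segmentation\t✓ Link\t29.8\t\t\t\t\t\t"
    "YOLACT (ResNet-50-FPN)\t2019-04-04\n"
)

METRICS = ("mask AP", "AP50", "AP75", "APS", "APM", "APL")


def parse_metric(cell):
    """Return the cell as a float, or None if it is empty.

    A trailing '%' is tolerated, in case values are written as percentages.
    """
    cell = (cell or "").strip().rstrip("%")
    return float(cell) if cell else None


def load_leaderboard(text):
    """Parse tab-separated leaderboard text into a list of dicts."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = []
    for raw in reader:
        row = dict(raw)
        for name in METRICS:
            row[name] = parse_metric(raw.get(name))
        rows.append(row)
    return rows


def rank_by(rows, metric="mask AP"):
    """Sort rows by a metric, best first, skipping rows that do not report it."""
    scored = [r for r in rows if r[metric] is not None]
    return sorted(scored, key=lambda r: r[metric], reverse=True)


rows = load_leaderboard(SAMPLE)
for r in rank_by(rows):
    print(r["ModelName"], r["mask AP"])
```

Skipping rows that lack the chosen metric, rather than treating a missing value as zero, keeps entries such as those at the bottom of the table (which report AP50 but no mask AP) out of a mask AP ranking instead of placing them last.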

OpenCodePapers
