OpenCodePapers

panoptic-segmentation-on-coco-minival

Panoptic Segmentation
Leaderboard
| Paper | Code | PQ | PQst | PQth | RQ | SQ | RQst | RQth | SQst | SQth | AP | mIoU | Model | Release |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HyperSeg: Towards Universal Visual Segmentation with Large Language Model | ✓ | 61.2 | | | | | | | | | | | HyperSeg (Swin-B) | 2024-11-26 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ | 60.0 | 49.2 | 67.1 | | | | | | | 52.0 | 68.8 | OneFormer (InternImage-H, single-scale) | 2022-11-10 |
| A Simple Framework for Open-Vocabulary Segmentation and Detection | ✓ | 59.5 | | | | | | | | | 53.2 | | OpenSeeD (SwinL, single-scale) | 2023-03-14 |
| UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding | ✓ | 59.5 | | | | | | | | | 50.7 | 69.7 | UMG-CLIP-E/14 | 2024-01-12 |
| Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation | ✓ | 59.4 | | | | | | | | | 50.9 | | Mask DINO (SwinL, single-scale) | 2022-06-06 |
| Your ViT is Secretly an Image Segmentation Model | ✓ | 59.2 | | | | | | | | | | | EoMT (DINOv2-g, single-scale, 1280x1280) | 2025-03-24 |
| UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding | ✓ | 58.9 | | | | | | | | | 49.7 | 68.9 | UMG-CLIP-L/14 | 2024-01-12 |
| Dilated Neighborhood Attention Transformer | ✓ | 58.5 | 48.8 | 64.9 | | | | | | | 49.2 | 68.3 | DiNAT-L (single-scale, Mask2Former) | 2022-09-29 |
| Vision Transformer Adapter for Dense Predictions | ✓ | 58.4 | 48.4 | 65.0 | | | | | | | 48.9 | | ViT-Adapter-L (single-scale, BEiTv2 pretrain, Mask2Former) | 2022-05-17 |
| Visual Attention Network | ✓ | 58.2 | 48.2 | 64.8 | | | | | | | | | VAN-B6 + Mask2Former | 2022-02-20 |
| kMaX-DeepLab: k-means Mask Transformer | ✓ | 58.1 | 48.8 | 64.3 | | | | | | | | | kMaX-DeepLab (single-scale, pseudo-labels) | 2022-07-08 |
| Hierarchical Open-vocabulary Universal Image Segmentation | ✓ | 58.1 | | | | | | | | | | 66.8 | HIPIE (ViT-H, single-scale) | 2023-07-03 |
| kMaX-DeepLab: k-means Mask Transformer | ✓ | 58.0 | 48.6 | 64.2 | | | | | | | | | kMaX-DeepLab (single-scale, drop query, 256 queries) | 2022-07-08 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ | 58.0 | 48.4 | 64.3 | | | | | | | 49.2 | 68.1 | OneFormer (DiNAT-L, single-scale) | 2022-11-10 |
| kMaX-DeepLab: k-means Mask Transformer | ✓ | 57.9 | 48.6 | 64.0 | | | | | | | | | kMaX-DeepLab (single-scale) | 2022-07-08 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ | 57.9 | 48.0 | 64.4 | | | | | | | 49.0 | 67.4 | OneFormer (Swin-L, single-scale) | 2022-11-10 |
| Focal Modulation Networks | ✓ | 57.9 | | | | | | | | | 48.4 | | FocalNet-L (Mask2Former, 200 queries) | 2022-03-22 |
| Masked-attention Mask Transformer for Universal Image Segmentation | ✓ | 57.8 | 48.1 | 64.2 | | | | | | | 48.6 | | Mask2Former (single-scale) | 2021-12-02 |
| Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers | ✓ | 55.8 | 46.9 | 61.7 | | | | | | | | | Panoptic SegFormer (single-scale) | 2021-09-08 |
| CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation | ✓ | 55.3 | 46.6 | 61.0 | | | | | | | | | CMT-DeepLab (single-scale) | 2022-06-17 |
| Per-Pixel Classification is Not All You Need for Semantic Segmentation | ✓ | 52.7 | 44.0 | 58.5 | 63.5 | 81.8 | | | | | | | MaskFormer (single-scale) | 2021-07-13 |
| MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers | ✓ | 51.1 | 42.2 | 57.0 | | | | | | | | | MaX-DeepLab-L (single-scale) | 2020-12-01 |
| Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers | ✓ | 50.6 | 43.2 | 55.5 | | | | | | | | | Panoptic SegFormer (ResNet-101) | 2021-09-08 |
| ResNeSt: Split-Attention Networks | ✓ | 47.9 | 37.0 | 55.1 | | | | | | | | | PanopticFPN + ResNeSt (single-scale) | 2020-04-19 |
| End-to-End Object Detection with Transformers | ✓ | 45.1 | 37.0 | 50.5 | 55.5 | 79.9 | 46.0 | 61.7 | 78.5 | 80.9 | 33.0 | | DETR-R101 (ResNet-101) | 2020-05-26 |
| Fully Convolutional Networks for Panoptic Segmentation | ✓ | 44.3 | 35.6 | 50.0 | 53.0 | 80.7 | 43.5 | 59.3 | 76.7 | 83.4 | | | Panoptic FCN* (ResNet-50-FPN) | 2020-12-01 |
| End-to-End Object Detection with Transformers | ✓ | 44.1 | 33.6 | 51.0 | 53.3 | 79.5 | 42.1 | 60.6 | 74.0 | 83.2 | 39.7 | | PanopticFPN++ | 2020-05-26 |
| Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation | ✓ | 43.9 | | | | | | | | | | | Axial-DeepLab-L (multi-scale) | 2020-03-17 |
| Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation | ✓ | 43.4 | 35.6 | 48.5 | | | | | | | | | Axial-DeepLab-L (single-scale) | 2020-03-17 |
| Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation | ✓ | | 36.8 | 48.6 | | | | | | | | | Axial-DeepLab-L (multi-scale) | 2020-03-17 |
| Fully Convolutional Networks for Panoptic Segmentation | ✓ | | | 58.5 | 61.6 | 83.2 | 51.1 | 68.6 | 81.1 | 84.6 | | | Panoptic FCN* (Swin-L, single-scale) | 2020-12-01 |
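The PQ, SQ, and RQ columns are not independent: panoptic quality factors as PQ = SQ × RQ, where SQ is the mean IoU over matched segment pairs and RQ is an F1-style recognition term over true positives, false positives, and false negatives (PQst and PQth apply the same formula restricted to stuff and thing classes). A minimal sketch of this decomposition, assuming segments have already been matched (function name and inputs are illustrative, not the official panopticapi interface):

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """Compute (PQ, SQ, RQ) from already-matched segments.

    matched_ious: IoU of each predicted/ground-truth segment pair counted
                  as a true positive (matches require IoU > 0.5).
    num_fp:       predicted segments with no ground-truth match.
    num_fn:       ground-truth segments with no predicted match.
    """
    tp = len(matched_ious)
    if tp == 0:
        return 0.0, 0.0, 0.0
    sq = sum(matched_ious) / tp                   # segmentation quality: mean IoU of matches
    rq = tp / (tp + 0.5 * num_fp + 0.5 * num_fn)  # recognition quality: F1-style detection term
    return sq * rq, sq, rq                        # PQ = SQ * RQ
```

This explains patterns in the table above: a model can trail in SQ (mask quality) yet lead in PQ by recognizing more segments, since the two factors multiply.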