OpenCodePapers

Panoptic Segmentation on ADE20K val

Panoptic Segmentation
Leaderboard
| Paper | Code | PQ | mIoU | AP | Model Name | Release Date |
|---|---|---|---|---|---|---|
| OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ | 54.5 | 60.4 | 40.2 | OneFormer (InternImage-H, emb_dim=256, single-scale, 896x896) | 2022-11-10 |
| The Missing Point in Vision Transformers for Universal Image Segmentation | ✓ | 54.0 | | | ViT-P (OneFormer, DiNAT-L, single-scale, 1280x1280, COCO_pretrain) | 2025-05-26 |
| A Simple Framework for Open-Vocabulary Segmentation and Detection | ✓ | 53.7 | | | OpenSeed(SwinL, single scale, 1280x1280) | 2023-03-14 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ | 53.4 | 58.9 | | OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-Pretrain) | 2022-11-10 |
| Your ViT is Secretly an Image Segmentation Model | ✓ | 52.8 | | | EoMT (DINOv2-g, single-scale, 1280x1280, COCO pre-trained) | 2025-03-24 |
| Generalized Decoding for Pixel, Image, and Language | ✓ | 52.4 | 59.1 | 38.7 | X-Decoder (Davit-d5, Deform, single-scale, 1280x1280) | 2022-12-21 |
| The Missing Point in Vision Transformers for Universal Image Segmentation | ✓ | 51.9 | | | ViT-P (OneFormer, DiNAT-L, single-scale, 1280x1280) | 2025-05-26 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ | 51.5 | 58.3 | 37.1 | OneFormer (DiNAT-L, single-scale, 1280x1280) | 2022-11-10 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ | 51.4 | 57.0 | 37.8 | OneFormer (Swin-L, single-scale, 1280x1280) | 2022-11-10 |
| kMaX-DeepLab: k-means Mask Transformer | ✓ | 50.9 | 55.2 | - | kMaX-DeepLab (ConvNeXt-L, single-scale, 1281x1281) | 2022-07-08 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ | 50.5 | 58.3 | 36.0 | OneFormer (DiNAT-L, single-scale, 640x640) | 2022-11-10 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ | 50.1 | 57.4 | 36.3 | OneFormer (ConvNeXt-XL, single-scale, 640x640) | 2022-11-10 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ | 50.0 | 56.6 | 36.2 | OneFormer (ConvNeXt-L, single-scale, 640x640) | 2022-11-10 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ | 49.8 | 57.0 | 35.9 | OneFormer (Swin-L, single-scale, 640x640) | 2022-11-10 |
| Generalized Decoding for Pixel, Image, and Language | ✓ | 49.6 | 58.1 | 35.8 | X-Decoder (L) | 2022-12-21 |
| Dilated Neighborhood Attention Transformer | ✓ | 49.4 | 56.3 | 35.0 | DiNAT-L (Mask2Former, 640x640) | 2022-09-29 |
| kMaX-DeepLab: k-means Mask Transformer | ✓ | 48.7 | 54.8 | - | kMaX-DeepLab (ConvNeXt-L, single-scale, 641x641) | 2022-07-08 |
| Masked-attention Mask Transformer for Universal Image Segmentation | ✓ | 48.1 | 54.5 | 34.2 | Mask2Former (Swin-L) | 2021-12-02 |
| Masked-attention Mask Transformer for Universal Image Segmentation | ✓ | 46.2 | 55.4 | 33.2 | Mask2Former (Swin-L + FAPN, 640x640) | 2021-12-02 |
| kMaX-DeepLab: k-means Mask Transformer | ✓ | 42.3 | 45.3 | - | kMaX-DeepLab (ResNet50, single-scale, 1281x1281) | 2022-07-08 |
| kMaX-DeepLab: k-means Mask Transformer | ✓ | 41.5 | 45.0 | - | kMaX-DeepLab (ResNet50, single-scale, 641x641) | 2022-07-08 |
| Masked-attention Mask Transformer for Universal Image Segmentation | ✓ | 39.7 | | | Mask2Former (ResNet-50, 640x640) | 2021-12-02 |
| Masked-attention Mask Transformer for Universal Image Segmentation | ✓ | 37.9 | 50 | | Panoptic-DeepLab (SwideRNet) | 2021-12-02 |
| Per-Pixel Classification is Not All You Need for Semantic Segmentation | ✓ | 35.7 | | | MaskFormer (R101 + 6 Enc) | 2021-07-13 |
| Masked-attention Mask Transformer for Universal Image Segmentation | ✓ | | 46.1 | 26.5 | Mask2Former (ResNet-50, 640x640) | 2021-12-02 |