Paper | Code | PQ | PQst | PQth | mIoU | AP | | | Model | Date |
The Missing Point in Vision Transformers for Universal Image Segmentation | ✓ Link | 70.8 | | | 85.4 | 50.6 | | | ViT-P (OneFormer, InternImage-H) | 2025-05-26 |
OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ Link | 70.1 | 74.1 | 64.6 | 84.6 | 48.7 | | | OneFormer (ConvNeXt-L, single-scale, 512x1024, Mapillary Vistas-pretrained) | 2022-11-10 |
Scaling Wide Residual Networks for Panoptic Segmentation | | 69.6 | | | 85.3 | 46.8 | | | Panoptic-DeepLab (SWideRNet [1, 1, 4.5], Mapillary Vistas, multi-scale) | 2020-11-23 |
OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ Link | 68.51 | | | 83.0 | 46.5 | | | OneFormer (ConvNeXt-L, single-scale) | 2022-11-10 |
Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation | ✓ Link | 68.5 | | | 84.6 | 44.2 | | | Axial-DeepLab-XL (Mapillary Vistas, multi-scale) | 2020-03-17 |
Scaling Wide Residual Networks for Panoptic Segmentation | | 68.5 | | | 84.6 | 42.8 | | | Panoptic-DeepLab (SWideRNet [1, 1, 4.5], Mapillary Vistas, single-scale) | 2020-11-23 |
OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ Link | 68.4 | | | 83.6 | 46.7 | | | OneFormer (ConvNeXt-XL, single-scale) | 2022-11-10 |
kMaX-DeepLab: k-means Mask Transformer | ✓ Link | 68.4 | | | 83.5 | 44.0 | | | kMaX-DeepLab (single-scale) | 2022-07-08 |
AutoFocusFormer: Image Segmentation off the Grid | ✓ Link | 67.7 | 71.5 | 62.5 | 83.0 | 46.2 | | | AFF-Base (single-scale, point-based Mask2Former) | 2023-04-24 |
OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ Link | 67.6 | | | 83.1 | 45.6 | | | OneFormer (DiNAT-L, single-scale) | 2022-11-10 |
EfficientPS: Efficient Panoptic Segmentation | ✓ Link | 67.5 | 70.3 | 63.2 | 82.1 | 43.5 | | | EfficientPS | 2020-04-05 |
Dilated Neighborhood Attention Transformer | ✓ Link | 67.2 | | | 83.4 | 44.5 | | | DiNAT-L (Mask2Former) | 2022-09-29 |
OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ Link | 67.2 | | | 83.0 | 45.6 | | | OneFormer (Swin-L, single-scale) | 2022-11-10 |
AutoFocusFormer: Image Segmentation off the Grid | ✓ Link | 66.9 | 70.8 | 61.5 | 82.2 | 44.2 | | | AFF-Small (single-scale, point-based Mask2Former) | 2023-04-24 |
Masked-attention Mask Transformer for Universal Image Segmentation | ✓ Link | 66.6 | | | 82.9 | 43.6 | | | Mask2Former (Swin-L) | 2021-12-02 |
EfficientPS: Efficient Panoptic Segmentation | ✓ Link | 64.9 | 67.7 | 61.0 | 90.3 | 39.1 | | | EfficientPS (Cityscapes-fine) | 2020-04-05 |
CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation | ✓ Link | 64.6 | | | 81.4 | | | | CMT-DeepLab (MaX-S, single-scale, IN-1K) | 2022-06-17 |
Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation | ✓ Link | 64.1 | | | 81.5 | 38.5 | | | Panoptic-DeepLab (X71) | 2019-11-22 |
Intra-Batch Supervision for Panoptic Segmentation on High-Resolution Images | ✓ Link | 62.4 | 67.3 | 54.7 | | | | | Mask2Former + Intra-Batch Supervision (ResNet-50) | 2023-04-17 |
Combinatorial Optimization for Panoptic Segmentation: A Fully Differentiable Approach | ✓ Link | 62.1 | 67.2 | 55.1 | 79.3 | 34.1 | | | COPS (ResNet-50) | 2021-06-06 |
AdaptIS: Adaptive Instance Selection Network | | 62.0 | 64.4 | 58.7 | 79.2 | 36.3 | | | AdaptIS (ResNeXt-101) | 2019-09-17 |
UPSNet: A Unified Panoptic Segmentation Network | ✓ Link | 61.8 | 64.8 | 57.6 | 79.2 | 39.0 | | | UPSNet (ResNet-101, multi-scale) | 2019-01-12 |
Fully Convolutional Networks for Panoptic Segmentation | ✓ Link | 61.4 | | 54.8 | | | | | Panoptic FCN* (ResNet-FPN) | 2020-12-01 |
Panoptic Segmentation | ✓ Link | 61.2 | 66.4 | 54.0 | | 36.4 | | | MRCNN + PSPNet (ResNet-101) | 2018-01-03 |
AdaptIS: Adaptive Instance Selection Network | | 60.6 | 62.9 | 57.5 | 77.2 | 33.9 | | | AdaptIS (ResNet-101) | 2019-09-17 |
UPSNet: A Unified Panoptic Segmentation Network | ✓ Link | 60.5 | 63.0 | 57.0 | 77.8 | 37.8 | | | UPSNet (ResNet-101) | 2019-01-12 |
Learning to Fuse Things and Stuff | | 60.4 | 63.3 | 56.1 | 78.0 | 39.0 | | | TASCNet (ResNet-50, multi-scale) | 2018-12-04 |
UPSNet: A Unified Panoptic Segmentation Network | ✓ Link | 59.3 | 62.7 | 54.6 | 75.2 | 33.3 | | | UPSNet (ResNet-50) | 2019-01-12 |
Learning to Fuse Things and Stuff | | 59.2 | 61.5 | 56.0 | 77.8 | 37.6 | | | TASCNet (ResNet-50) | 2018-12-04 |
Attention-guided Unified Network for Panoptic Segmentation | | 59.0 | 62.1 | 54.8 | 75.6 | 34.4 | | | AUNet (ResNet-101-FPN) | 2018-12-10 |
AdaptIS: Adaptive Instance Selection Network | | 59.0 | 61.3 | 55.8 | 75.3 | 32.3 | | | AdaptIS (ResNet-50) | 2019-09-17 |
Panoptic Feature Pyramid Networks | ✓ Link | 58.1 | 62.5 | 52.0 | 75.7 | 33.0 | | | Panoptic FPN (ResNet-101) | 2019-01-08 |
DeeperLab: Single-Shot Image Parser | | 56.5 | | | | | | | DeeperLab (Xception-71) | 2019-02-13 |
Weakly- and Semi-Supervised Panoptic Segmentation | ✓ Link | 53.8 | 62.1 | 42.5 | 79.8 | 28.6 | | | Dynamically Instantiated Network (ResNet-101) | 2018-08-10 |
Fully Convolutional Networks for Panoptic Segmentation | ✓ Link | | 70.6 | 59.5 | | | | | Panoptic FCN* (Swin-L, Cityscapes-fine) | 2020-12-01 |
Fully Convolutional Networks for Panoptic Segmentation | ✓ Link | | 66.6 | | | | | | Panoptic FCN* (ResNet-50-FPN) | 2020-12-01 |
Mask R-CNN | ✓ Link | | | 54.0 | | | | | Mask R-CNN+COCO | 2017-03-20 |