OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ Link | 44.2 | 64.3 | 49.9 | 23.7 | OneFormer (InternImage-H, emb_dim=1024, single-scale, 896x896, COCO-Pretrained) | 2022-11-10 |
A Simple Framework for Open-Vocabulary Segmentation and Detection | ✓ Link | 42.6 | | | | OpenSeeD | 2023-03-14 |
The Missing Point in Vision Transformers for Universal Image Segmentation | ✓ Link | 40.7 | | | | ViT-P (OneFormer, DiNAT-L, single-scale, 1280x1280, COCO_pretrain) | 2025-05-26 |
OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ Link | 40.2 | 59.7 | 44.4 | 19.2 | OneFormer (DiNAT-L, single-scale, 1280x1280, COCO-pretrain) | 2022-11-10 |
Generalized Decoding for Pixel, Image, and Language | ✓ Link | 38.7 | 59.6 | 43.3 | 18.9 | X-Decoder (Davit-d5, Deform, single-scale, 1280x1280) | 2022-12-21 |
The Missing Point in Vision Transformers for Universal Image Segmentation | ✓ Link | 37.8 | | | | ViT-P (OneFormer, DiNAT-L, single-scale, 1280x1280) | 2025-05-26 |
OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ Link | 36.0 | | | | OneFormer (DiNAT-L, single-scale) | 2022-11-10 |
OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ Link | 35.9 | | | | OneFormer (Swin-L, single-scale) | 2022-11-10 |
Generalized Decoding for Pixel, Image, and Language | ✓ Link | 35.8 | | | | X-Decoder (L) | 2022-12-21 |
Dilated Neighborhood Attention Transformer | ✓ Link | 35.4 | 55.5 | 39.0 | 16.3 | DiNAT-L (Mask2Former, single-scale) | 2022-09-29 |
Masked-attention Mask Transformer for Universal Image Segmentation | ✓ Link | 34.9 | 54.7 | 40.0 | 16.3 | Mask2Former (Swin-L, single-scale) | 2021-12-02 |
Masked-attention Mask Transformer for Universal Image Segmentation | ✓ Link | 33.4 | 54.6 | 37.6 | 14.6 | Mask2Former (Swin-L + FAPN) | 2021-12-02 |
Masked-attention Mask Transformer for Universal Image Segmentation | ✓ Link | 26.4 | | | 10.4 | Mask2Former (ResNet-50) | 2021-12-02 |
Masked-attention Mask Transformer for Universal Image Segmentation | ✓ Link | | 43.1 | 28.9 | | Mask2Former (ResNet-50) | 2021-12-02 |