| Paper | Code | Score | Model | Date |
|---|---|---|---|---|
| HyperSeg: Towards Universal Visual Segmentation with Large Language Model | ✓ | 77.2 | HyperSeg | 2024-11-26 |
| The Missing Point in Vision Transformers for Universal Image Segmentation | ✓ | 69.1 | ViT-P (OneFormer, InternImage-H) | 2025-05-26 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ | 68.8 | OneFormer (InternImage-H, emb_dim=1024, single-scale) | 2022-11-10 |
| The Missing Point in Vision Transformers for Universal Image Segmentation | ✓ | 68.8 | ViT-P (OneFormer, DiNAT-L) | 2025-05-26 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ | 68.1 | OneFormer (DiNAT-L, single-scale) | 2022-11-10 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ | 67.4 | OneFormer (Swin-L, single-scale) | 2022-11-10 |
| Masked-attention Mask Transformer for Universal Image Segmentation | ✓ | 67.4 | Mask2Former (Swin-L, single-scale) | 2021-12-02 |
| Masked-attention Mask Transformer for Universal Image Segmentation | ✓ | 64.8 | MaskFormer (Swin-L, single-scale) | 2021-12-02 |
| SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation | ✓ | 26.5 | SegCLIP | 2022-11-27 |