OpenCodePapers

referring-expression-segmentation-on-refcoco-4

Referring Expression Segmentation
Leaderboard
| Paper | Code | IoU score(s) listed (Overall IoU, Mean IoU, mIoU) | Model | Release date |
| --- | --- | --- | --- | --- |
| HyperSeg: Towards Universal Visual Segmentation with Large Language Model | ✓ | 83.5 | HyperSeg | 2024-11-26 |
| Multi-label Cluster Discrimination for Visual Representation Learning | ✓ | 82.9 | MLCD-Seg-7B | 2024-07-24 |
| DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy | ✓ | 82.34, 83.74 | DeRIS-L | 2025-07-02 |
| EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model | ✓ | 80 | EVF-SAM | 2024-06-28 |
| Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation | ✓ | 78.6 | DETRIS | 2025-01-15 |
| Universal Segmentation at Arbitrary Granularity with Language Instruction | ✓ | 78.29 | UniLSeg-100 | 2023-12-04 |
| Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints | ✓ | 77.96 | C3VG | 2025-01-12 |
| Universal Segmentation at Arbitrary Granularity with Language Instruction | ✓ | 77.02 | UniLSeg-20 | 2023-12-04 |
| Universal Instance Perception as Object Discovery and Retrieval | ✓ | 76.42 | UNINEXT-H | 2023-03-12 |
| MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation | ✓ | 75.15 | MaskRIS (Swin-B, combined DB) | 2024-11-28 |
| GROUNDHOG: Grounding Large Language Models to Holistic Segmentation | | 75.0 | GROUNDHOG | 2024-02-26 |
| PolyFormer: Referring Image Segmentation as Sequential Polygon Generation | ✓ | 74.56, 75.71 | PolyFormer-L | 2023-02-14 |
| SafaRi:Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation | | 74.53 | SafaRi-B | 2024-07-02 |
| MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation | ✓ | 74.46, 76.73 | MaskRIS (Swin-B) | 2024-11-28 |
| PolyFormer: Referring Image Segmentation as Sequential Polygon Generation | ✓ | 72.89, 74.51 | PolyFormer-B | 2023-02-14 |
| Mask Grounding for Referring Image Segmentation | ✓ | 71.32 | MagNet | 2023-12-19 |
| GRES: Generalized Referring Expression Segmentation | ✓ | 71.02 | ReLA | 2023-06-01 |
| VLT: Vision-Language Transformer and Query Generation for Referring Segmentation | ✓ | 68.43 | VLT | 2022-10-28 |
| LAVT: Language-Aware Vision Transformer for Referring Image Segmentation | ✓ | 68.38 | LAVT | 2021-12-04 |
| CRIS: CLIP-Driven Referring Image Segmentation | ✓ | 68.08 | CRIS | 2021-11-30 |
| MaIL: A Unified Mask-Image-Language Trimodal Network for Referring Image Segmentation | | 65.92 | MaIL | 2021-11-21 |
| Vision-Language Transformer and Query Generation for Referring Segmentation | ✓ | 59.20 | VLT | 2021-08-12 |
| Comprehensive Multi-Modal Interactions for Referring Image Segmentation | ✓ | 58.46 | SHNet | 2021-04-21 |
| Referring Image Segmentation via Cross-Modal Progressive Comprehension | ✓ | 53.44 | CPMC | 2020-10-01 |
| Bi-Directional Relationship Inferring Network for Referring Image Segmentation | | 52.87 | BRINet | 2020-06-01 |
| MAttNet: Modular Attention Network for Referring Expression Comprehension | ✓ | 52.39 | MattNet | 2018-01-24 |
| See-Through-Text Grouping for Referring Image Segmentation | | 52.33 | STEP (5-fold) | 2019-10-01 |
| RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation | ✓ | 49.73 | RefVOS with BERT + MLM Loss | 2020-10-01 |
| Cross-Modal Self-Attention Network for Referring Image Segmentation | ✓ | 47.60 | CMSA | 2019-04-09 |
| Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding | ✓ | 74.41 | VATEX | 2024-04-12 |