OpenCodePapers

referring-expression-segmentation-on-refcoco-5

Referring Expression Segmentation
Leaderboard
Paper | Code | Overall IoU | Mean IoU | mIoU | Model | Release date
DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy | ✓ | 78.59 | | | DeRIS-L | 2025-07-02
Multi-label Cluster Discrimination for Visual Representation Learning | ✓ | 75.6 | | | MLCD-Seg-7B | 2024-07-24
HyperSeg: Towards Universal Visual Segmentation with Large Language Model | ✓ | 75.2 | | | HyperSeg | 2024-11-26
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model | ✓ | 71.9 | | | EVF-SAM | 2024-06-28
Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation | ✓ | 70.2 | | | DETRIS | 2025-01-15
Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints | ✓ | 68.95 | | | C3VG | 2025-01-12
Universal Segmentation at Arbitrary Granularity with Language Instruction | ✓ | 68.15 | | | UniLSeg-100 | 2023-12-04
Universal Segmentation at Arbitrary Granularity with Language Instruction | ✓ | 66.99 | | | UniLSeg-20 | 2023-12-04
Universal Instance Perception as Object Discovery and Retrieval | ✓ | 66.22 | | | UNINEXT-H | 2023-03-12
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation | | 64.9 | | | GROUNDHOG | 2024-02-26
SafaRi: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation | | 64.88 | | | SafaRi-B | 2024-07-02
MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation | ✓ | 62.83 | | | MaskRIS (Swin-B, combined DB) | 2024-11-28
Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding | ✓ | 62.52 | | | VATEX | 2024-04-12
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation | ✓ | 61.87 | 66.73 | | PolyFormer-L | 2023-02-14
MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation | ✓ | 59.39 | 64.5 | | MaskRIS (Swin-B) | 2024-11-28
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation | ✓ | 59.33 | 64.64 | | PolyFormer-B | 2023-02-14
Mask Grounding for Referring Image Segmentation | ✓ | 58.14 | | | MagNet | 2023-12-19
GRES: Generalized Referring Expression Segmentation | ✓ | 57.65 | | | ReLA | 2023-06-01
VLT: Vision-Language Transformer and Query Generation for Referring Segmentation | ✓ | 56.92 | | | VLT | 2022-10-28
MaIL: A Unified Mask-Image-Language Trimodal Network for Referring Image Segmentation | | 56.06 | | | MaIL | 2021-11-21
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation | ✓ | 55.1 | | | LAVT | 2021-12-04
CRIS: CLIP-Driven Referring Image Segmentation | ✓ | 53.68 | | | CRIS | 2021-11-30
Vision-Language Transformer and Query Generation for Referring Segmentation | ✓ | 49.36 | | | VLT | 2021-08-12
Comprehensive Multi-Modal Interactions for Referring Image Segmentation | ✓ | 44.12 | | | SHNet | 2021-04-21
Referring Image Segmentation via Cross-Modal Progressive Comprehension | ✓ | 43.23 | | | CPMC | 2020-10-01
Bi-Directional Relationship Inferring Network for Referring Image Segmentation | | 42.13 | | | BRINet | 2020-06-01
See-Through-Text Grouping for Referring Image Segmentation | | 40.41 | | | STEP (5-fold) | 2019-10-01
MAttNet: Modular Attention Network for Referring Expression Comprehension | ✓ | 40.08 | | | MAttNet | 2018-01-24
Cross-Modal Self-Attention Network for Referring Image Segmentation | ✓ | 37.89 | | | CMSA | 2019-04-09
RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation | ✓ | 36.17 | | | RefVOS with BERT + MLM loss | 2020-10-01
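The leaderboard reports Overall IoU and Mean IoU (plus a separate mIoU column) without defining them. As these metrics are conventionally used in the referring segmentation literature, Overall IoU divides the total intersection by the total union accumulated over the whole evaluation split, while Mean IoU averages the per-expression IoU. The sketch below illustrates that difference on toy data; the helper names `inter_union`, `overall_iou` and `mean_iou` are hypothetical and not taken from any benchmark toolkit, and treating an empty-union sample as IoU 1.0 is an assumption.

```python
from typing import List, Sequence, Tuple

# A flattened binary mask: one truthy/falsy entry per pixel.
Mask = Sequence[int]


def inter_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Pixel counts of intersection and union for one prediction/ground-truth pair."""
    if len(pred) != len(gt):
        raise ValueError("prediction and ground-truth masks must have the same size")
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter, union


def overall_iou(preds: List[Mask], gts: List[Mask]) -> float:
    """Overall IoU: total intersection divided by total union over the whole split."""
    if len(preds) != len(gts):
        raise ValueError("need one ground-truth mask per prediction")
    total_inter = total_union = 0
    for pred, gt in zip(preds, gts):
        inter, union = inter_union(pred, gt)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 0.0


def mean_iou(preds: List[Mask], gts: List[Mask]) -> float:
    """Mean IoU: per-sample IoU averaged with equal weight per expression."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("need non-empty, equal-length lists of predictions and ground truths")
    ious = []
    for pred, gt in zip(preds, gts):
        inter, union = inter_union(pred, gt)
        # Assumption: a sample where both masks are empty counts as a perfect match.
        ious.append(inter / union if union else 1.0)
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Toy data: a small object segmented poorly, a larger object segmented well.
    preds = [[1, 1, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1]]
    gts = [[1, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 0]]
    print(overall_iou(preds, gts))  # 8 / 10 = 0.8
    print(mean_iou(preds, gts))     # (0.5 + 0.875) / 2 = 0.6875
```

Because Overall IoU pools pixels, large objects dominate it, whereas Mean IoU gives every expression equal weight; the two can therefore differ noticeably for the same model, as in the PolyFormer and MaskRIS (Swin-B) rows.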