referring-expression-segmentation-on-refer-1

Referring Expression Segmentation

Leaderboard

Paper	Code	J&F	J	F	Model	Release Date
MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation	✓ Link	73.9	71.7	76.1	MPG-SAM 2	2025-01-23
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation	✓ Link	71	69	73.1	VRS-HQ (Chat-UniVi-13B)	2025-01-15
General Object Foundation Model for Images and Videos at Scale	✓ Link	70.6	68.2	72.9	GLEE-Pro	2023-12-14
Universal Instance Perception as Object Discovery and Retrieval	✓ Link	70.1	67.6	72.7	UNINEXT-H	2023-03-12
ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations		69.3	67.0	71.5	ReferDINO (Swin-B)	2025-01-24
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation	✓ Link	68.4	66.4	70.4	MUTR	2023-05-25
Harnessing Vision-Language Pretrained Models with Temporal-Aware Adaptation for Referring Video Object Segmentation		67.6	65.3	69.8	VLP (VLMo-L)	2024-05-17
Segment Every Reference Object in Spatial and Temporal Spaces		67.4	65.5	69.2	UniRef-L (Swin-L)	2023-01-01
SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation	✓ Link	67.3±0.5	65.3	69.3	SOC (Joint training, Video-Swin-B)	2023-05-26
Temporally Consistent Referring Video Object Segmentation with Hybrid Memory	✓ Link	67.1	65.3	68.9	HTR (Pre-training)	2024-03-28
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation	✓ Link	67.1	65	69.1	DsHmp (Video-Swin-Base)	2024-04-04
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces	✓ Link	66.9	64.8	69.0	UniRef++-L	2023-12-25
ViLLa: Video Reasoning Segmentation with Large Language Model	✓ Link	66.5	64.6	68.6	ViLLa	2024-07-18
Tracking Anything with Decoupled Video Segmentation	✓ Link	66.0			DEVA (ReferFormer)	2023-09-07
Spectrum-guided Multi-granularity Referring Video Object Segmentation	✓ Link	65.7	63.9	67.4	SgMg (Pre-training)	2023-07-25
GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation		65.5	64.1	66.9	GroPrompt	2024-06-18
Expression Prompt Collaboration Transformer for Universal Referring Video Object Segmentation		65	62.9	67.2	EPCFormer (ViT-H)	2023-08-08
Universal Segmentation at Arbitrary Granularity with Language Instruction	✓ Link	64.9	62.8	67.0	UniLSeg-100	2023-12-04
LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation	✓ Link	64.2	62.5	66.0	LoSh-R	2023-06-14
VLT: Vision-Language Transformer and Query Generation for Referring Segmentation	✓ Link	63.8	61.9	65.6	VLT	2022-10-28
OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation	✓ Link	63.5	61.6	65.5	OnlineRefer (Swin-L, online)	2023-07-18
Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus	✓ Link	61.3	59.6	63.1	R2VOS (Video-Swin-T)	2022-07-04
SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation	✓ Link	59.2	57.8	60.5	SOC (Video-Swin-T)	2023-05-26
UniVS: Unified and Universal Video Segmentation with Prompts as Queries	✓ Link	58.0	56.8	59.5	UniVS (Swin-L)	2024-02-28
Language as Queries for Referring Video Object Segmentation	✓ Link	57.3	56.1	58.4	ReferFormer (ResNet-101)	2022-01-03
Multi-Attention Network for Compressed Video Referring Object Segmentation	✓ Link	55.63	54.75	56.51	MANET	2022-07-26
Language as Queries for Referring Video Object Segmentation	✓ Link	55.6	54.8	56.6	ReferFormer (ResNet-50)	2022-01-03
End-to-End Referring Video Object Segmentation with Multimodal Transformers	✓ Link	55.32	54.00	56.64	MTTR (w=12)	2021-11-29
Local-Global Context Aware Transformer for Language-Guided Video Segmentation	✓ Link	50	48.8	51.1	Locater	2022-03-18
Multi-Level Representation Learning With Semantic Alignment for Referring Video Object Segmentation		49.70	50.96	48.43	MLRLSA	2022-01-01
Deeply Interleaved Two-Stream Encoder for Referring Video Segmentation		49.56	48.44	50.67	VLIDE	2022-03-30
URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark	✓ Link	48.9	47.0	50.8	URVOS
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling	✓ Link	34.2			InternVideo2.5	2025-01-21
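
The J&F column is conventionally the mean of region similarity (J) and contour accuracy (F), so most rows above can be sanity-checked with a little arithmetic. The following is a minimal Python sketch, not part of the leaderboard itself. It assumes score cells are tab-separated, may be empty, and may carry a "±" suffix. The helper names `parse_score` and `mean_jf` are hypothetical, and the three sample rows are copied from the table with the paper, code, and date columns dropped.

```python
def parse_score(cell):
    """Return the leading number in a score cell, or None if the cell is empty."""
    cell = cell.strip()
    if not cell:
        return None
    # Drop an uncertainty suffix such as "±0.5" and keep the central value.
    return float(cell.split("±")[0])


def mean_jf(j, f):
    """Average of J and F (assumed definition of the J&F column)."""
    return (j + f) / 2


# Sample rows: model name, J&F, J, F (tab-separated, as in the table).
rows = [
    "MPG-SAM 2\t73.9\t71.7\t76.1",
    "SOC (Joint training, Video-Swin-B)\t67.3±0.5\t65.3\t69.3",
    "DEVA (ReferFormer)\t66.0\t\t",
]

results = []
for row in rows:
    name, jf_cell, j_cell, f_cell = row.split("\t")
    reported = parse_score(jf_cell)
    j, f = parse_score(j_cell), parse_score(f_cell)
    # Rows that report only J&F cannot be recomputed.
    recomputed = mean_jf(j, f) if j is not None and f is not None else None
    results.append((name, reported, recomputed))

for name, reported, recomputed in results:
    if recomputed is None:
        print(f"{name}: reported {reported}, J or F missing")
    else:
        # Tolerance of 0.1 allows for rounding of the published J and F values.
        ok = abs(reported - recomputed) <= 0.1
        print(f"{name}: reported {reported}, recomputed {recomputed:.2f}, consistent={ok}")
```

Reported and recomputed values can differ in the last digit because J and F are themselves rounded before publication, which is why the comparison uses a tolerance instead of exact equality.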
