OpenCodePapers

Referring Video Object Segmentation on MeViS

Leaderboard

Paper	Code	J&F	J	F	Model Name	Release Date
MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation	✓ Link	53.7	50.7	56.7	MPG-SAM 2	2025-01-23
Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation	✓ Link	53.2	50.5	55.9	FindTrack	2025-03-05
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation	✓ Link	51.3	48.5	54.2	GLUS	2025-04-10
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation	✓ Link	50.9	48.0	53.7	VRS-HQ (Chat-UniVi-13B)	2025-01-15
ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations		49.3	44.7	53.9	ReferDINO (Swin-B)	2025-01-24
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation	✓ Link	48.3	45.4	51.2	SAMWISE	2024-11-26
Multi-Context Temporal Consistent Modeling for Referring Video Object Segmentation	✓ Link	47.6	44.1	51.1	DsHmp + MTCM	2025-01-09
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation	✓ Link	46.4	43.0	49.8	DsHmp	2024-04-04
Temporally Consistent Referring Video Object Segmentation with Hybrid Memory	✓ Link	42.7	39.9	45.5	HTR	2024-03-28
MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions	✓ Link	37.2	34.2	40.2	LMPM	2023-08-16
VLT: Vision-Language Transformer and Query Generation for Referring Segmentation	✓ Link	35.5	33.6	37.3	VLT+TC	2022-10-28
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling	✓ Link	32			InternVideo2.5	2025-01-21
Language as Queries for Referring Video Object Segmentation	✓ Link	31.0	29.8	32.2	ReferFormer	2022-01-03
End-to-End Referring Video Object Segmentation with Multimodal Transformers	✓ Link	30.0	28.8	31.2	MTTR	2021-11-29
Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation	✓ Link	29.3	27.8	30.8	LBDT	2022-06-08
URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark	✓ Link	27.8	25.7	29.9	URVOS
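
The J&F column is conventionally the arithmetic mean of region similarity (J) and contour accuracy (F). The sketch below checks that relationship on a few rows copied from the table above. It is a minimal illustration, not part of the leaderboard: the names `ROWS`, `j_and_f`, and `inconsistent_rows` are hypothetical, and the 0.06 tolerance is an assumption meant to absorb rounding of the published scores to one decimal place. InternVideo2.5 is omitted because the table lists no J or F for it.

```python
# Rows copied from the leaderboard above: (model, J&F, J, F).
ROWS = [
    ("MPG-SAM 2", 53.7, 50.7, 56.7),
    ("FindTrack", 53.2, 50.5, 55.9),
    ("GLUS", 51.3, 48.5, 54.2),
    ("ReferDINO (Swin-B)", 49.3, 44.7, 53.9),
    ("LMPM", 37.2, 34.2, 40.2),
    ("URVOS", 27.8, 25.7, 29.9),
]


def j_and_f(j: float, f: float) -> float:
    """Mean of region similarity J and contour accuracy F."""
    return (j + f) / 2


def inconsistent_rows(rows, tol: float = 0.06):
    """Return rows whose reported J&F differs from mean(J, F) by more than tol.

    The tolerance is an assumption: scores are published to one decimal,
    so the recomputed mean can differ from the reported value by rounding.
    """
    return [
        (name, reported, j_and_f(j, f))
        for name, reported, j, f in rows
        if abs(reported - j_and_f(j, f)) > tol
    ]


if __name__ == "__main__":
    for name, reported, j, f in ROWS:
        print(f"{name}: reported {reported}, recomputed {j_and_f(j, f):.2f}")
    print("inconsistent rows:", inconsistent_rows(ROWS))
```

Under these assumptions, every listed row agrees with its recomputed mean to within rounding, so the three columns can be read as one averaged score plus its two components.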