OpenCodePapers

referring-video-object-segmentation-on-refer

Video Object Segmentation, Referring Video Object Segmentation
Leaderboard
| Paper | Code | J&F | J | F | Model Name | Release Date |
|---|---|---|---|---|---|---|
| Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation | ✓ | 73.7 | 71.8 | 75.7 | FindTrack | 2025-03-05 |
| General Object Foundation Model for Images and Videos at Scale | ✓ | 70.6 | 68.2 | 72.9 | GLEE-Pro | 2023-12-14 |
| HyperSeg: Towards Universal Visual Segmentation with Large Language Model | ✓ | 68.5 | | | HyperSeg | 2024-11-26 |
| General Object Foundation Model for Images and Videos at Scale | ✓ | 67.7 | 65.6 | 69.7 | GLEE-Plus | 2023-12-14 |
| Temporally Consistent Referring Video Object Segmentation with Hybrid Memory | ✓ | 67.1 | 65.3 | 68.9 | HTR | 2024-03-28 |
| SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | ✓ | 66.0 | 64.1 | 67.9 | SOC | 2023-05-26 |
| Spectrum-guided Multi-granularity Referring Video Object Segmentation | ✓ | 65.7 | 63.9 | 67.4 | SgMg | 2023-07-25 |
| Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding | ✓ | 65.4 | 63.3 | 67.5 | VATEX | 2024-04-12 |
| VLT: Vision-Language Transformer and Query Generation for Referring Segmentation | ✓ | 63.8 | 61.9 | 65.6 | VLT | 2022-10-28 |
| HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation | | 63.4 | 61.5 | 65.3 | HTML-SwinL | 2023-01-01 |
| HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation | | 63.4 | 61.5 | 65.2 | HTML-Video-SwinB | 2023-01-01 |
| Language as Queries for Referring Video Object Segmentation | ✓ | 62.9 | 61.3 | 64.6 | ReferFormer (Large) | 2022-01-03 |
| HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation | | 61.4 | 59.9 | 62.9 | HTML-Video-SwinS | 2023-01-01 |
| HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation | | 61.2 | 59.5 | 63.0 | HTML-Video-SwinT | 2023-01-01 |
| Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus | ✓ | 60.2 | 58.9 | 61.5 | R2VOS (Swin-T) | 2022-07-04 |
| HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation | | 58.5 | 57.3 | 59.8 | HTML-ResNet101 | 2023-01-01 |
| HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation | | 57.8 | 56.5 | 59.0 | HTML-ResNet50 | 2023-01-01 |
| Cross-Modal Self-Attention Network for Referring Image Segmentation | ✓ | 36.4 | 34.8 | 38.1 | CMSA | 2019-04-09 |