OpenCodePapers
referring-video-object-segmentation-on-refer
Referring Video Object Segmentation
[Figure: Results over time]
Leaderboard
| Paper | Code | J&F | J | F | Model Name | Release Date |
|---|---|---|---|---|---|---|
| Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation | ✓ | 73.7 | 71.8 | 75.7 | FindTrack | 2025-03-05 |
| General Object Foundation Model for Images and Videos at Scale | ✓ | 70.6 | 68.2 | 72.9 | GLEE-Pro | 2023-12-14 |
| HyperSeg: Towards Universal Visual Segmentation with Large Language Model | ✓ | 68.5 | | | HyperSeg | 2024-11-26 |
| General Object Foundation Model for Images and Videos at Scale | ✓ | 67.7 | 65.6 | 69.7 | GLEE-Plus | 2023-12-14 |
| Temporally Consistent Referring Video Object Segmentation with Hybrid Memory | ✓ | 67.1 | 65.3 | 68.9 | HTR | 2024-03-28 |
| SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | ✓ | 66.0 | 64.1 | 67.9 | SOC | 2023-05-26 |
| Spectrum-guided Multi-granularity Referring Video Object Segmentation | ✓ | 65.7 | 63.9 | 67.4 | SgMg | 2023-07-25 |
| Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding | ✓ | 65.4 | 63.3 | 67.5 | VATEX | 2024-04-12 |
| VLT: Vision-Language Transformer and Query Generation for Referring Segmentation | ✓ | 63.8 | 61.9 | 65.6 | VLT | 2022-10-28 |
| HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation | | 63.4 | 61.5 | 65.3 | HTML-SwinL | 2023-01-01 |
| HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation | | 63.4 | 61.5 | 65.2 | HTML-Video-SwinB | 2023-01-01 |
| Language as Queries for Referring Video Object Segmentation | ✓ | 62.9 | 61.3 | 64.6 | ReferFormer (Large) | 2022-01-03 |
| HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation | | 61.4 | 59.9 | 62.9 | HTML-Video-SwinS | 2023-01-01 |
| HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation | | 61.2 | 59.5 | 63.0 | HTML-Video-SwinT | 2023-01-01 |
| Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus | ✓ | 60.2 | 58.9 | 61.5 | R2VOS (Swin-T) | 2022-07-04 |
| HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation | | 58.5 | 57.3 | 59.8 | HTML-ResNet101 | 2023-01-01 |
| HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation | | 57.8 | 56.5 | 59.0 | HTML-ResNet50 | 2023-01-01 |
| Cross-Modal Self-Attention Network for Referring Image Segmentation | ✓ | 36.4 | 34.8 | 38.1 | CMSA | 2019-04-09 |
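In referring video object segmentation benchmarks, J&F is conventionally the arithmetic mean of region similarity J (mask IoU) and contour accuracy F (boundary F-measure). The sketch below assumes that standard definition and checks it against the rows above. Because every score is reported to one decimal place, a mean recomputed from the rounded J and F can differ from the listed J&F by up to 0.05. HyperSeg is left out because only its J&F is listed. The names `jf_mean` and `inconsistent_rows` are illustrative helpers, not part of any benchmark toolkit.

```python
# Leaderboard rows as (model, J&F, J, F), copied from the table above.
# HyperSeg is omitted because its J and F scores are not listed.
ROWS = [
    ("FindTrack", 73.7, 71.8, 75.7),
    ("GLEE-Pro", 70.6, 68.2, 72.9),
    ("GLEE-Plus", 67.7, 65.6, 69.7),
    ("HTR", 67.1, 65.3, 68.9),
    ("SOC", 66.0, 64.1, 67.9),
    ("SgMg", 65.7, 63.9, 67.4),
    ("VATEX", 65.4, 63.3, 67.5),
    ("VLT", 63.8, 61.9, 65.6),
    ("HTML-SwinL", 63.4, 61.5, 65.3),
    ("HTML-Video-SwinB", 63.4, 61.5, 65.2),
    ("ReferFormer (Large)", 62.9, 61.3, 64.6),
    ("HTML-Video-SwinS", 61.4, 59.9, 62.9),
    ("HTML-Video-SwinT", 61.2, 59.5, 63.0),
    ("R2VOS (Swin-T)", 60.2, 58.9, 61.5),
    ("HTML-ResNet101", 58.5, 57.3, 59.8),
    ("HTML-ResNet50", 57.8, 56.5, 59.0),
    ("CMSA", 36.4, 34.8, 38.1),
]

# Scores are rounded to one decimal, so allow a 0.05 gap
# plus a tiny margin for floating-point error.
TOLERANCE = 0.05 + 1e-9


def jf_mean(j: float, f: float) -> float:
    """Return J&F as the arithmetic mean of J (region) and F (contour)."""
    return (j + f) / 2


def inconsistent_rows(rows):
    """Return names of models whose listed J&F is not within TOLERANCE of (J + F) / 2."""
    return [name for name, jf, j, f in rows if abs(jf - jf_mean(j, f)) > TOLERANCE]


if __name__ == "__main__":
    print(inconsistent_rows(ROWS))  # prints []
```

An empty result means every listed J&F agrees with (J + F) / 2 to within rounding.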