referring-expression-segmentation-on-a2d

Referring Expression Segmentation

Leaderboard

Paper	Code	AP	IoU overall	IoU mean	Precision@0.5	Precision@0.6	Precision@0.7	Precision@0.8	Precision@0.9	Model Name	Release Date
Spectrum-guided Multi-granularity Referring Video Object Segmentation	✓ Link	0.585	0.799	0.720	0.843	0.822	0.767	0.617	0.259	SgMg (Video-Swin-B)	2023-07-25
SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation	✓ Link	0.573	0.807	0.725	0.851	0.827	0.765	0.607	0.252	SOC (Video-Swin-B)	2023-05-26
Language as Queries for Referring Video Object Segmentation	✓ Link	0.550	0.786	0.703	0.831	0.804	0.741	0.579	0.212	ReferFormer (Video-Swin-B)	2022-01-03
SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation	✓ Link	0.504	0.747	0.669	0.790	0.756	0.687	0.535	0.195	SOC (Video-Swin-T)	2023-05-26
Multi-Attention Network for Compressed Video Referring Object Segmentation	✓ Link	0.471	0.726	0.632	0.734	0.682	0.579	0.389	0.132	MANET	2022-07-26
Deeply Interleaved Two-Stream Encoder for Referring Video Segmentation		0.469	0.714	0.598	0.702	0.663	0.585	0.428	0.151	VLIDE	2022-03-30
Local-Global Context Aware Transformer for Language-Guided Video Segmentation	✓ Link	0.465	0.690	0.597	0.709	0.640	0.525	0.351	0.101	Locater	2022-03-18
End-to-End Referring Video Object Segmentation with Multimodal Transformers	✓ Link	0.461	0.720	0.640	0.754	0.712	0.638	0.485	0.169	MTTR (w=10)	2021-11-29
End-to-End Referring Video Object Segmentation with Multimodal Transformers	✓ Link	0.447	0.702	0.618	0.721	0.684	0.607	0.456	0.164	MTTR (w=8)	2021-11-29
Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation	✓ Link	0.419	0.673	0.558	0.645	0.597	0.523	0.375	0.130	mmmmtbvs	2022-04-06
Cross-Modal Progressive Comprehension for Referring Segmentation	✓ Link	0.404	0.653	0.573	0.655	0.592	0.506	0.342	0.098	CMPC-V (I3D)	2021-05-15
Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation		0.399	0.662	0.561	0.654	0.589	0.497	0.333	0.091	Hui et al.	2021-05-14
Actor and Action Modular Network for Text-based Video Segmentation		0.396	0.617	0.552	0.681	0.629	0.523	0.296	0.029	AAMN	2020-11-02
Polar Relative Positional Encoding for Video-Language Segmentation		0.388	0.661	0.529	0.634	0.579	0.483	0.322	0.083	PRPE	2020-07-20
Cross-Modal Progressive Comprehension for Referring Segmentation	✓ Link	0.351	0.649	0.515	0.590	0.527	0.434	0.284	0.068	CMPC-V (R2D)	2021-05-15
Context Modulated Dynamic Networks for Actor and Action Video Segmentation with Language Queries		0.333	0.623	0.531	0.607	0.525	0.405	0.235	0.045	CMDy	2020-04-03
Visual-Textual Capsule Routing for Text-Based Video Segmentation		0.303	0.568	0.460	0.526	0.450	0.345	0.207	0.036	VT-Capsule	2020-06-01
Asymmetric Cross-Guided Attention Network for Actor and Action Video Segmentation From Natural Language Query	✓ Link	0.274	0.601	0.490	0.557	0.459	0.319	0.160	0.020	ACGA	2019-10-01
Actor and Action Video Segmentation from a Sentence	✓ Link	0.215	0.551	0.426	0.500	0.376	0.231	0.094	0.004	Gavrilyuk et al. (Optical flow)	2018-03-20
Actor and Action Video Segmentation from a Sentence	✓ Link	0.198	0.536	0.421	0.475	0.347	0.211	0.080	0.002	Gavrilyuk et al.	2018-03-20
Tracking by Natural Language Specification		0.163	0.515	0.354	0.387	0.290	0.175	0.066	0.001	Li et al.	2017-07-01
Segmentation from Natural Language Expressions	✓ Link	0.132	0.474	0.350	0.348	0.236	0.133	0.033	0.000	Hu et al.	2016-03-20
Hierarchical Interaction Network for Video Object Segmentation from Referring Expressions			0.679	0.529	0.611	0.559	0.486	0.342	0.120	HINet	2021-11-22
Hierarchical Interaction Network for Video Object Segmentation from Referring Expressions			0.672	0.497	0.578	0.534	0.456	0.311	0.093	RefVOS	2021-11-22
ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation			0.644	0.655	0.704	0.677	0.617	0.489	0.171	ClawCraneNet	2021-03-19
Referring Segmentation in Images and Videos with Cross-Modal Self-Attention Network			0.618	0.432	0.487	0.431	0.358	0.231	0.052	CMSA+CFSA	2021-02-09
RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation	✓ Link		0.599	0.599	0.495				0.064	RefVOS	2020-10-01
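
The IoU and precision columns are usually defined as follows in the A2D Sentences literature. Overall IoU is the total intersection area divided by the total union area across all test samples. Mean IoU is the unweighted average of per-sample IoUs. Precision@K is the fraction of samples whose IoU is higher than the threshold K. The sketch below illustrates these three definitions. It is a hypothetical re-implementation, not the evaluation code of any listed paper, and it makes three assumptions: masks are nested 0/1 lists, every sample has a non-empty union, and "higher than K" means strictly greater. AP is normally averaged over several IoU thresholds with COCO-style tooling and is not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical mask type: a 2D grid of 0/1 values (1 = pixel of the referred object).
Mask = Sequence[Sequence[int]]


def intersection_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Count intersection and union pixels between a predicted and a ground-truth mask."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += int(bool(p) and bool(g))
            union += int(bool(p) or bool(g))
    return inter, union


def evaluate(
    preds: List[Mask],
    gts: List[Mask],
    thresholds: Sequence[float] = (0.5, 0.6, 0.7, 0.8, 0.9),
) -> Dict[str, object]:
    """Compute overall IoU, mean IoU and Precision@K over a set of test samples.

    Assumes every sample has a non-empty union (e.g. non-empty ground truth).
    """
    pairs = [intersection_union(p, g) for p, g in zip(preds, gts)]
    ious = [i / u for i, u in pairs]  # per-sample IoU

    # Overall IoU: total intersection over total union, so large objects weigh more.
    overall = sum(i for i, _ in pairs) / sum(u for _, u in pairs)
    # Mean IoU: unweighted average of per-sample IoUs.
    mean = sum(ious) / len(ious)
    # Precision@K: fraction of samples whose IoU is strictly higher than K (assumed).
    precision = {k: sum(iou > k for iou in ious) / len(ious) for k in thresholds}

    return {"overall_iou": overall, "mean_iou": mean, "precision": precision}
```

As a toy example, take two samples. In the first, the prediction covers 2 pixels and the ground truth covers 1 of them, giving IoU = 1/2. In the second, the 4-pixel prediction matches the ground truth exactly, giving IoU = 1. Overall IoU is then 5/6 (about 0.833), while mean IoU is 0.75. This shows why the two columns can rank methods differently: overall IoU favours accuracy on large objects.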

OpenCodePapers