OpenCodePapers

referring-expression-segmentation-on-a2d

Referring Expression Segmentation
Leaderboard
| Paper | Code | AP | IoU overall | IoU mean | Precision@0.5 | Precision@0.6 | Precision@0.7 | Precision@0.8 | Precision@0.9 | Model name | Release date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Spectrum-guided Multi-granularity Referring Video Object Segmentation | ✓ | 0.585 | 0.799 | 0.720 | 0.843 | 0.822 | 0.767 | 0.617 | 0.259 | SgMg (Video-Swin-B) | 2023-07-25 |
| SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | ✓ | 0.573 | 0.807 | 0.725 | 0.851 | 0.827 | 0.765 | 0.607 | 0.252 | SOC (Video-Swin-B) | 2023-05-26 |
| Language as Queries for Referring Video Object Segmentation | ✓ | 0.550 | 0.786 | 0.703 | 0.831 | 0.804 | 0.741 | 0.579 | 0.212 | ReferFormer (Video-Swin-B) | 2022-01-03 |
| SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | ✓ | 0.504 | 0.747 | 0.669 | 0.79 | 0.756 | 0.687 | 0.535 | 0.195 | SOC (Video-Swin-T) | 2023-05-26 |
| Multi-Attention Network for Compressed Video Referring Object Segmentation | ✓ | 0.471 | 0.726 | 0.632 | 0.734 | 0.682 | 0.579 | 0.389 | 0.132 | MANET | 2022-07-26 |
| Deeply Interleaved Two-Stream Encoder for Referring Video Segmentation | | 0.469 | 0.714 | 0.598 | 0.702 | 0.663 | 0.585 | 0.428 | 0.151 | VLIDE | 2022-03-30 |
| Local-Global Context Aware Transformer for Language-Guided Video Segmentation | ✓ | 0.465 | 0.69 | 0.597 | 0.709 | 0.64 | 0.525 | 0.351 | 0.101 | Locater | 2022-03-18 |
| End-to-End Referring Video Object Segmentation with Multimodal Transformers | ✓ | 0.461 | 0.72 | 0.64 | 0.754 | 0.712 | 0.638 | 0.485 | 0.169 | MTTR (w=10) | 2021-11-29 |
| End-to-End Referring Video Object Segmentation with Multimodal Transformers | ✓ | 0.447 | 0.702 | 0.618 | 0.721 | 0.684 | 0.607 | 0.456 | 0.164 | MTTR (w=8) | 2021-11-29 |
| Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation | ✓ | 0.419 | 0.673 | 0.558 | 0.645 | 0.597 | 0.523 | 0.375 | 0.13 | mmmmtbvs | 2022-04-06 |
| Cross-Modal Progressive Comprehension for Referring Segmentation | ✓ | 0.404 | 0.653 | 0.573 | 0.655 | 0.592 | 0.506 | 0.342 | 0.098 | CMPC-V (I3D) | 2021-05-15 |
| Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation | | 0.399 | 0.662 | 0.561 | 0.654 | 0.589 | 0.497 | 0.333 | 0.091 | Hui et al. | 2021-05-14 |
| Actor and Action Modular Network for Text-based Video Segmentation | | 0.396 | 0.617 | 0.552 | 0.681 | 0.629 | 0.523 | 0.296 | 0.029 | AAMN | 2020-11-02 |
| Polar Relative Positional Encoding for Video-Language Segmentation | | 0.388 | 0.661 | 0.529 | 0.634 | 0.579 | 0.483 | 0.322 | 0.083 | PRPE | 2020-07-20 |
| Cross-Modal Progressive Comprehension for Referring Segmentation | ✓ | 0.351 | 0.649 | 0.515 | 0.590 | 0.527 | 0.434 | 0.284 | 0.068 | CMPC-V (R2D) | 2021-05-15 |
| Context Modulated Dynamic Networks for Actor and Action Video Segmentation with Language Queries | | 0.333 | 0.623 | 0.531 | 0.607 | 0.525 | 0.405 | 0.235 | 0.045 | CMDy | 2020-04-03 |
| Visual-Textual Capsule Routing for Text-Based Video Segmentation | | 0.303 | 0.568 | 0.460 | 0.526 | 0.450 | 0.345 | 0.207 | 0.036 | VT-Capsule | 2020-06-01 |
| Asymmetric Cross-Guided Attention Network for Actor and Action Video Segmentation From Natural Language Query | ✓ | 0.274 | 0.601 | 0.490 | 0.557 | 0.459 | 0.319 | 0.16 | 0.02 | ACGA | 2019-10-01 |
| Actor and Action Video Segmentation from a Sentence | ✓ | 0.215 | 0.551 | 0.426 | 0.5 | 0.376 | 0.231 | 0.094 | 0.004 | Gavrilyuk et al. (Optical flow) | 2018-03-20 |
| Actor and Action Video Segmentation from a Sentence | ✓ | 0.198 | 0.536 | 0.421 | 0.475 | 0.347 | 0.211 | 0.08 | 0.002 | Gavrilyuk et al. | 2018-03-20 |
| Tracking by Natural Language Specification | | 0.163 | 0.515 | 0.354 | 0.387 | 0.290 | 0.175 | 0.066 | 0.001 | Li et al. | 2017-07-01 |
| Segmentation from Natural Language Expressions | ✓ | 0.132 | 0.474 | 0.350 | 0.348 | 0.236 | 0.133 | 0.033 | 0.000 | Hu et al. | 2016-03-20 |
| Hierarchical interaction network for video object segmentation from referring expressions | | | 0.679 | 0.529 | 0.611 | 0.559 | 0.486 | 0.342 | 0.12 | HINet | 2021-11-22 |
| Hierarchical interaction network for video object segmentation from referring expressions | | | 0.672 | 0.497 | 0.578 | 0.534 | 0.456 | 0.311 | 0.093 | RefVOS | 2021-11-22 |
| ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation | | | 0.644 | 0.655 | 0.704 | 0.677 | 0.617 | 0.489 | 0.171 | ClawCraneNet | 2021-03-19 |
| Referring Segmentation in Images and Videos with Cross-Modal Self-Attention Network | | | 0.618 | 0.432 | 0.487 | 0.431 | 0.358 | 0.231 | 0.052 | CMSA+CFSA | 2021-02-09 |

One further entry, RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation (code: ✓, model name RefVOS, release date 2020-10-01), lists only four values, with no indication of which metric columns they belong to: 0.599, 0.599, 0.495, 0.064.
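
The IoU and Precision@K columns can be illustrated with a minimal sketch. It assumes the definitions commonly used for this benchmark: overall IoU divides the total intersection by the total union over all test samples, mean IoU averages the per-sample IoU, and Precision@K is the fraction of samples whose IoU exceeds K. The function name `segmentation_metrics`, its input format, the strict comparison, and the handling of empty unions are all assumptions made for illustration, not the leaderboard's own evaluation code. AP is left out because it is often computed with COCO-style evaluation tooling rather than a short formula.

```python
from typing import Dict, Sequence, Tuple

# Thresholds matching the leaderboard's Precision@K columns.
PRECISION_THRESHOLDS = (0.5, 0.6, 0.7, 0.8, 0.9)


def segmentation_metrics(pairs: Sequence[Tuple[int, int]]) -> Dict[str, float]:
    """Compute IoU-based metrics from per-sample (intersection, union) pixel counts.

    Hypothetical helper: each pair compares one predicted mask with its ground truth.
    """
    if not pairs:
        raise ValueError("at least one sample is required")

    # Per-sample IoU; a sample with an empty union scores 0 (an assumption).
    ious = [inter / union if union > 0 else 0.0 for inter, union in pairs]

    total_inter = sum(inter for inter, _ in pairs)
    total_union = sum(union for _, union in pairs)

    metrics = {
        # Overall IoU pools pixels over all samples, so large objects weigh more.
        "iou_overall": total_inter / total_union if total_union > 0 else 0.0,
        # Mean IoU averages per-sample scores, so every sample weighs the same.
        "iou_mean": sum(ious) / len(ious),
    }
    for k in PRECISION_THRESHOLDS:
        # Precision@K: fraction of samples whose IoU exceeds K (strict ">" assumed).
        metrics[f"precision@{k}"] = sum(iou > k for iou in ious) / len(ious)
    return metrics


# Made-up counts for four samples, one of them a much larger object.
example = segmentation_metrics([(55, 100), (950, 1000), (10, 100), (85, 100)])
print(example)
```

In this made-up example the large, well-segmented object pulls overall IoU well above mean IoU, which is one reason the two columns can rank models differently.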