Paper | Code | Precision@(F1=1, IoU≥0.5) (%) | N-acc. (%) | Model | Release date |
---|---|---|---|---|---|
SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion | ✓ Link | 62.1 | 54.7 | SimVG-DB | 2024-09-26 |
Universal Instance Perception as Object Discovery and Retrieval | ✓ Link | 58.2 | 50.6 | UNINEXT | 2023-03-12 |
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | ✓ Link | 41.5 | 36.1 | MDETR | 2021-04-26 |
Vision-Language Transformer and Query Generation for Referring Segmentation | ✓ Link | 36.6 | 35.2 | VLT | 2021-08-12 |
Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation | ✓ Link | 28.0 | 30.6 | MCN | 2020-03-19 |
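
The two metric columns are not defined in the table itself. The sketch below shows one plausible way such scores could be computed, assuming the usual convention for generalized referring expression comprehension: a sample counts as correct only if its predicted boxes match the ground-truth boxes one-to-one at IoU ≥ 0.5 (per-sample F1 = 1), and N-acc. is the share of no-target samples for which the model predicts no box. The function names (`iou`, `sample_f1`, `precision_at_f1`, `n_acc`), the `(x1, y1, x2, y2)` box format, and the greedy matching are illustrative assumptions, not the official evaluation code of any listed paper.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sample_f1(pred: List[Box], gt: List[Box], thr: float = 0.5) -> float:
    """Per-sample F1 with greedy one-to-one matching (a simplification)."""
    if not gt:
        # No-target sample: correct only if nothing is predicted.
        return 1.0 if not pred else 0.0
    matched = set()
    tp = 0
    for p in pred:
        best_j, best_iou = -1, thr
        for j, g in enumerate(gt):
            if j in matched:
                continue
            v = iou(p, g)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    fp = len(pred) - tp  # predictions with no matching ground truth
    fn = len(gt) - tp    # ground-truth boxes left unmatched
    return 2 * tp / (2 * tp + fp + fn)


def precision_at_f1(preds: List[List[Box]], gts: List[List[Box]],
                    thr: float = 0.5) -> float:
    """Percentage of samples whose per-sample F1 equals 1."""
    hits = sum(sample_f1(p, g, thr) == 1.0 for p, g in zip(preds, gts))
    return 100.0 * hits / len(gts)


def n_acc(preds: List[List[Box]], gts: List[List[Box]]) -> float:
    """Percentage of no-target samples with an empty prediction."""
    no_target = [p for p, g in zip(preds, gts) if not g]
    if not no_target:
        return float("nan")
    return 100.0 * sum(not p for p in no_target) / len(no_target)
```

Under these assumptions a multi-target sample is penalized for any missed or extra box, which is why scores in the Precision column sit well below typical single-target accuracies.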