Paper | Code | Pointing Game Accuracy (%) | Accuracy | Model Name | Release Date |
---|---|---|---|---|---|
Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding | ✓ Link | 62.76 | | VG_BiLSTM_VGG | 2018-11-28 |
Detector-Free Weakly Supervised Grounding by Separation | ✓ Link | 58.21 | | GbS Ensemble MS-COCO | 2021-04-20 |
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding | ✓ Link | 28.91 | | MCB | 2016-06-06 |