| Paper | Code | Pointing Game Accuracy | Accuracy | Model Name | Release Date |
|---|---|---|---|---|---|
| Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding | ✓ Link | 62.76 | | VG_BiLSTM_VGG | 2018-11-28 |
| Detector-Free Weakly Supervised Grounding by Separation | ✓ Link | 58.21 | | GbS Ensemble MS-COCO | 2021-04-20 |
| Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding | ✓ Link | | 28.91 | MCB | 2016-06-06 |