Paper | Code | R@1 | R@10 | R@5 | Model Name | Release Date |
---|---|---|---|---|---|---|
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone | ✓ Link | 87.1 | 97.4 | 96.1 | Fiber-B | 2022-06-15 |
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models | ✓ Link | 84.1 | | | PEVL | 2022-05-23 |
VisualBERT: A Simple and Performant Baseline for Vision and Language | ✓ Link | 70.4 | 86.31 | 84.49 | VisualBERT | 2019-08-09 |