Paper | Code | CIDEr | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | ROUGE-L | METEOR | SPICE | Model Name | Release Date |
---|---|---|---|---|---|---|---|---|---|---|---|
GIT: A Generative Image-to-text Transformer for Vision and Language | ✓ Link | 125.51 | 88.9 | 75.86 | 58.9 | 38.95 | 63.66 | 32.95 | 16.11 | GIT2 | 2022-05-27 |
GIT: A Generative Image-to-text Transformer for Vision and Language | ✓ Link | 123.92 | 88.56 | 75.48 | 58.46 | 38.44 | 63.5 | 32.86 | 15.96 | GIT | 2022-05-27 |
- | - | 104.76 | 84.45 | 69.28 | 51.1 | 31.48 | 59.75 | 30.31 | 14.97 | VLAF2 | - |
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | - | 101.2 | 82.88 | 67.01 | 48.73 | 30.21 | 58.76 | 30.0 | 14.27 | Microsoft Cognitive Services team | 2020-09-28 |
- | - | 85.81 | 79.88 | 61.31 | 40.26 | 21.84 | 53.98 | 27.0 | 13.01 | test_cbs2 | - |
- | - | 85.73 | 79.51 | 62.65 | 43.22 | 24.97 | 55.13 | 26.37 | 11.96 | icp2ssi1_coco_si_0.02_5_test | - |
- | - | 84.58 | 77.05 | 56.97 | 36.84 | 19.85 | 53.06 | 28.42 | 14.72 | Human | - |
- | - | 74.2 | 77.68 | 58.31 | 37.04 | 19.85 | 52.64 | 24.97 | 11.45 | UpDown + ELMo + CBS | - |
- | - | 61.98 | 74.77 | 53.67 | 30.66 | 13.85 | 49.45 | 22.55 | 9.83 | Neural Baby Talk + CBS | - |
- | - | 56.85 | 75.25 | 56.93 | 36.91 | 20.49 | 51.84 | 23.6 | 10.33 | UpDown | - |
- | - | 53.21 | 73.69 | 54.1 | 32.37 | 15.99 | 49.63 | 21.93 | 9.26 | Neural Baby Talk | - |