OpenCodePapers

Image Captioning on nocaps (in-domain)

Task: Image Captioning
Dataset: nocaps (in-domain)
Results over time: metric scores plotted against model release dates (interactive chart).
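Several of the entries in the leaderboard below (GIT, ClipCap, VinVL, and others) link to public code. As a minimal illustration of the task, the sketch below captions an image with a publicly released GIT checkpoint via Hugging Face Transformers; it assumes `transformers` (with GIT support), `torch`, `Pillow`, and `requests` are installed, and `microsoft/git-base-coco` is the small open checkpoint, not the GIT/GIT2 configuration behind the leaderboard numbers.

```python
# Minimal image-captioning sketch with a public GIT checkpoint.
# Assumes: transformers (with GIT support), torch, Pillow, requests installed.
# "microsoft/git-base-coco" is the small open checkpoint, not the GIT2 model
# reported on this leaderboard; the image URL is only an example.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

processor = AutoProcessor.from_pretrained("microsoft/git-base-coco")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-base-coco")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # example image
image = Image.open(requests.get(url, stream=True).raw)

# Encode the image, then decode an autoregressively generated caption.
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```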
Leaderboard
| Paper | Code | CIDEr | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | ROUGE-L | METEOR | SPICE | Model Name | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PaLI: A Jointly-Scaled Multilingual Language-Image Model | ✓ | 149.1 | | | | | | | | PaLI | 2022-09-14 |
| GIT: A Generative Image-to-text Transformer for Vision and Language | ✓ | 124.18 | 88.86 | 75.86 | 59.94 | 41.1 | 63.82 | 33.83 | 16.36 | GIT2, Single Model | 2022-05-27 |
| GIT: A Generative Image-to-text Transformer for Vision and Language | ✓ | 122.4 | 88.55 | 76.1 | 60.53 | 41.65 | 64.02 | 33.41 | 16.18 | GIT, Single Model | 2022-05-27 |
| PaLI: A Jointly-Scaled Multilingual Language-Image Model | ✓ | 121.09 | 88.02 | 75.21 | 59.38 | 41.16 | 64.39 | 34.22 | 15.69 | PaLI | 2022-09-14 |
| | | 117.9 | 87.27 | 74.29 | 58.01 | 39.24 | 63.12 | 33.01 | 15.49 | CoCa - Google Brain | |
| VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | | 112.82 | 86.33 | 72.83 | 55.94 | 37.97 | 62.48 | 32.7 | 15.22 | Microsoft Cognitive Services team | 2020-09-28 |
| SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | ✓ | 108.98 | 84.64 | 70.0 | 52.96 | 34.66 | 61.01 | 31.97 | 14.6 | Single Model | 2021-08-24 |
| GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features | ✓ | 105.9 | | | | | | | 13.6 | GRIT (zero-shot, no VL pretraining, no CBS) | 2022-07-20 |
| | | 104.9 | 84.2 | 69.57 | 52.56 | 34.8 | 60.52 | 31.77 | 15.04 | FudanFVL | |
| | | 104.25 | 82.91 | 68.02 | 50.75 | 33.59 | 59.67 | 31.33 | 14.85 | FudanWYZ | |
| | | 102.64 | 84.4 | 69.8 | 51.89 | 32.86 | 60.07 | 30.43 | 14.47 | IEDA-LAB | |
| | | 101.69 | 83.77 | 68.7 | 51.26 | 32.76 | 59.75 | 30.51 | 14.99 | vll@mk514 | |
| | | 100.03 | 84.03 | 69.12 | 51.16 | 33.15 | 59.67 | 30.06 | 14.08 | MD | |
| | | 99.9 | 81.86 | 67.2 | 50.5 | 34.11 | 59.54 | 31.61 | 15.17 | firethehole | |
| VinVL: Revisiting Visual Representations in Vision-Language Models | ✓ | 97.99 | 83.24 | 68.04 | 49.68 | 30.62 | 58.54 | 29.51 | 13.63 | VinVL (Microsoft Cognitive Services + MSR) | 2021-01-02 |
| | | 96.63 | 82.9 | 68.09 | 49.73 | 31.24 | 58.62 | 29.37 | 13.61 | ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | |
| | | 88.08 | 80.5 | 64.48 | 46.46 | 29.59 | 56.84 | 28.7 | 13.04 | camel XE | |
| | | 87.86 | 79.58 | 63.09 | 43.92 | 26.07 | 55.88 | 27.97 | 12.6 | evertyhing | |
| | | 87.28 | 80.68 | 64.7 | 45.33 | 27.09 | 56.76 | 27.7 | 12.79 | RCAL | |
| | | 87.21 | 80.26 | 63.94 | 44.65 | 27.23 | 56.4 | 27.7 | 12.28 | icgp2ssi1_coco_si_0.02_5_test | |
| | | 85.81 | 81.64 | 63.79 | 43.43 | 25.15 | 55.06 | 27.25 | 12.35 | cxy_nocaps_training | |
| | | 85.81 | 81.64 | 63.79 | 43.43 | 25.15 | 55.06 | 27.25 | 12.35 | 作者给的test文件 | |
| ClipCap: CLIP Prefix for Image Captioning | ✓ | 84.85 | | | | | | | 12.14 | ClipCap (Transformer) | 2021-11-18 |
| | | 84.83 | 80.7 | 63.27 | 42.86 | 25.78 | 55.91 | 27.23 | 12.06 | Oscar | |
| | | 84.79 | 81.61 | 63.74 | 43.22 | 24.82 | 55.03 | 27.27 | 12.3 | Xinyi | |
| | | 80.61 | 76.89 | 57.3 | 37.78 | 21.49 | 53.47 | 28.53 | 14.99 | Human | |
| | | 80.19 | 78.73 | 61.63 | 42.35 | 25.94 | 55.25 | 27.25 | 12.38 | MQ-UpDown-C | |
| ClipCap: CLIP Prefix for Image Captioning | ✓ | 79.73 | | | | | | | 12.2 | ClipCap (MLP + GPT2 tuning) | 2021-11-18 |
| | | 76.02 | 77.65 | 59.58 | 39.86 | 22.83 | 53.98 | 26.35 | 11.8 | UpDown + ELMo + CBS | |
| | | 74.27 | 77.68 | 60.34 | 41.5 | 24.57 | 54.42 | 26.04 | 11.47 | UpDown | |
| | | 74.27 | 77.68 | 60.34 | 41.5 | 24.57 | 54.42 | 26.04 | 11.46 | nocaps_training | |
| | | 73.73 | 75.31 | 56.79 | 37.85 | 21.91 | 52.44 | 26.02 | 12.04 | 7_10-7_40000_predict_test.json | |
| | | 70.33 | 74.35 | 55.97 | 36.12 | 20.84 | 52.26 | 25.1 | 11.07 | None | |
| | | 69.59 | 76.48 | 58.76 | 39.28 | 21.96 | 53.22 | 25.08 | 10.94 | YX | |
| | | 68.98 | 77.06 | 59.97 | 40.54 | 23.8 | 53.49 | 25.06 | 10.55 | B2 | |
| | | 67.91 | 76.12 | 57.98 | 38.44 | 21.92 | 52.53 | 25.07 | 10.87 | area_attention | |
| | | 64.37 | 72.76 | 53.52 | 34.13 | 19.45 | 50.53 | 23.47 | 10.11 | coco_all_19 | |
| | | 62.96 | 76.49 | 56.2 | 33.73 | 15.14 | 50.84 | 23.68 | 10.13 | Neural Baby Talk + CBS | |
| | | 60.89 | 75.91 | 56.78 | 35.58 | 17.39 | 51.42 | 23.8 | 9.81 | Neural Baby Talk | |
| | | 58.93 | 72.24 | 51.88 | 29.57 | 14.54 | 49.05 | 22.04 | 8.91 | CS395T | |
| | | 53.34 | 72.05 | 52.89 | 31.92 | 16.71 | 49.64 | 22.04 | 9.16 | Yu-Wu | |
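CIDEr, BLEU-1 through BLEU-4, ROUGE-L, METEOR, and SPICE in the table are the standard COCO-style caption metrics. The sketch below shows how such scores are typically computed with the `pycocoevalcap` toolkit (an assumption: the official nocaps numbers come from the EvalAI evaluation server, so this illustrates the metrics rather than reproducing the leaderboard pipeline); the toy captions are made up, and METEOR and SPICE additionally require a Java runtime.

```python
# Sketch of COCO-style caption metrics like those in the table above.
# Assumes: pycocoevalcap is installed (METEOR and SPICE also need Java).
# The references/predictions below are invented purely for illustration.
from pycocoevalcap.tokenizer.ptbtokenizer import PTBTokenizer
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.meteor.meteor import Meteor
from pycocoevalcap.rouge.rouge import Rouge
from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.spice.spice import Spice

# Reference captions and candidate captions, keyed by image id.
gts = {0: [{"caption": "a brown dog runs across a grassy field"},
           {"caption": "a dog running on the grass"}]}
res = {0: [{"caption": "a dog running through a field of grass"}]}

# Both sides are PTB-tokenized before scoring, as in the COCO caption toolkit.
tokenizer = PTBTokenizer()
gts_tok, res_tok = tokenizer.tokenize(gts), tokenizer.tokenize(res)

scorers = [
    (Bleu(4), ["BLEU-1", "BLEU-2", "BLEU-3", "BLEU-4"]),
    (Meteor(), "METEOR"),
    (Rouge(), "ROUGE-L"),
    (Cider(), "CIDEr"),
    (Spice(), "SPICE"),
]
for scorer, name in scorers:
    score, _ = scorer.compute_score(gts_tok, res_tok)
    if isinstance(name, list):  # Bleu(4) returns one score per n-gram order
        print({n: round(s, 4) for n, s in zip(name, score)})
    else:
        print(name, round(score, 4))
```

The toolkit reports scores on a 0-1 scale (CIDEr can exceed 1); leaderboard entries follow the usual convention of multiplying these values by 100.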