| Paper | Code | Accuracy | Binary | Open | Consistency | Plausibility | Validity | Distribution | Model | Date |
|---|---|---|---|---|---|---|---|---|---|---|
| | | 89.3 | 91.2 | 87.4 | 98.4 | 97.2 | 98.9 | 0.0 | human | |
| | | 76.04 | 84.46 | 68.6 | 91.47 | 83.75 | 96.42 | 3.68 | DREAM+Unicoder-VL (MSRA) | |
| | | 74.03 | 82.12 | 66.89 | 89.0 | 83.58 | 96.76 | 1.29 | TRRNet (Ensemble) | |
| | | 73.81 | 80.8 | 67.64 | 91.76 | 83.9 | 96.73 | 1.7 | MIL-nbgao | |
| | | 73.33 | 79.68 | 67.73 | 77.02 | 83.7 | 96.36 | 2.46 | Kakao Brain | |
| | | 72.14 | 81.16 | 64.19 | 90.96 | 84.81 | 96.77 | 2.39 | Coarse-to-Fine Reasoning, Single Model | |
| | | 70.23 | 77.5 | 63.82 | 86.94 | 83.77 | 96.65 | 1.49 | 270 | |
| | | 67.55 | 80.45 | 56.16 | 93.83 | 84.16 | 96.53 | 2.78 | NSM ensemble (updated) | |
| | | 64.92 | 82.63 | 49.29 | 94.37 | 84.91 | 96.64 | 5.11 | VinVL-DPT | |
| VinVL+L: Enriching Visual Representation with Location Context in VQA | ✓ Link | 64.85 | 82.59 | 49.19 | 94.0 | 84.91 | 96.62 | 4.59 | VinVL+L | 2023-02-22 |
| VinVL: Revisiting Visual Representations in Vision-Language Models | ✓ Link | 64.65 | 82.63 | 48.77 | 94.35 | 84.98 | 96.62 | 4.72 | Single Model | 2021-01-02 |
| | | 63.94 | 80.84 | 49.03 | 91.54 | 84.74 | 96.56 | 4.69 | Wayne | |
| | | 63.2 | 77.91 | 50.22 | 89.84 | 85.15 | 96.47 | 5.25 | Single | |
| | | 63.17 | 78.94 | 49.25 | 93.25 | 84.28 | 96.41 | 3.71 | NSM single (updated) | |
| LXMERT: Learning Cross-Modality Encoder Representations from Transformers | ✓ Link | 62.71 | 79.79 | 47.64 | 93.1 | 85.21 | 96.36 | 6.42 | LXR955, Ensemble | 2019-08-20 |
| | | 62.45 | 80.91 | 46.15 | 93.95 | 84.15 | 96.33 | 5.36 | MDETR | |
| | | 62.44 | 80.28 | 46.69 | 94.36 | 84.91 | 96.46 | 5.33 | 1-gqa | |
| | | 61.49 | 78.4 | 46.56 | 88.68 | 84.85 | 96.33 | 5.7 | UCM | |
| Bilinear Graph Networks for Visual Question Answering | | 61.22 | 78.69 | 45.81 | 90.31 | 85.43 | 96.36 | 6.77 | GRN | 2019-07-23 |
| | | 61.12 | 78.07 | 46.16 | 91.13 | 84.8 | 96.36 | 5.55 | lxmert-adv-txt | |
| | | 61.1 | 77.99 | 46.19 | 91.08 | 84.82 | 96.36 | 5.52 | lxmert-adv-txt | |
| | | 61.09 | 77.84 | 46.3 | 88.92 | 85.49 | 96.43 | 5.68 | MSM@MSRA | |
| | | 61.05 | 78.02 | 46.06 | 89.77 | 84.95 | 96.5 | 5.24 | mlmbert | |
| | | 60.98 | 77.32 | 46.55 | 90.77 | 84.93 | 96.38 | 5.36 | fisher | |
| | | 60.95 | 78.41 | 45.54 | 89.08 | 84.27 | 96.35 | 4.86 | ckpt 19 exp 90 | |
| | | 60.93 | 77.83 | 46.01 | 90.3 | 84.69 | 96.35 | 5.74 | 45 | |
| | | 60.89 | 78.07 | 45.73 | 93.02 | 84.05 | 96.0 | 5.31 | IQA (single) | |
| | | 60.87 | 79.12 | 44.76 | 92.61 | 85.63 | 96.35 | 8.56 | Ensemble10 | |
| | | 60.83 | 78.9 | 44.89 | 92.49 | 84.55 | 96.19 | 5.54 | Meta Module, Single | |
| | | 60.7 | 77.41 | 45.96 | 89.65 | 84.55 | 96.37 | 6.09 | xpj | |
| | | 60.67 | 78.02 | 45.36 | 89.81 | 84.84 | 96.31 | 6.41 | fbe20v3.json | |
| | | 60.59 | 78.44 | 44.83 | 92.66 | 85.38 | 96.57 | 7.28 | LININ | |
| | | 60.51 | 76.87 | 46.06 | 88.2 | 85.19 | 96.15 | 8.48 | prompt IMT-16 | |
| | | 60.42 | 77.12 | 45.68 | 89.69 | 84.56 | 96.35 | 6.03 | vv69 | |
| | | 60.37 | 77.09 | 45.61 | 89.77 | 84.56 | 96.22 | 6.43 | bert_v1 | |
| LXMERT: Learning Cross-Modality Encoder Representations from Transformers | ✓ Link | 60.33 | 77.16 | 45.47 | 89.59 | 84.53 | 96.35 | 5.69 | LXR955, Single Model | 2019-08-20 |
| | | 60.28 | 77.13 | 45.41 | 89.47 | 84.45 | 96.33 | 5.38 | IIE_Morningstar | |
| | | 60.27 | 76.99 | 45.51 | 90.16 | 84.49 | 96.31 | 5.39 | full_nsp_ft_results_submit_predict.json | |
| | | 60.18 | 76.97 | 45.36 | 89.65 | 84.47 | 96.33 | 5.29 | TESTOVQA007 | |
| | | 60.18 | 76.84 | 45.48 | 89.77 | 84.6 | 96.37 | 5.65 | test gqa | |
| | | 60.17 | 77.19 | 45.14 | 89.61 | 84.46 | 96.36 | 5.83 | Future_Test_team | |
| | | 60.14 | 77.15 | 45.12 | 89.58 | 84.47 | 96.36 | 5.81 | tmp | |
| | | 60.07 | 76.84 | 45.27 | 89.32 | 84.55 | 96.35 | 6.21 | Inspur | |
| | | 60.02 | 76.37 | 45.59 | 90.05 | 84.34 | 96.29 | 5.63 | full_nsp_mlm_ft_joint_results_submit_predict.json | |
| | | 60.01 | 76.77 | 45.21 | 89.17 | 84.46 | 96.35 | 6.28 | SSRP | |
| | | 59.93 | 79.09 | 43.02 | 93.72 | 85.92 | 96.41 | 10.1 | Musan | |
| | | 59.84 | 76.79 | 44.89 | 89.52 | 84.72 | 96.2 | 6.06 | gaochongyang9 | |
| | | 59.81 | 78.02 | 43.75 | 91.43 | 84.77 | 96.5 | 6.0 | PVR | |
| | | 59.8 | 76.74 | 44.85 | 89.14 | 84.2 | 96.23 | 5.11 | BgTest | |
| | | 59.72 | 77.97 | 43.61 | 89.43 | 84.89 | 96.55 | 6.25 | DAM | |
| | | 59.54 | 77.98 | 43.26 | 89.21 | 84.94 | 96.24 | 6.01 | DL16 | |
| | | 59.43 | 77.11 | 43.82 | 89.05 | 84.94 | 96.56 | 6.39 | mcmi | |
| | | 59.37 | 77.53 | 43.35 | 88.63 | 84.71 | 96.18 | 6.06 | rishabh_test | |
| | | 59.29 | 77.31 | 43.38 | 88.94 | 84.43 | 96.3 | 5.8 | UNITER + MAC + Graph Networks | |
| | | 59.12 | 76.69 | 43.6 | 88.9 | 84.78 | 96.43 | 5.6 | LXMERT-S | |
| | | 59.06 | 76.07 | 44.04 | 89.81 | 82.76 | 93.82 | 6.14 | QGCRGN | |
| | | 58.91 | 76.08 | 43.75 | 89.52 | 84.52 | 96.18 | 6.93 | gbert1 | |
| | | 58.88 | 75.07 | 44.58 | 84.64 | 84.86 | 96.23 | 5.54 | glimple_all | |
| | | 58.72 | 76.4 | 43.11 | 89.58 | 84.68 | 96.21 | 6.58 | ours-4-gqa_el_tag_v4__pretrain_rel_tag_dist_tc_v7_checkpoint-47-157510-best-4.json | |
| | | 58.42 | 77.39 | 41.67 | 90.29 | 84.53 | 95.57 | 7.86 | Partial-MSP | |
| | | 58.2 | 75.91 | 42.57 | 88.25 | 84.72 | 96.08 | 5.81 | UCAS-SARI | |
| | | 58.12 | 76.39 | 42.0 | 88.01 | 84.8 | 96.06 | 5.65 | stu09e | |
| | | 58.06 | 76.6 | 41.7 | 90.96 | 85.27 | 96.31 | 7.6 | happyTeam | |
| | | 57.89 | 74.54 | 43.19 | 85.45 | 84.99 | 96.4 | 5.73 | graphRepresentation, Single | |
| | | 57.79 | 75.37 | 42.26 | 88.3 | 84.85 | 96.11 | 5.65 | VqaStar-UCAS-SARI | |
| | | 57.77 | 75.78 | 41.86 | 86.85 | 84.97 | 96.44 | 5.36 | REX | |
| | | 57.65 | 75.22 | 42.14 | 87.35 | 84.73 | 96.18 | 5.48 | MLVQA (single) | |
| | | 57.35 | 75.07 | 41.71 | 87.61 | 84.5 | 95.86 | 5.94 | rsa-14word | |
| | | 57.21 | 74.46 | 41.99 | 87.6 | 84.87 | 96.2 | 5.6 | result_run_2647872_epoch11 | |
| | | 57.14 | 75.07 | 41.31 | 87.36 | 84.49 | 95.87 | 5.29 | DeeTee | |
| | | 57.1 | 76.0 | 40.41 | 91.7 | 85.58 | 96.16 | 10.52 | BAN | |
| | | 57.07 | 73.77 | 42.33 | 84.68 | 84.81 | 96.48 | 4.7 | LCGN | |
| | | 57.01 | 74.78 | 41.32 | 87.74 | 84.25 | 96.03 | 6.06 | RSN (Single Model) | |
| | | 56.96 | 74.97 | 41.06 | 85.12 | 84.85 | 96.38 | 7.13 | GM6_9_2_train | |
| | | 56.95 | 75.01 | 41.02 | 90.49 | 85.46 | 96.37 | 9.5 | wcf-fight | |
| | | 56.95 | 74.62 | 41.36 | 87.71 | 84.57 | 95.98 | 5.81 | total14 | |
| | | 56.65 | 73.65 | 41.64 | 84.35 | 84.37 | 95.94 | 6.07 | Testify | |
| | | 56.59 | 73.0 | 42.11 | 84.7 | 84.86 | 96.4 | 4.68 | F205 | |
| | | 56.38 | 74.84 | 40.09 | 91.71 | 83.76 | 95.43 | 6.32 | Feb_ft2_mergeadd_weightalllstm_picklocw_box5_prep | |
| | | 56.28 | 73.73 | 40.87 | 86.86 | 84.2 | 96.01 | 5.78 | MMT-VQA | |
| | | 56.18 | 72.84 | 41.47 | 85.46 | 84.04 | 96.18 | 5.42 | IWantADonut | |
| | | 56.16 | 73.56 | 40.8 | 84.99 | 84.83 | 96.4 | 5.87 | GIN | |
| | | 56.11 | 72.65 | 41.52 | 85.51 | 84.36 | 96.25 | 5.42 | LOGNet+VLR | |
| | | 56.09 | 73.4 | 40.82 | 85.11 | 84.79 | 96.37 | 5.14 | Improved SNMN | |
| | | 56.0 | 73.9 | 40.2 | 87.16 | 84.45 | 96.01 | 6.02 | ST_VQA | |
| | | 55.93 | 71.81 | 41.93 | 83.2 | 85.09 | 96.01 | 6.05 | RD | |
| | | 55.7 | 72.88 | 40.53 | 83.52 | 84.81 | 96.39 | 5.32 | Deepblue_Semantics | |
| | | 55.65 | 72.86 | 40.46 | 89.18 | 85.27 | 96.33 | 9.69 | LW | |
| | | 55.57 | 72.39 | 40.74 | 83.32 | 84.24 | 96.15 | 10.18 | RSN (Single Model)_v6 | |
| | | 55.41 | 72.87 | 39.99 | 83.06 | 84.74 | 96.35 | 5.48 | nogg | |
| | | 55.35 | 72.65 | 40.08 | 84.17 | 84.56 | 96.32 | 5.22 | abc_test | |
| | | 55.0 | 72.09 | 39.92 | 83.47 | 84.66 | 96.34 | 5.29 | KU | |
| | | 54.94 | 71.7 | 40.14 | 82.71 | 84.78 | 96.4 | 5.1 | Eden_test | |
| | | 54.79 | 72.42 | 39.23 | 86.1 | 84.55 | 95.92 | 6.01 | HDU_ZWF | |
| | | 54.15 | 69.3 | 40.79 | 82.36 | 85.15 | 95.99 | 5.41 | vips | |
| | | 54.06 | 71.23 | 38.91 | 81.59 | 84.48 | 96.16 | 5.34 | MAC | |
| | | 53.89 | 72.52 | 37.44 | 87.47 | 85.05 | 96.39 | 8.66 | 5TMT-qe+o | |
| | | 53.85 | 68.44 | 40.97 | 80.2 | 85.19 | 96.28 | 5.84 | ZhaoLab | |
| | | 53.57 | 70.15 | 38.94 | 81.14 | 84.67 | 96.36 | 5.32 | test | |
| | | 53.31 | 70.41 | 38.23 | 80.33 | 84.32 | 95.99 | 6.4 | Sorbonne | |
| | | 52.3 | 68.46 | 38.04 | 84.36 | 85.2 | 96.2 | 12.54 | UJCNN | |
| | | 52.19 | 69.15 | 37.22 | 78.34 | 83.44 | 95.45 | 5.69 | MJ | |
| | | 52.02 | 67.35 | 38.5 | 80.44 | 83.94 | 95.75 | 5.64 | mac_qin | |
| | | 51.87 | 67.99 | 37.64 | 80.2 | 84.35 | 96.25 | 6.77 | Mithrandir | |
| | | 51.51 | 67.82 | 37.11 | 79.7 | 83.69 | 95.82 | 6.16 | happy | |
| | | 51.22 | 69.36 | 35.2 | 82.44 | 83.82 | 96.12 | 6.45 | Space Cat | |
| Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering | ✓ Link | 49.74 | 66.64 | 34.83 | 78.71 | 84.57 | 96.18 | 5.98 | BottomUp | 2017-07-25 |
| | | 49.28 | 67.59 | 33.12 | 83.68 | 83.41 | 94.95 | 14.28 | RAM_BUGGY | |
| | | 49.27 | 66.57 | 34.0 | 78.51 | 84.58 | 95.78 | 6.91 | sparsemax15 | |
| | | 48.97 | 63.85 | 35.83 | 83.85 | 83.93 | 95.62 | 13.72 | mfb+bert | |
| | | 48.44 | 65.02 | 33.81 | 81.19 | 85.29 | 96.15 | 17.79 | RES | |
| | | 47.72 | 66.28 | 31.34 | 84.16 | 84.52 | 95.45 | 19.05 | LAS | |
| | | 47.38 | 58.76 | 37.34 | 73.71 | 81.75 | 94.55 | 6.29 | test | |
| | | 46.55 | 63.26 | 31.8 | 74.57 | 84.25 | 96.02 | 7.46 | LSTM+CNN | |
| | | 45.86 | 64.74 | 29.2 | 70.57 | 86.13 | 96.61 | 8.38 | 113 | |
| | | 44.06 | 57.57 | 32.13 | 38.18 | 75.19 | 85.94 | 8.35 | Edinburgh-Mila-UCLA | |
| | | 43.84 | 59.24 | 30.24 | 67.71 | 84.01 | 95.32 | 10.99 | bear | |
| | | 42.75 | 61.21 | 26.45 | 63.51 | 84.2 | 95.99 | 7.63 | CHAIR | |
| | | 41.63 | 55.12 | 29.73 | 82.21 | 77.4 | 92.27 | 13.01 | MReaL | |
| | | 41.07 | 61.9 | 22.69 | 68.68 | 87.3 | 96.39 | 17.93 | LSTM | |
| | | 40.3 | 61.18 | 21.88 | 74.11 | 86.13 | 96.14 | 40.44 | Academia Sinica | |
| | | 37.03 | 56.61 | 19.74 | 63.96 | 85.12 | 95.76 | 28.4 | Fj | |
| | | 36.75 | 55.24 | 20.44 | 69.93 | 84.13 | 95.1 | 40.84 | Mycsulb | |
| | | 31.24 | 47.9 | 16.66 | 54.04 | 84.31 | 84.33 | 13.98 | LocalPrior | |
| | | 28.9 | 42.94 | 16.62 | 51.69 | 74.81 | 88.86 | 93.08 | GlobalPrior | |
| | | 26.45 | 45.69 | 9.47 | 55.23 | 50.93 | 60.81 | 11.49 | muc_ai | |
| | | 17.82 | 36.05 | 1.74 | 62.4 | 34.84 | 35.78 | 19.99 | CNN | |