[]() | | 78.7 | 45.75 | 29.5 | 65.7 | 82.45 | 6.54 | Single | |
[]() | | 77.92 | 56.2 | 44.45 | 68.9 | 83.78 | 5.41 | P1P2+Distill+Ensemble | |
[]() | | 76.43 | 56.35 | 45.17 | 68.12 | 82.17 | 5.79 | Ensemble + Fine-tuning | |
[]() | | 76.17 | 56.42 | 44.32 | 70.23 | 84.52 | 5.47 | ensemble, finetune | |
[]() | | 76.14 | 56.05 | 44.75 | 68.4 | 82.75 | 5.72 | VD-PCR | |
[]() | | 75.35 | 51.17 | 38.9 | 62.82 | 77.98 | 6.69 | Ensemble | |
Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs | ✓ Link | 74.88 | 52.14 | 38.92 | 66.6 | 80.65 | 6.53 | Ensemble + Finetune | 2019-11-26 |
[]() | | 74.62 | 62.65 | 54.37 | 70.75 | 83.33 | 5.89 | bert-double-stream-finetuning | |
[]() | | 74.47 | 50.74 | 37.95 | 64.12 | 80.0 | 6.28 | CE-finetuned, single model | |
[]() | | 73.36 | 49.26 | 36.35 | 62.42 | 78.12 | 7.0 | 2 | |
[]() | | 73.15 | 43.07 | 27.82 | 60.38 | 76.55 | 7.42 | 2 | |
[]() | | 73.08 | 48.37 | 34.65 | 62.98 | 77.53 | 7.05 | 5_4 | |
[]() | | 73.07 | 56.03 | 44.2 | 68.45 | 81.62 | 5.98 | 7 | |
[]() | | 72.99 | 56.73 | 45.42 | 68.92 | 81.73 | 6.0 | 1 | |
[]() | | 72.85 | 49.03 | 35.88 | 62.88 | 77.75 | 7.07 | 5-2 | |
Ensemble of MRR and NDCG models for Visual Dialog | ✓ Link | 72.83 | 69.92 | 58.3 | 81.55 | 89.6 | 3.84 | 2 Step: Factor Graph Attention + VD-Bert | 2021-04-15 |
[]() | | 72.8 | 56.67 | 44.82 | 68.67 | 81.9 | 5.98 | 1 | |
[]() | | 72.58 | 49.47 | 35.77 | 64.15 | 78.25 | 6.9 | 10 | |
[]() | | 72.41 | 55.11 | 43.23 | 67.65 | 79.77 | 6.55 | Disc, Dense, 4 Ensemble. | |
[]() | | 72.35 | 57.19 | 45.3 | 70.15 | 82.38 | 6.04 | shanshandu | |
[]() | | 72.33 | 57.13 | 45.17 | 69.95 | 82.4 | 5.85 | 1 | |
[]() | | 72.33 | 47.54 | 33.5 | 63.28 | 77.33 | 7.14 | 20 | |
[]() | | 72.16 | 70.41 | 58.17 | 83.85 | 90.83 | 3.66 | Two-Step(refactor) | |
[]() | | 71.91 | 41.66 | 25.85 | 60.12 | 74.67 | 8.3 | simple_test | |
[]() | | 71.82 | 56.34 | 44.22 | 69.65 | 81.7 | 6.04 | 1 | |
[]() | | 70.08 | 39.61 | 25.65 | 53.62 | 70.12 | 9.01 | 5TS | |
[]() | | 68.08 | 63.92 | 50.78 | 79.53 | 89.6 | 4.28 | Bert(two-stream) | |
[]() | | 67.09 | 70.95 | 57.07 | 88.42 | 95.08 | 2.91 | Ensemble FGA + BERT | |
[]() | | 64.79 | 64.62 | 51.82 | 80.35 | 89.95 | 4.29 | CARE(Single Model) | |
[]() | | 64.48 | 58.57 | 44.27 | 76.15 | 86.42 | 5.13 | sdfsdaf | |
[]() | | 64.04 | 71.24 | 58.27 | 87.55 | 94.45 | 2.96 | MRR ensemble (Naive) | |
[]() | | 63.94 | 68.16 | 54.67 | 84.95 | 93.1 | 3.3 | CAF | |
[]() | | 63.87 | 67.5 | 53.85 | 84.67 | 93.25 | 3.32 | w/ VQA + CC, single model | |
[]() | | 63.87 | 67.5 | 53.85 | 84.67 | 93.25 | 3.32 | test1 | |
[]() | | 63.75 | 67.49 | 53.75 | 85.02 | 93.25 | 3.31 | sh101 | |
[]() | | 60.91 | 66.63 | 52.52 | 84.1 | 92.27 | 3.41 | SCL_48 | |
[]() | | 60.33 | 66.53 | 52.62 | 84.12 | 92.5 | 3.4 | Transformer+2cons | |
[]() | | 60.31 | 64.95 | 50.48 | 83.15 | 93.15 | 3.44 | single-model | |
[]() | | 60.19 | 64.25 | 50.88 | 80.92 | 90.6 | 4.11 | 1 | |
[]() | | 59.69 | 64.14 | 50.62 | 80.77 | 89.83 | 4.18 | 211 | |
Multi-View Attention Network for Visual Dialog | ✓ Link | 59.37 | 64.84 | 51.45 | 81.12 | 90.65 | 3.97 | MVAN | 2020-04-29 |
[]() | | 59.33 | 66.2 | 51.62 | 85.05 | 93.7 | 3.25 | single model | |
[]() | | 59.23 | 64.58 | 51.25 | 80.92 | 90.05 | 4.03 | gr | |
[]() | | 59.0 | 62.65 | 49.48 | 78.1 | 88.35 | 4.5 | lkh(single-model) | |
[]() | | 58.59 | 63.7 | 50.3 | 79.47 | 89.15 | 4.26 | lijunlin_7 | |
[]() | | 58.56 | 61.87 | 48.4 | 78.0 | 88.6 | 4.49 | wqedasd(single model) | |
[]() | | 58.51 | 65.7 | 51.73 | 82.97 | 91.97 | 3.68 | Bert2constraints | |
[]() | | 58.49 | 64.31 | 50.8 | 80.8 | 89.65 | 4.11 | zxcdd | |
[]() | | 58.25 | 64.79 | 51.32 | 81.0 | 90.38 | 3.98 | jiuyigedian | |
[]() | | 58.19 | 64.43 | 50.7 | 80.83 | 90.18 | 4.13 | disc | |
[]() | | 58.14 | 63.31 | 49.68 | 80.45 | 89.25 | 4.31 | lijunlin_9 | |
Learning to Reason: End-to-End Module Networks for Visual Question Answering | ✓ Link | 58.1 | 58.8 | 44.15 | 76.88 | 86.88 | 4.4 | NMN | 2017-04-18 |
[]() | | 57.82 | 64.3 | 50.58 | 81.25 | 90.03 | 4.07 | zuizhong | |
[]() | | 57.6 | 64.57 | 49.75 | 82.23 | 91.67 | 3.67 | clean_wac_4freeze | |
Dual Attention Networks for Visual Reference Resolution in Visual Dialog | ✓ Link | 57.59 | 63.2 | 49.63 | 79.75 | 89.35 | 4.3 | DAN | 2019-02-25 |
[]() | | 57.39 | 47.03 | 36.93 | 56.47 | 65.8 | 13.3 | mvan_len40_test | |
Image-Question-Answer Synergistic Network for Visual Dialog | | 57.32 | 62.20 | 47.90 | 80.43 | | 4.17 | Synergistic | 2019-02-26 |
Factor Graph Attention | ✓ Link | 57.20 | 69.3 | 55.65 | 86.73 | 94.05 | 3.14 | 5xFGA (F-RCNNx101) | 2019-04-11 |
Making History Matter: History-Advantage Sequence Training for Visual Dialog | | 57.17 | 64.22 | 50.88 | 80.63 | 89.45 | 4.20 | HACAN | 2019-02-25 |
Iterative Context-Aware Graph Inference for Visual Dialog | ✓ Link | 56.64 | 63.49 | 49.85 | 80.63 | 90.15 | 4.11 | CAG | 2020-04-05 |
[]() | | 56.38 | 62.68 | 48.6 | 80.1 | 89.48 | 4.22 | kbgn_disc_5 | |
DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue | ✓ Link | 56.32 | 63.23 | 49.25 | 80.23 | 89.7 | 4.11 | DualVD | 2019-11-17 |
[]() | | 55.94 | 63.3 | 49.18 | 81.0 | 89.6 | 4.2 | ERIC666 | |
[]() | | 55.88 | 62.24 | 47.58 | 80.45 | 89.72 | 4.09 | eightepoch | |
Recursive Visual Attention in Visual Dialog | ✓ Link | 55.59 | 63.03 | 49.03 | 80.40 | 89.83 | 4.18 | RVA | 2018-12-06 |
[]() | | 55.21 | 62.56 | 47.45 | 81.55 | 92.0 | 3.82 | single-model | |
Visual Coreference Resolution in Visual Dialog using Neural Module Networks | ✓ Link | 54.70 | 61.50 | 47.55 | 78.10 | 88.80 | 4.40 | CorefNMN (ResNet-152) | 2018-09-06 |
[]() | | 53.2 | 59.96 | 46.35 | 76.78 | 86.48 | 5.12 | jkl | |
[]() | | 53.19 | 45.84 | 35.9 | 54.97 | 61.7 | 20.71 | trainval_ch_9 | |
Reasoning Visual Dialogs with Structural and Partial Observations | ✓ Link | 52.82 | 61.37 | 47.33 | 77.98 | 87.83 | 4.57 | GNN | 2019-04-11 |
[]() | | 52.57 | 61.09 | 46.83 | 78.22 | 87.42 | 4.65 | DLC-4 | |
[]() | | 51.87 | 55.69 | 42.7 | 70.17 | 79.72 | 7.87 | gat_disc_relto_4 | |
[]() | | 49.94 | 60.11 | 45.6 | 77.53 | 87.9 | 4.7 | adasd | |
[]() | | 47.51 | 53.19 | 41.4 | 65.85 | 74.15 | 11.96 | gat_disc_3 | |
Visual Dialog | ✓ Link | 47.5 | 55.5 | 40.98 | 72.30 | 83.30 | 5.92 | MN-QIH-D | 2016-11-26 |
[]() | | 46.75 | 53.3 | 36.83 | 73.45 | 83.1 | 5.91 | paratraining1epoch | |
Visual Dialog | ✓ Link | 45.5 | 54.2 | 39.93 | 70.45 | 81.50 | 6.41 | HRE-QIH-D | 2016-11-26 |
Visual Dialog | ✓ Link | 45.3 | 55.4 | 40.95 | 72.45 | 82.83 | 5.95 | LF-QIH-D | 2016-11-26 |
[]() | | 23.0 | 29.97 | 16.62 | 43.58 | 53.05 | 22.05 | czczx | |
[]() | | 11.84 | 7.25 | 3.02 | 7.22 | 12.22 | 49.61 | qqhe | |