Paper | Code | Accuracy (%) | Model Name | Release Date |
---|---|---|---|---|
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | ✓ | 81.7 | MDETR | 2021-04-26 |
Compositional Attention Networks for Machine Reasoning | ✓ | 81.5 | MAC | 2018-03-08 |
FiLM: Visual Reasoning with a General Conditioning Layer | ✓ | 75.9 | CNN+GRU+FiLM | 2017-09-22 |
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding | ✓ | 67.8 | NS-VQA (1K programs) | 2018-10-04 |
Inferring and Executing Programs for Visual Reasoning | ✓ | 66.6 | IEP-18K | 2017-05-10 |