OpenCodePapers

Visual Question Answering on VizWiz 2020 VQA

Visual Question Answering (VQA)
Leaderboard
| Paper | Code | overall | yes/no | number | other | unanswerable | Model Name | Release Date |
|---|---|---|---|---|---|---|---|---|
| PaLI: A Jointly-Scaled Multilingual Language-Image Model | ✓ | 73.3 | | | | | PaLI | 2022-09-14 |
| Less Is More: Linear Layers on CLIP Features as Powerful VizWiz Model | | 61.64 | | | | | CLIP-Ensemble | 2022-06-10 |
| Less Is More: Linear Layers on CLIP Features as Powerful VizWiz Model | | 60.66 | | | | | CLIP-Single | 2022-06-10 |
| | | 56.33 | 78.89 | 27.1 | 42.3 | 89.49 | HSSLab | |
| Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | ✓ | 56.0 | | | | | Video-LaVIT | 2024-02-05 |
| | | 55.93 | 73.45 | 26.83 | 42.29 | 88.95 | sudoku | |
| | | 54.76 | 80.52 | 27.37 | 40.92 | 86.82 | Katya | |
| | | 49.58 | 59.79 | 20.6 | 34.14 | 88.26 | Modified Attention | |
| | | 48.39 | 60.65 | 22.22 | 34.21 | 83.43 | shaunakh | |
| | | 44.9 | 60.08 | 18.16 | 28.88 | 84.13 | e50 | |
| | | 44.62 | 63.8 | 18.97 | 28.12 | 84.32 | SKP | |
| | | 44.01 | 53.01 | 17.34 | 27.34 | 85.86 | knight777 | |
| | | 41.92 | 49.86 | 18.7 | 26.13 | 81.54 | pk | |
| | | 34.96 | 60.08 | 23.04 | 19.05 | 71.45 | Tartans | |
| | | 34.13 | 25.31 | 14.09 | 17.57 | 78.2 | VWTest1 | |
| | | 6.25 | 79.85 | 2.71 | 1.21 | 7.13 | BERT-RG | |