| Paper | Code | overall | yes/no | number | other | unanswerable | ModelName | ReleaseDate |
|---|---|---|---|---|---|---|---|---|
| PaLI: A Jointly-Scaled Multilingual Language-Image Model | ✓ Link | 73.3 | | | | | PaLI | 2022-09-14 |
| Less Is More: Linear Layers on CLIP Features as Powerful VizWiz Model | | 61.64 | | | | | CLIP-Ensemble | 2022-06-10 |
| Less Is More: Linear Layers on CLIP Features as Powerful VizWiz Model | | 60.66 | | | | | CLIP-Single | 2022-06-10 |
| | | 56.33 | 78.89 | 27.1 | 42.3 | 89.49 | HSSLab | |
| Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | ✓ Link | 56.0 | | | | | Video-LaVIT | 2024-02-05 |
| | | 55.93 | 73.45 | 26.83 | 42.29 | 88.95 | sudoku | |
| | | 54.76 | 80.52 | 27.37 | 40.92 | 86.82 | Katya | |
| | | 49.58 | 59.79 | 20.6 | 34.14 | 88.26 | Modified Attention | |
| | | 48.39 | 60.65 | 22.22 | 34.21 | 83.43 | shaunakh | |
| | | 44.9 | 60.08 | 18.16 | 28.88 | 84.13 | e50 | |
| | | 44.62 | 63.8 | 18.97 | 28.12 | 84.32 | SKP | |
| | | 44.01 | 53.01 | 17.34 | 27.34 | 85.86 | knight777 | |
| | | 41.92 | 49.86 | 18.7 | 26.13 | 81.54 | pk | |
| | | 34.96 | 60.08 | 23.04 | 19.05 | 71.45 | Tartans | |
| | | 34.13 | 25.31 | 14.09 | 17.57 | 78.2 | VWTest1 | |
| | | 6.25 | 79.85 | 2.71 | 1.21 | 7.13 | BERT-RG | |