Paper | Code | Accuracy (%) | RMSE | Model Name | Release Date |
---|---|---|---|---|---|
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | | 77.1 | | SMoLA-PaLI-X Specialist | 2023-12-01 |
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models | | 76.6 | | PaLI-X-VPD | 2023-12-05 |
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | | 70.7 | | SMoLA-PaLI-X Generalist (0 shot) | 2023-12-01 |
MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond | ✓ Link | 56.8 | 1.43 | MoVie-ResNeXt | 2020-04-24 |
TallyQA: Answering Complex Counting Questions | ✓ Link | 56.2 | 1.43 | RCN | 2018-10-29 |
MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond | ✓ Link | 54.1 | 1.52 | MoVie | 2020-04-24 |