Paper | Code | Accuracy, 1/4 setting (%) | Accuracy, 1/2 setting (%) | Model | Release Date |
---|---|---|---|---|---|
| | | 50.2 | | CFMMC-Align | |
Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer | ✓ Link | 46.0 | | Tem-adapter | 2023-08-16 |
SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events | ✓ Link | 37.05 | 64.77 | Eclipse | 2021-03-29 |
Hierarchical Conditional Relation Networks for Video Question Answering | ✓ Link | 36.49 | 63.79 | HCRN | 2020-02-25 |
TVQA: Localized, Compositional Video Question Answering | ✓ Link | 35.16 | 63.15 | TVQA | 2018-09-05 |
Exploring Models and Data for Image Question Answering | ✓ Link | 29.91 | 54.25 | VIS+LST | 2015-05-08 |