Paper | Code | Average Accuracy | Macro Average Accuracy | Model Name | Release Date |
---|---|---|---|---|---|
ImplicitQA: Going beyond frames towards Implicit Video Reasoning | ✓ Link | 64.1 | 68.6 | GPT O3 | 2025-06-26 |
| | | 54.3 | 58.6 | GPT 4.1 | |
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | ✓ Link | 44.9 | 46.0 | Qwen2 VL - 7B | 2024-09-18 |
LLaVA-OneVision: Easy Visual Task Transfer | ✓ Link | 43.4 | 46.4 | LLaVA-OneVision - 7B | 2024-08-06 |
Qwen2.5-VL Technical Report | ✓ Link | 42.8 | 46.1 | Qwen 2.5 VL - 7B | 2025-02-19 |
Video Instruction Tuning With Synthetic Data | | 42.1 | 46.3 | LLaVA-Video - 7B | 2024-10-03 |
| | | 33.9 | 37.5 | LLaVA-Next-Video - 7B | |