Paper | Code | Top-1 Accuracy (%) | Model Name | Release Date |
---|---|---|---|---|
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | ✓ Link | 86.2 | BIKE | 2022-12-31 |
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | ✓ Link | 84.6 | Text4Vis | 2022-07-04 |
LoCATe-GAT: Modeling Multi-Scale Local Context and Action Relationships for Zero-Shot Action Recognition | ✓ Link | 73.8 | LoCATe-GAT | 2024-11-27 |
Cross-modal Representation Learning for Zero-shot Action Recognition | - | 32.5 | ResT | 2022-05-03 |
Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications | ✓ Link | 26.6 | E2E | 2020-03-03 |