Orthogonal Temporal Interpolation for Zero-Shot Video Recognition | ✓ Link | 92.8 | | OTI(ViT-L/14) | 2023-08-14 |
Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception | | 91.5 | | IMP-MoE-L | 2023-05-10 |
Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models | | 87.1 | | MOV (ViT-L/14) | 2022-07-15 |
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | | 86.6 | 98.4 | VideoCoCa | 2022-12-09 |
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | ✓ Link | 86.6 | | BIKE | 2022-12-31 |
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | ✓ Link | 85.8 | | Text4Vis | 2022-07-04 |
Leveraging Temporal Contextualization for Video Action Recognition | ✓ Link | 85.4 | | TC-CLIP | 2024-04-15 |
EVA-CLIP: Improved Training Techniques for CLIP at Scale | ✓ Link | 83.1 | | EVA-CLIP-E/14+ | 2023-03-27 |
Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models | | 82.6 | | MOV (ViT-B/16) | 2022-07-15 |
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition | ✓ Link | 79.7 | | OST | 2023-11-30 |
EZ-CLIP: Efficient Zeroshot Video Action Recognition | ✓ Link | 79.1 | | EZ-CLIP | 2023-12-13 |
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge | ✓ Link | 78.2 | | MAXI | 2023-03-15 |
LoCATe-GAT: Modeling Multi-Scale Local Context and Action Relationships for Zero-Shot Action Recognition | ✓ Link | 76.0 | | LoCATe-GAT | 2024-11-27 |
VicTR: Video-conditioned Text Representations for Activity Recognition | | 72.4 | | VicTR (ViT-B/16) | 2023-04-05 |
Expanding Language-Image Pretrained Models for General Video Recognition | ✓ Link | 72.0 | | X-CLIP | 2022-08-04 |
Cross-modal Representation Learning for Zero-shot Action Recognition | | 58.7 | | ResT | 2022-05-03 |
Alignment-Uniformity aware Representation Learning for Zero-shot Video Classification | ✓ Link | 58 | | AURL | 2022-03-29 |
Rethinking Zero-shot Action Recognition: Learning from Latent Atomic Actions | ✓ Link | 56.0 | | JigsawNet | 2022-03-28 |
CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition | | 53.9 | | CLASTER | 2021-01-18 |
Elaborative Rehearsal for Zero-shot Action Recognition | ✓ Link | 51.8 | | ER-ZSAR | 2021-08-05 |
Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications | ✓ Link | 48 | | E2E | 2020-03-03 |
Synthetic Sample Selection for Generalized Zero-Shot Learning | | 40.9 | | SPOT | 2023-04-06 |
I Know the Relationships: Zero-Shot Action Recognition via Two-Stream Graph Convolutional Networks and Knowledge Graphs | ✓ Link | 34.2 | | TS-GCN | 2019-07-17 |
Objects2action: Classifying and localizing actions without any video example | | 30.3 | | O2A | 2015-10-23 |
Alternative Semantic Representations for Zero-Shot Human Action Recognition | | 24.4 | | ASR | 2017-06-28 |
Towards Universal Representation for Unseen Action Recognition | | 17.5 | | UR | 2018-03-22 |
 | | 16.7 | | IAP | |
 | | 15.9 | | DAP | |
Multi-Task Zero-Shot Action Recognition with Prioritised Data Augmentation | | 15.8 | | MTE | 2016-11-26 |
Zero-Shot Action Recognition With Error-Correcting Output Codes | | 15.1 | | ZSECOC | 2017-07-01 |
An embarrassingly simple approach to zero-shot learning | ✓ Link | 15.0 | | ESZSL | 2015-07-06 |
 | | 14.9 | | HAA | |
Evaluation of Output Embeddings for Fine-Grained Image Classification | ✓ Link | 12.0 | | SJE(Attribute) | 2014-09-30 |
Semantic Embedding Space for Zero-Shot Action Recognition | | 10.9 | | SVE | 2015-02-05 |
Evaluation of Output Embeddings for Fine-Grained Image Classification | ✓ Link | 9.9 | | SJE(Word Embedding) | 2014-09-30 |