Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models | | 64.7 | | | MOV (ViT-L/14) | 2022-07-15 |
Orthogonal Temporal Interpolation for Zero-Shot Video Recognition | ✓ Link | 64 | | | OTI (ViT-L/14) | 2023-08-14 |
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | ✓ Link | 61.4 | | | BIKE | 2022-12-31 |
Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models | | 60.8 | | | MOV (ViT-B/16) | 2022-07-15 |
Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception | | 59.1 | | | IMP-MoE-L | 2023-05-10 |
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | | 58.7 | 84.5 | | VideoCoCa | 2022-12-09 |
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | ✓ Link | 58.4 | | | Text4Vis | 2022-07-04 |
Leveraging Temporal Contextualization for Video Action Recognition | ✓ Link | 56.0 | | | TC-CLIP | 2024-04-15 |
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition | ✓ Link | 55.9 | | | OST | 2023-11-30 |
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge | ✓ Link | 52.3 | | | MAXI | 2023-03-15 |
VicTR: Video-conditioned Text Representations for Activity Recognition | | 51.0 | | | VicTR (ViT-B/16) | 2023-04-05 |
LoCATe-GAT: Modeling Multi-Scale Local Context and Action Relationships for Zero-Shot Action Recognition | ✓ Link | 50.7 | | | LoCATe-GAT | 2024-11-27 |
Expanding Language-Image Pretrained Models for General Video Recognition | ✓ Link | 44.6 | | | X-CLIP | 2022-08-04 |
CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition | | 43.2 | | | CLASTER | 2021-01-18 |
Cross-modal Representation Learning for Zero-shot Action Recognition | | 41.1 | | | ResT | 2022-05-03 |
Alignment-Uniformity aware Representation Learning for Zero-shot Video Classification | ✓ Link | 39 | | | AURL | 2022-03-29 |
Rethinking Zero-shot Action Recognition: Learning from Latent Atomic Actions | ✓ Link | 38.7 | | | JigsawNet | 2022-03-28 |
Synthetic Sample Selection for Generalized Zero-Shot Learning | | 35.9 | | | SPOT | 2023-04-06 |
Elaborative Rehearsal for Zero-shot Action Recognition | ✓ Link | 35.3 | | | ER-ZSAR | 2021-08-05 |
Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications | ✓ Link | 32.7 | | | E2E | 2020-03-03 |
Towards Universal Representation for Unseen Action Recognition | | 24.4 | | | UR | 2018-03-22 |
I Know the Relationships: Zero-Shot Action Recognition via Two-Stream Graph Convolutional Networks and Knowledge Graphs | ✓ Link | 23.2 | | | TS-GCN | 2019-07-17 |
Zero-Shot Action Recognition With Error-Correcting Output Codes | | 22.6 | | | ZSECOC | 2017-07-01 |
Alternative Semantic Representations for Zero-Shot Human Action Recognition | | 21.8 | | | ASR | 2017-06-28 |
Multi-Task Zero-Shot Action Recognition with Prioritised Data Augmentation | | 19.7 | | | MTE | 2016-11-26 |
 | | 18.5 | | | ESZSL | |
Objects2action: Classifying and localizing actions without any video example | | 15.6 | | | O2A | 2015-10-23 |
Evaluation of Output Embeddings for Fine-Grained Image Classification | ✓ Link | 13.3 | | | SJE (word embedding) | 2014-09-30 |
Actor-agnostic Multi-label Action Recognition with Multi-modal Query | ✓ Link | | | 69.43 | MSQNet | 2023-07-20 |