Paper | Code | Top-1 Accuracy (%) | Top-5 Accuracy (%) | Model Name | Release Date |
---|---|---|---|---|---|
Florence: A New Foundation Model for Computer Vision | ✓ Link | 86.5 | 97.3 | Florence | 2021-11-22 |
ActionCLIP: A New Paradigm for Video Action Recognition | ✓ Link | 83.8 | | ActionCLIP (ViT-B/16) | 2021-09-17 |
Could Giant Pretrained Image Models Extract Universal Representations? | | 81.7 | | Frozen Backbone, SwinV2-G-ext22K (Video-Swin) | 2022-11-03 |