Paper | Code | Top-1 Accuracy (%) | Model Name | Release Date |
---|---|---|---|---|
OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning | | 65.1 | OmniVec2 | 2024-01-01 |
OmniVec: Learning robust representations with cross modal sharing | | 63.5 | OmniVec(ViT) | 2023-11-07 |
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | ✓ Link | 61.2 | InternImage-H(CNN) | 2022-11-10 |
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers | ✓ Link | 60.3 | MixMIM-L(ViT-L) | 2022-05-26 |
ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders | ✓ Link | 59.5 | ViC-MAE (ViT-L) | 2023-03-21 |
A Continual Development Methodology for Large-scale Multitask Dynamic ML Systems | ✓ Link | 59.15 | µ2Net+ (ViT-L/16) | 2022-09-15 |
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers | ✓ Link | 58.9 | MixMIM-B (ViT) | 2022-05-26 |