Paper | Code | Top-1 Acc. (%) | GFLOPs | Method | Date |
--- | --- | --- | --- | --- | --- |
Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers | ✓ Link | 80.1 | 2.6 | MCTF ($r=16$) | 2024-03-15 |
Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers | ✓ Link | 80.1 | 3.0 | dTPS | 2023-04-21 |
Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers | ✓ Link | 79.9 | 2.4 | MCTF ($r=18$) | 2024-03-15 |
DiffRate: Differentiable Compression Rate for Efficient Vision Transformers | ✓ Link | 79.8 | 2.9 | DiffRate | 2023-05-29 |
PPT: Token Pruning and Pooling for Efficient Vision Transformers | ✓ Link | 79.8 | 2.9 | PPT | 2023-10-03 |
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification | ✓ Link | 79.8 | 3.4 | DynamicViT (80%) | 2021-06-03 |
Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations | ✓ Link | 79.8 | 3.5 | EViT (80%) | 2022-02-16 |
Learned Thresholds Token Merging and Pruning for Vision Transformers | ✓ Link | 79.8 | 3.8 | LTMP (80%) | 2023-07-20 |
SPViT: Enabling Faster Vision Transformers via Soft Token Pruning | ✓ Link | 79.8 | 3.9 | SPViT (3.9G) | 2021-12-27 |
Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations | ✓ Link | 79.8 | 4.0 | EViT (90%) | 2022-02-16 |
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification | ✓ Link | 79.8 | 4.0 | DynamicViT (90%) | 2021-06-03 |
Training data-efficient image transformers & distillation through attention | ✓ Link | 79.8 | 4.6 | Base (DeiT-S) | 2020-12-23 |
Adaptive Token Sampling For Efficient Vision Transformers | ✓ Link | 79.7 | 2.9 | ATS | 2021-11-30 |
Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers | ✓ Link | 79.7 | 3.0 | eTPS | 2023-04-21 |
Token Merging: Your ViT But Faster | ✓ Link | 79.7 | 3.4 | ToMe ($r=8$) | 2022-10-17 |
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers | ✓ Link | 79.6 | 3.0 | BAT (70%) | 2022-11-21 |
Adaptive Sparse ViT: Towards Learnable Adaptive Token Pruning by Fully Exploiting Self-Attention | ✓ Link | 79.6 | 3.0 | AS-DeiT-S (65%) | 2022-09-28 |
Learned Thresholds Token Merging and Pruning for Vision Transformers | ✓ Link | 79.6 | 3.0 | LTMP (60%) | 2023-07-20 |
Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers | ✓ Link | 79.5 | 2.2 | MCTF ($r=20$) | 2024-03-15 |
Patch Slimming for Efficient Vision Transformers | | 79.5 | 2.4 | DPS-ViT | 2021-06-05 |
Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations | ✓ Link | 79.5 | 3.0 | EViT (70%) | 2022-02-16 |
Patch Slimming for Efficient Vision Transformers | | 79.4 | 2.6 | PS-ViT | 2021-06-05 |
Token Merging: Your ViT But Faster | ✓ Link | 79.4 | 2.7 | ToMe ($r=13$) | 2022-10-17 |
Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer | ✓ Link | 79.4 | 3.0 | Evo-ViT | 2021-08-03 |
SPViT: Enabling Faster Vision Transformers via Soft Token Pruning | ✓ Link | 79.3 | 2.6 | SPViT (2.6G) | 2021-12-27 |
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers | ✓ Link | 79.3 | 2.6 | BAT (60%) | 2022-11-21 |
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification | ✓ Link | 79.3 | 2.9 | DynamicViT (70%) | 2021-06-03 |
Chasing Sparsity in Vision Transformers: An End-to-End Exploration | ✓ Link | 79.2 | 3.2 | S$^2$ViTE | 2021-06-08 |
Token Merging: Your ViT But Faster | ✓ Link | 79.1 | 2.3 | ToMe ($r=16$) | 2022-10-17 |
IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision Transformers | | 79.1 | 3.2 | IA-RED$^2$ | 2021-06-23 |
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers | ✓ Link | 79.0 | 2.3 | BAT (50%) | 2022-11-21 |
Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations | ✓ Link | 78.9 | 2.6 | EViT (60%) | 2022-02-16 |
Adaptive Sparse ViT: Towards Learnable Adaptive Token Pruning by Fully Exploiting Self-Attention | ✓ Link | 78.7 | 2.3 | AS-DeiT-S (50%) | 2022-09-28 |
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers | ✓ Link | 78.6 | 2.0 | BAT (40%) | 2022-11-21 |
Learned Thresholds Token Merging and Pruning for Vision Transformers | ✓ Link | 78.6 | 2.3 | LTMP (45%) | 2023-07-20 |
A-ViT: Adaptive Tokens for Efficient Vision Transformer | ✓ Link | 78.6 | 3.6 | A-ViT | 2021-12-14 |
Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations | ✓ Link | 78.5 | 2.3 | EViT (50%) | 2022-02-16 |
Scalable Vision Transformers with Hierarchical Pooling | ✓ Link | 78.3 | 2.7 | HVT-S-1 | 2021-03-19 |
Pruning Self-attentions into Convolutional Layers in Single Path | ✓ Link | 78.3 | 3.3 | SPViT | 2021-11-23 |
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers | ✓ Link | 77.8 | 1.8 | BAT (30%) | 2022-11-21 |
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers | ✓ Link | 76.4 | 1.6 | BAT (20%) | 2022-11-21 |
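
Several of the methods listed above (e.g., EViT, DynamicViT, ATS) reduce GFLOPs by ranking image tokens according to how strongly the [CLS] token attends to them and keeping only the most attended ones. Below is a minimal, hedged sketch of that general idea in PyTorch, assuming a DeiT-S-style token layout (196 patch tokens of dimension 384, [CLS] at index 0); the helper name and shapes are illustrative and not taken from any of the papers' official code.

```python
import torch

def keep_topk_tokens(x, cls_attn, keep_ratio=0.7):
    """Illustrative token-pruning step (hypothetical helper).

    x:        (B, 1 + N, C) token embeddings, index 0 is the [CLS] token
    cls_attn: (B, N) attention of [CLS] to the N image tokens, averaged over heads
    """
    B, n_plus_1, C = x.shape
    n_keep = max(1, int((n_plus_1 - 1) * keep_ratio))
    # indices of the most attended image tokens
    idx = cls_attn.topk(n_keep, dim=1).indices           # (B, n_keep)
    idx = idx.unsqueeze(-1).expand(-1, -1, C)            # (B, n_keep, C)
    kept = torch.gather(x[:, 1:], dim=1, index=idx)      # (B, n_keep, C)
    # re-attach [CLS] in front of the surviving tokens
    return torch.cat([x[:, :1], kept], dim=1)            # (B, 1 + n_keep, C)

# Toy usage with DeiT-S-like shapes
x = torch.randn(2, 197, 384)
cls_attn = torch.rand(2, 196)
print(keep_topk_tokens(x, cls_attn, keep_ratio=0.7).shape)  # torch.Size([2, 138, 384])
```

The keep-ratio percentages in the table (e.g., EViT (70%), DynamicViT (80%)) correspond to this kind of per-layer token budget; merging-based entries such as ToMe and MCTF instead fuse $r$ tokens per block rather than discarding them outright.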