Context-Aware Video Instance Segmentation | ✓ Link | 65.3 | 87.3 | 73.2 | 49.7 | 70.3 | CAVIS(ViT-L, Offline) | 2024-07-03 |
DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries | ✓ Link | 64.5 | 86.1 | 72.2 | 49.6 | 70.7 | DVIS-DAQ(ViT-L, Offline) | 2024-03-29 |
DVIS++: Improved Decoupled Framework for Universal Video Segmentation | ✓ Link | 63.9 | 86.7 | 71.5 | 48.8 | 69.5 | DVIS++(ViT-L, Offline) | 2023-12-20 |
DVIS++: Improved Decoupled Framework for Universal Video Segmentation | ✓ Link | 62.3 | 82.7 | 70.2 | 49.5 | 68.0 | DVIS++(ViT-L, Online) | 2023-12-20 |
RefineVIS: Video Instance Segmentation with Temporal Attention Refinement | | 61.4 | 84.1 | 68.5 | 48.3 | 65.2 | RefineVIS (Swin-L, online) | 2023-06-07 |
GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation | ✓ Link | 60.3 | 81.3 | 67.1 | 48.8 | 64.5 | GRAtt-VIS (Swin-L) | 2023-05-26 |
TarViS: A Unified Approach for Target-based Video Segmentation | ✓ Link | 60.2 | 81.4 | 67.6 | 47.6 | 64.8 | TarViS (Swin-L) | 2023-01-06 |
DVIS: Decoupled Video Instance Segmentation Framework | ✓ Link | 60.1 | 83.0 | 68.4 | 47.7 | 65.7 | DVIS(Swin-L) | 2023-06-06 |
A Generalized Framework for Video Instance Segmentation | ✓ Link | 60.1 | 80.9 | 66.5 | 49.1 | 64.7 | GenVIS (Swin-L) | 2022-11-16 |
NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation | | 59.8 | 82.0 | 66.5 | 47.9 | 64.4 | NOVIS (Swin-L) | 2023-08-29 |
Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation | ✓ Link | 58.4 | 79.4 | 64.3 | 47.5 | 63.6 | Tube-Link(Swin-L) | 2023-03-22 |
UniVS: Unified and Universal Video Segmentation with Prompts as Queries | ✓ Link | 57.9 | 79.4 | 63.3 | 46.2 | 63.1 | UniVS(Swin-L) | 2024-02-28 |
VITA: Video Instance Segmentation via Object Token Association | ✓ Link | 57.5 | 80.6 | 61.0 | 47.7 | 62.6 | VITA (Swin-L) | 2022-06-09 |
In Defense of Online Models for Video Instance Segmentation | ✓ Link | 56.1 | 80.8 | 63.5 | 45.0 | 60.1 | IDOL (Swin-L) | 2022-07-21 |
MDQE: Mining Discriminative Query Embeddings to Segment Occluded Instances on Challenging Videos | ✓ Link | 55.5 | 80.7 | 61.7 | 45.4 | 60.6 | MDQE(Swin-L) | 2023-03-25 |
MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training | ✓ Link | 55.3 | 76.6 | 62.0 | 45.9 | 60.8 | MinVIS (Swin-L) | 2022-08-03 |
DeVIS: Making Deformable Transformers Work for Video Instance Segmentation | ✓ Link | 54.4 | 77.7 | 59.8 | 43.8 | 57.8 | DeVIS (Swin-L) | 2022-07-22 |
BoxVIS: Video Instance Segmentation with Box Annotations | ✓ Link | 53.9 | 76.4 | 59.6 | 44.8 | 61.0 | BoxVIS(Swin-L & Box-sup) | 2023-03-26 |
InstanceFormer: An Online Video Instance Segmentation Framework | ✓ Link | 51.0 | 73.7 | 56.9 | 42.8 | 56.0 | InstanceFormer (Swin-L) | 2022-08-22 |
TarViS: A Unified Approach for Target-based Video Segmentation | ✓ Link | 50.9 | 71.6 | 56.6 | 42.2 | 57.2 | TarViS (Swin-T) | 2023-01-06 |
GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation | ✓ Link | 48.9 | 69.2 | 53.1 | 41.8 | 56.0 | GRAtt-VIS (ResNet-50) | 2023-05-26 |
TarViS: A Unified Approach for Target-based Video Segmentation | ✓ Link | 48.3 | 69.6 | 53.2 | 40.5 | 55.9 | TarViS (ResNet-50) | 2023-01-06 |
NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation | | 47.2 | 69.4 | 50.0 | 41.3 | 54.4 | NOVIS (ResNet-50) | 2023-08-29 |
DeVIS: Making Deformable Transformers Work for Video Instance Segmentation | ✓ Link | 43.1 | 66.8 | 46.6 | 38.0 | 50.1 | DeVIS (ResNet-50) | 2022-07-22 |
InstanceFormer: An Online Video Instance Segmentation Framework | ✓ Link | 40.8 | 62.4 | 43.7 | 36.1 | 48.1 | InstanceFormer (ResNet-50) | 2022-08-22 |
Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation | ✓ Link | 34.6 | 54.0 | 38.0 | 29.4 | 39.1 | STMask(R101-DCN-FPN) | 2021-04-06 |