Context-Aware Video Instance Segmentation | ✓ Link | 68.9 | 89.3 | 76.2 | 58.3 | 73.6 | CAVIS(ViT-L, Online) | 2024-07-03 |
DVIS++: Improved Decoupled Framework for Universal Video Segmentation | ✓ Link | 67.7 | 88.8 | 75.3 | 57.9 | 73.7 | DVIS++(ViT-L, Online) | 2023-12-20 |
DVIS: Decoupled Video Instance Segmentation Framework | ✓ Link | 64.9 | 88.0 | 72.7 | 56.5 | 70.3 | DVIS | 2023-06-06 |
Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation | ✓ Link | 64.6 | 86.6 | 71.3 | 55.9 | 69.1 | Tube-Link | 2023-03-22 |
MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training | ✓ Link | 61.6 | 83.3 | 68.6 | 54.8 | 66.6 | MinVIS (Swin-L) | 2022-08-03 |
Mask2Former for Video Instance Segmentation | ✓ Link | 60.4 | 84.4 | 67.0 | | | Mask2Former (Swin-L) | 2021-12-20 |
UniVS: Unified and Universal Video Segmentation with Prompts as Queries | ✓ Link | 60.0 | 82.1 | 65.3 | 54.7 | 66.8 | UniVS(Swin-L) | 2024-02-28 |
MDQE: Mining Discriminative Query Embeddings to Segment Occluded Instances on Challenging Videos | ✓ Link | 59.9 | 84.9 | 67.3 | 53.5 | 65.0 | MDQE(Swin-L) | 2023-03-25 |
SeqFormer: Sequential Transformer for Video Instance Segmentation | ✓ Link | 59.3 | 82.1 | 66.4 | 51.7 | 64.4 | SeqFormer (Swin-L) | 2021-12-15 |
DeVIS: Making Deformable Transformers Work for Video Instance Segmentation | ✓ Link | 57.1 | 80.8 | 66.3 | 50.8 | 61.0 | DeVIS (Swin-L) | 2022-07-22 |
InstanceFormer: An Online Video Instance Segmentation Framework | ✓ Link | 56.3 | 78.0 | 64.2 | 50.9 | 61.6 | InstanceFormer(Swin-L) | 2022-08-22 |
1st Place Solution for YouTubeVOS Challenge 2021: Video Instance Segmentation | | 54.3 | 76.6 | 65.6 | 47 | 57.9 | TCIS (Swin-S) | 2021-06-12 |
Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation | ✓ Link | 54.1 | 79.0 | 59.6 | 49.7 | 59.9 | Video K-Net (Swin-Base) | 2022-04-10 |
NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation | | 52.8 | 75.7 | 56.9 | 50.3 | 60.6 | NOVIS (ResNet-50) | 2023-08-29 |
In Defense of Online Models for Video Instance Segmentation | ✓ Link | 49.5 | 74 | 52.9 | 47.7 | 58.7 | IDOL (ResNet-50) | 2022-07-21 |
Mask2Former for Video Instance Segmentation | ✓ Link | 49.2 | 72.8 | 54.2 | | | Mask2Former (ResNet-101) | 2021-12-20 |
SeqFormer: Sequential Transformer for Video Instance Segmentation | ✓ Link | 49.0 | 71.1 | 55.7 | 46.8 | 56.9 | SeqFormer (ResNet-101) | 2021-12-15 |
MSN: Efficient Online Mask Selection Network for Video Instance Segmentation | ✓ Link | 48.8 | 69.4 | 54.9 | 40.1 | 55.0 | MSN | 2021-06-19 |
SeqFormer: Sequential Transformer for Video Instance Segmentation | ✓ Link | 47.4 | 69.8 | 51.8 | 45.5 | 54.8 | SeqFormer (ResNet-50) | 2021-12-15 |
Mask2Former for Video Instance Segmentation | ✓ Link | 46.4 | 68.0 | 50.0 | | | Mask2Former (ResNet-50) | 2021-12-20 |
InstanceFormer: An Online Video Instance Segmentation Framework | ✓ Link | 45.6 | 68.6 | 49.6 | 42.1 | 53.5 | InstanceFormer(ResNet-50) | 2022-08-22 |
SeqFormer: Sequential Transformer for Video Instance Segmentation | ✓ Link | 45.1 | 66.9 | 50.5 | 45.6 | 54.6 | SeqFormer (ResNet-50) | 2021-12-15 |
DeVIS: Making Deformable Transformers Work for Video Instance Segmentation | ✓ Link | 44.4 | 66.7 | 48.6 | 42.4 | 51.6 | DeVIS (ResNet-50) | 2022-07-22 |
Video Instance Segmentation using Inter-Frame Communication Transformers | ✓ Link | 42.8 | 65.8 | 46.8 | 43.8 | 51.2 | IFC (ResNet-50) | 2021-06-07 |
End-to-End Video Instance Segmentation with Transformers | ✓ Link | 40.1 | 64.0 | 45.0 | 38.3 | 44.9 | VisTR(ResNet-101) | 2020-11-30 |
Video Sparse Transformer With Attention-Guided Memory for Video Object Detection | ✓ Link | 39.0 | | | | | VSTAM | 2022-06-17 |
Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation | ✓ Link | 36.8 | 56.8 | 38.0 | 34.8 | 41.8 | STMask(R101-DCN-FPN) | 2021-04-06 |
STC: Spatio-Temporal Contrastive Learning for Video Instance Segmentation | | 36.7 | 57.2 | 38.6 | 36.9 | 44.5 | STC (ResNet-50) | 2022-02-08 |
Crossover Learning for Fast Online Video Instance Segmentation | ✓ Link | 36.6 | 57.3 | 39.7 | 36 | 42 | CrossVIS (ResNet-101) | 2021-04-13 |
End-to-End Video Instance Segmentation with Transformers | ✓ Link | 36.2 | 59.8 | 36.9 | 37.2 | 42.4 | VisTR(ResNet-50) | 2020-11-30 |
Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation | ✓ Link | 36.1 | 54.9 | 39.4 | 36.3 | 41.6 | PCAN(ResNet-50) | 2021-06-22 |
Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation | ✓ Link | 36.0 | 59.4 | 39.2 | 39.1 | 47.7 | ObjProp (ResNet-50) | 2021-11-15 |
CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation | ✓ Link | 35.3 | 56.0 | 38.6 | 33.1 | 40.3 | CompFeat(ResNet-50) | 2020-12-07 |
Occluded Video Instance Segmentation: A Benchmark | ✓ Link | 35.1 | 55.6 | 38.1 | | | CSipMask | 2021-02-02 |
STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos | ✓ Link | 34.6 | 55.8 | 37.9 | 34.4 | 41.6 | STEm-Seg (ResNet-101) | 2020-03-18 |
SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation | ✓ Link | 33.7 | 54.1 | 35.8 | 35.4 | 40.1 | SipMask (ResNet-50, ms-train, single-scale test) | 2020-07-29 |
Track to Detect and Segment: An Online Multi-Object Tracker | ✓ Link | 32.6 | 52.6 | 32.8 | | | TraDeS | 2021-03-16 |
SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation | ✓ Link | 32.5 | 53 | 33.3 | 33.5 | 38.9 | SipMask (ResNet-50, single-scale test) | 2020-07-29 |
Occluded Video Instance Segmentation: A Benchmark | ✓ Link | 32.1 | 52.8 | 34.9 | | | CMaskTrack R-CNN | 2021-02-02 |
STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos | ✓ Link | 30.6 | 50.7 | 37.9 | 34.4 | 41.6 | STEm-Seg (ResNet-50) | 2020-03-18 |
Video Instance Segmentation | ✓ Link | 30.3 | 51.1 | 32.6 | 31 | 35.5 | MaskTrack R-CNN (ResNet-50, single-scale training and test) | 2019-05-12 |
Do Different Tracking Tasks Require Different Appearance Models? | ✓ Link | 30.1 | | | | | UniTrack | 2021-07-05 |
Efficient Video Object Segmentation via Network Modulation | ✓ Link | 29.1 | 28.6 | 33.1 | | | OSMN | 2018-02-04 |
Simple Online and Realtime Tracking with a Deep Association Metric | ✓ Link | 27.8 | 31.3 | | | | DeepSORT | 2017-03-21 |