OpenCodePapers

video-instance-segmentation-on-youtube-vis-1

Video Instance Segmentation
Leaderboard

| Paper | Code | mask AP | AP50 | AP75 | AR1 | AR10 | Model Name | Release Date |
|---|---|---|---|---|---|---|---|---|
| Context-Aware Video Instance Segmentation | ✓ | 68.9 | 89.3 | 76.2 | 58.3 | 73.6 | CAVIS (ViT-L, Online) | 2024-07-03 |
| DVIS++: Improved Decoupled Framework for Universal Video Segmentation | ✓ | 67.7 | 88.8 | 75.3 | 57.9 | 73.7 | DVIS++ (ViT-L, Online) | 2023-12-20 |
| DVIS: Decoupled Video Instance Segmentation Framework | ✓ | 64.9 | 88.0 | 72.7 | 56.5 | 70.3 | DVIS | 2023-06-06 |
| Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation | ✓ | 64.6 | 86.6 | 71.3 | 55.9 | 69.1 | Tube-Link | 2023-03-22 |
| MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training | ✓ | 61.6 | 83.3 | 68.6 | 54.8 | 66.6 | MinVIS (Swin-L) | 2022-08-03 |
| Mask2Former for Video Instance Segmentation | ✓ | 60.4 | 84.4 | 67.0 | | | Mask2Former (Swin-L) | 2021-12-20 |
| UniVS: Unified and Universal Video Segmentation with Prompts as Queries | ✓ | 60.0 | 82.1 | 65.3 | 54.7 | 66.8 | UniVS (Swin-L) | 2024-02-28 |
| MDQE: Mining Discriminative Query Embeddings to Segment Occluded Instances on Challenging Videos | ✓ | 59.9 | 84.9 | 67.3 | 53.5 | 65.0 | MDQE (Swin-L) | 2023-03-25 |
| SeqFormer: Sequential Transformer for Video Instance Segmentation | ✓ | 59.3 | 82.1 | 66.4 | 51.7 | 64.4 | SeqFormer (Swin-L) | 2021-12-15 |
| DeVIS: Making Deformable Transformers Work for Video Instance Segmentation | ✓ | 57.1 | 80.8 | 66.3 | 50.8 | 61.0 | DeVIS (Swin-L) | 2022-07-22 |
| InstanceFormer: An Online Video Instance Segmentation Framework | ✓ | 56.3 | 78.0 | 64.2 | 50.9 | 61.6 | InstanceFormer (Swin-L) | 2022-08-22 |
| 1st Place Solution for YouTubeVOS Challenge 2021: Video Instance Segmentation | | 54.3 | 76.6 | 65.6 | 47 | 57.9 | TCIS (Swin-S) | 2021-06-12 |
| Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation | ✓ | 54.1 | 79.0 | 59.6 | 49.7 | 59.9 | Video K-Net (Swin-Base) | 2022-04-10 |
| NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation | | 52.8 | 75.7 | 56.9 | 50.3 | 60.6 | NOVIS (ResNet-50) | 2023-08-29 |
| In Defense of Online Models for Video Instance Segmentation | ✓ | 49.5 | 74 | 52.9 | 47.7 | 58.7 | IDOL (ResNet-50) | 2022-07-21 |
| Mask2Former for Video Instance Segmentation | ✓ | 49.2 | 72.8 | 54.2 | | | Mask2Former (ResNet-101) | 2021-12-20 |
| SeqFormer: Sequential Transformer for Video Instance Segmentation | ✓ | 49.0 | 71.1 | 55.7 | 46.8 | 56.9 | SeqFormer (ResNet-101) | 2021-12-15 |
| MSN: Efficient Online Mask Selection Network for Video Instance Segmentation | ✓ | 48.8 | 69.4 | 54.9 | 40.1 | 55.0 | MSN | 2021-06-19 |
| SeqFormer: Sequential Transformer for Video Instance Segmentation | ✓ | 47.4 | 69.8 | 51.8 | 45.5 | 54.8 | SeqFormer (ResNet-50) | 2021-12-15 |
| Mask2Former for Video Instance Segmentation | ✓ | 46.4 | 68.0 | 50.0 | | | Mask2Former (ResNet-50) | 2021-12-20 |
| InstanceFormer: An Online Video Instance Segmentation Framework | ✓ | 45.6 | 68.6 | 49.6 | 42.1 | 53.5 | InstanceFormer (ResNet-50) | 2022-08-22 |
| SeqFormer: Sequential Transformer for Video Instance Segmentation | ✓ | 45.1 | 66.9 | 50.5 | 45.6 | 54.6 | SeqFormer (ResNet-50) | 2021-12-15 |
| DeVIS: Making Deformable Transformers Work for Video Instance Segmentation | ✓ | 44.4 | 66.7 | 48.6 | 42.4 | 51.6 | DeVIS (ResNet-50) | 2022-07-22 |
| Video Instance Segmentation using Inter-Frame Communication Transformers | ✓ | 42.8 | 65.8 | 46.8 | 43.8 | 51.2 | IFC (ResNet-50) | 2021-06-07 |
| End-to-End Video Instance Segmentation with Transformers | ✓ | 40.1 | 64.0 | 45.0 | 38.3 | 44.9 | VisTR (ResNet-101) | 2020-11-30 |
| Video Sparse Transformer With Attention-Guided Memory for Video Object Detection | ✓ | 39.0 | | | | | VSTAM | 2022-06-17 |
| Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation | ✓ | 36.8 | 56.8 | 38.0 | 34.8 | 41.8 | STMask (R101-DCN-FPN) | 2021-04-06 |
| STC: Spatio-Temporal Contrastive Learning for Video Instance Segmentation | | 36.7 | 57.2 | 38.6 | 36.9 | 44.5 | STC (ResNet-50) | 2022-02-08 |
| Crossover Learning for Fast Online Video Instance Segmentation | ✓ | 36.6 | 57.3 | 39.7 | 36 | 42 | CrossVIS (ResNet-101) | 2021-04-13 |
| End-to-End Video Instance Segmentation with Transformers | ✓ | 36.2 | 59.8 | 36.9 | 37.2 | 42.4 | VisTR (ResNet-50) | 2020-11-30 |
| Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation | ✓ | 36.1 | 54.9 | 39.4 | 36.3 | 41.6 | PCAN (ResNet-50) | 2021-06-22 |
| Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation | ✓ | 36.0 | 59.4 | 39.2 | 39.1 | 47.7 | ObjProp (ResNet-50) | 2021-11-15 |
| CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation | ✓ | 35.3 | 56.0 | 38.6 | 33.1 | 40.3 | CompFeat (ResNet-50) | 2020-12-07 |
| Occluded Video Instance Segmentation: A Benchmark | ✓ | 35.1 | 55.6 | 38.1 | | | CSipMask | 2021-02-02 |
| STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos | ✓ | 34.6 | 55.8 | 37.9 | 34.4 | 41.6 | STEm-Seg (ResNet-101) | 2020-03-18 |
| SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation | ✓ | 33.7 | 54.1 | 35.8 | 35.4 | 40.1 | SipMask (ResNet-50, ms-train, single-scale test) | 2020-07-29 |
| Track to Detect and Segment: An Online Multi-Object Tracker | ✓ | 32.6 | 52.6 | 32.8 | | | TraDeS | 2021-03-16 |
| SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation | ✓ | 32.5 | 53 | 33.3 | 33.5 | 38.9 | SipMask (ResNet-50, single-scale test) | 2020-07-29 |
| Occluded Video Instance Segmentation: A Benchmark | ✓ | 32.1 | 52.8 | 34.9 | | | CMaskTrack R-CNN | 2021-02-02 |
| STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos | ✓ | 30.6 | 50.7 | 37.9 | 34.4 | 41.6 | STEm-Seg (ResNet-50) | 2020-03-18 |
| Video Instance Segmentation | ✓ | 30.3 | 51.1 | 32.6 | 31 | 35.5 | MaskTrack R-CNN (ResNet-50, single-scale training and test) | 2019-05-12 |
| Do Different Tracking Tasks Require Different Appearance Models? | ✓ | 30.1 | | | | | UniTrack | 2021-07-05 |
| Efficient Video Object Segmentation via Network Modulation | ✓ | | | 29.1 | 28.6 | 33.1 | OSMN | 2018-02-04 |
| Simple Online and Realtime Tracking with a Deep Association Metric | ✓ | | | | 27.8 | 31.3 | DeepSORT | 2017-03-21 |