SAM 2: Segment Anything in Images and Videos | ✓ Link | 90.7 | | | | | | | | 224.4 | SAM2 | 2024-08-01 |
Putting the Object Back into Video Object Segmentation | ✓ Link | 90.5 | 87.5 | | | 93.4 | | | 17.9 | | Cutie+ (base) | 2023-10-19 |
Look Before You Match: Instance Understanding Matters in Video Object Segmentation | | 89.8 | 86.7 | | | 93.0 | | | | | ISVOS (BL30K, MS) | 2022-12-13 |
XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model | ✓ Link | 89.5 | 86.3 | | | 92.6 | | | | | XMem (BL30K, MS) | 2022-07-14 |
Look Before You Match: Instance Understanding Matters in Video Object Segmentation | | 88.6 | 85.8 | | | 91.4 | | | | | ISVOS (MS) | 2022-12-13 |
XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model | ✓ Link | 88.2 | 85.4 | | | 91.0 | | | | | XMem (MS) | 2022-07-14 |
Look Before You Match: Instance Understanding Matters in Video Object Segmentation | | 88.2 | 84.5 | | | 91.9 | | | | | ISVOS (BL30K) | 2022-12-13 |
Putting the Object Back into Video Object Segmentation | ✓ Link | 88.1 | 85.5 | | | 90.8 | | | 17.9 | | Cutie+ (base, MEGA) | 2023-10-19 |
Memory Matching is not Enough: Jointly Improving Memory Matching and Decoding for Video Object Segmentation | | 88.1 | 85.2 | | | 91.0 | | | | | JIMD | 2024-09-22 |
Putting the Object Back into Video Object Segmentation | ✓ Link | 87.9 | 84.6 | | | 91.1 | | | 36.4 | | Cutie (base) | 2023-10-19 |
XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model | ✓ Link | 87.7 | 84.0 | | | 91.4 | | | 22.6 | | XMem (BL30K) | 2022-07-14 |
Tracking Anything with Decoupled Video Segmentation | ✓ Link | 87.6 | 84.2 | | | 91.0 | | | 25.3 | | DEVA | 2023-09-07 |
Scalable Video Object Segmentation with Identification Mechanism | ✓ Link | 87.0 | 84.2 | | | 89.8 | | | 1.3 | 65.6 | SwinB-AOTv2-L (MS) | 2022-03-22 |
Scalable Video Object Segmentation with Identification Mechanism | ✓ Link | 86.7 | 83.8 | | | 89.5 | | | 1.3 | 65.6 | SwinB-AOST (L'=3, MS) | 2022-03-22 |
Scalable Video Object Segmentation with Identification Mechanism | ✓ Link | 86.3 | 83.1 | | | 89.4 | | | 12.0 | 65.6 | SwinB-AOTv2-L | 2022-03-22 |
Decoupling Features in Hierarchical Propagation for Video Object Segmentation | ✓ Link | 86.2 | 83.1 | | | 89.2 | | | 15.4 | 70.3 | SwinB-DeAOT-L | 2022-10-18 |
XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model | ✓ Link | 86.2 | 82.9 | | | 89.5 | | | 22.6 | | XMem | 2022-07-14 |
Region Aware Video Object Segmentation with Deep Motion Modeling | | 86.1 | 82.9 | | | 89.3 | | | 42 (on 3090) | | RAVOS | 2022-07-21 |
Scalable Video Object Segmentation with Identification Mechanism | ✓ Link | 85.6 | 82.6 | | | 88.5 | | | 17.5 | 15.4 | R50-AOST (L'=3) | 2022-03-22 |
Learning Quality-aware Dynamic Memory for Video Object Segmentation | ✓ Link | 85.6 | 82.5 | | | 88.6 | | | | | QDMN | 2022-07-16 |
Associating Objects with Transformers for Video Object Segmentation | ✓ Link | 85.4 | 82.4 | | | 88.4 | | | 12.1 | 65.4 | SwinB-AOT-L | 2021-06-04 |
Scalable Video Object Segmentation with Identification Mechanism | ✓ Link | 85.3 | 82.5 | | | 88.0 | | | 24.3 | 13.9 | R50-AOST (L'=2) | 2022-03-22 |
Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation | ✓ Link | 85.3 | 82.0 | 91.3 | 6.2 | 88.6 | 94.6 | 85.3 | 20.2 | | STCN | 2021-06-09 |
TarViS: A Unified Approach for Target-based Video Segmentation | ✓ Link | 85.3 | 81.7 | | | 88.5 | | | | | TarViS | 2023-01-06 |
Decoupling Features in Hierarchical Propagation for Video Object Segmentation | ✓ Link | 85.2 | 82.2 | | | 88.2 | | | 27.0 | 19.8 | R50-DeAOT-L | 2022-10-18 |
Associating Objects with Transformers for Video Object Segmentation | ✓ Link | 84.9 | 82.3 | | | 87.5 | | | 18.0 | 14.9 | R50-AOT-L | 2021-06-04 |
Hierarchical Memory Matching Network for Video Object Segmentation | ✓ Link | 84.7 | 81.9 | | | 87.5 | | | | | HMMN | 2021-09-23 |
Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion | ✓ Link | 84.5 | 81.7 | 90.9 | 7.0 | 87.4 | 93.1 | 8.2 | 11.2 | | MiVOS | 2021-03-14 |
XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model | ✓ Link | 84.5 | 81.4 | | | 87.6 | | | 22.6 | | XMem (DAVIS and YouTubeVOS only) | 2022-07-14 |
Decoupling Features in Hierarchical Propagation for Video Object Segmentation | ✓ Link | 84.1 | 81.0 | | | 87.1 | | | 28.5 | 13.2 | DeAOT-L | 2022-10-18 |
Associating Objects with Transformers for Video Object Segmentation | ✓ Link | 83.8 | 81.1 | | | 86.4 | | | 18.7 | 8.3 | AOT-L | 2021-06-04 |
Reliable Propagation-Correction Modulation for Video Object Segmentation | ✓ Link | 83.7 | 81.3 | | | 86.0 | | | | | RPCMVOS | 2021-12-06 |
Scalable Video Object Segmentation with Identification Mechanism | ✓ Link | 83.7 | 81.2 | | | 86.1 | | | 37.4 | 12.5 | R50-AOST (L'=1) | 2022-03-22 |
Efficient Regional Memory Network for Video Object Segmentation | ✓ Link | 83.5 | 81.0 | | | 86.0 | | | | | RMNet | 2021-03-24 |
Collaborative Video Object Segmentation by Multi-Scale Foreground-Background Integration | ✓ Link | 82.9 | 80.1 | | | 85.7 | | | | | CFBI+ | 2020-10-13 |
Kernelized Memory Network for Video Object Segmentation | ✓ Link | 82.8 | 80.0 | | | 85.6 | | | | | KMN | 2020-07-16 |
Associating Objects with Transformers for Video Object Segmentation | ✓ Link | 82.5 | 79.7 | | | 85.2 | | | 29.6 | 8.3 | AOT-B | 2021-06-04 |
MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation | | 82.3 | | | | 88.9 | | | 90.6 | 8.1 | MobileVOS (BL30K) | 2023-03-14 |
Decoupling Features in Hierarchical Propagation for Video Object Segmentation | ✓ Link | 82.2 | 79.2 | | | 85.1 | | | 40.9 | 13.2 | DeAOT-B | 2022-10-18 |
Collaborative Video Object Segmentation by Foreground-Background Integration | ✓ Link | 81.9 | 79.1 | | | 84.6 | | | | | CFBI | 2020-03-18 |
Video Object Segmentation using Space-Time Memory Networks | ✓ Link | 81.75 | 79.2 | 88.7 | 8.0 | 84.3 | 91.8 | 10.5 | | | STM | 2019-04-01 |
Associating Objects with Transformers for Video Object Segmentation | ✓ Link | 81.3 | 78.7 | | | 83.9 | | | 40.0 | 7.0 | AOT-S | 2021-06-04 |
Decoupling Features in Hierarchical Propagation for Video Object Segmentation | ✓ Link | 80.8 | 77.8 | | | 83.8 | | | 49.2 | 10.2 | DeAOT-S | 2022-10-18 |
Decoupling Features in Hierarchical Propagation for Video Object Segmentation | ✓ Link | 80.5 | 77.7 | | | 83.3 | | | 63.5 | 7.2 | DeAOT-T | 2022-10-18 |
MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation | | 80.2 | | | | 87.1 | | | 90.6 | 8.1 | MobileVOS | 2023-03-14 |
Associating Objects with Transformers for Video Object Segmentation | ✓ Link | 79.9 | 77.4 | | | 82.3 | | | 51.4 | 5.7 | AOT-T | 2021-06-04 |
Joint Inductive and Transductive Learning for Video Object Segmentation | ✓ Link | 78.6 | 76.0 | | | 81.2 | | | | | JOINT | 2021-08-08 |
PReMVOS: Proposal-generation, Refinement and Merging for Video Object Segmentation | ✓ Link | 77.85 | 73.9 | 83.1 | 16.2 | 81.8 | 88.9 | 19.5 | | | PReMVOS | 2018-07-24 |
Separable Structure Modeling for Semi-supervised Video Object Segmentation | ✓ Link | 77.6 | 75.3 | | 11.7 | 79.9 | | 15.3 | 22.3 | | SSM-VOS | 2021-02-18 |
LSMVOS: Long-Short-Term Similarity Matching for Video Object | ✓ Link | 77.4 | 73.9 | 83.6 | 12.9 | 80.8 | 91.3 | 15.7 | | | LSMVOS | 2020-09-02 |
SWEM: Towards Real-Time Video Object Segmentation with Sequential Weighted Expectation-Maximization | ✓ Link | 77.2 | 74.5 | | | 79.8 | | | | | SWEM | 2022-08-22 |
Make One-Shot Video Object Segmentation Efficient Again | ✓ Link | 77.2 | 74.4 | | 13.0 | 80.0 | | | | | e-OSVOS | 2020-12-03 |
XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model | ✓ Link | 76.7 | 74.1 | | | 79.3 | | | 22.6 | | XMem (DAVIS only) | 2022-07-14 |
MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation | ✓ Link | 76.15 | 73.4 | 83.5 | 17.8 | 78.9 | 87.2 | 19.1 | | | MHP-VOS | 2019-04-17 |
Proposal, Tracking and Segmentation (PTS): A Cascaded Network for Video Object Segmentation | ✓ Link | 74.65 | 71.6 | | | 77.7 | | | | | PTSNet | 2019-07-02 |
Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement | ✓ Link | 74.6 | 73.0 | 85.3 | 13.8 | 76.1 | 87.0 | 15.5 | | | AFB-URR | 2020-10-15 |
A Transductive Approach for Video Object Segmentation | ✓ Link | 72.3 | 69.9 | | | 74.7 | | | | | TVOS | 2020-04-15 |
A Generative Appearance Model for End-to-end Video Object Segmentation | ✓ Link | 71.05 | 68.5 | 78.4 | 14.0 | 73.6 | 83.4 | 15.8 | | | AGAME | 2018-11-28 |
CNN in MRF: Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF | | 70.6 | 67.2 | 74.5 | 24.6 | 74.0 | 81.6 | 26.2 | | | CINM | 2018-03-26 |
Siam R-CNN: Visual Tracking by Re-Detection | ✓ Link | 70.55 | 66.1 | 74.8 | 15.8 | 75.0 | 82.8 | 16.2 | | | Siam R-CNN | 2019-11-28 |
Self-Supervised Video Object Segmentation by Motion-Aware Mask Propagation | ✓ Link | 69.7 | 68.3 | | | 71.2 | | | | | MAMP | 2021-07-27 |
Dense Unsupervised Learning for Video Segmentation | ✓ Link | 69.4 | 67.1 | 80.9 | | 71.7 | 84.8 | | | | Araslanov et al. | 2021-11-11 |
Video Object Segmentation Without Temporal Information | | 68.0 | 64.7 | 74.2 | 15.1 | 71.3 | 80.7 | 18.5 | | | OSVOS-S | 2017-09-18 |
Fast Video Object Segmentation by Reference-Guided Mask Propagation | ✓ Link | 66.7 | 64.8 | 74.1 | 18.9 | 68.6 | 77.7 | 19.6 | | | RGMP | 2018-06-01 |
AGSS-VOS: Attention Guided Single-Shot Video Object Segmentation | ✓ Link | 66.6 | 63.4 | | | 69.8 | | | | | AGSS-VOS | 2019-10-01 |
RANet: Ranking Attention Network for Fast Video Object Segmentation | ✓ Link | 65.7 | 63.2 | 73.7 | 18.6 | 68.2 | 78.8 | 19.7 | | | RANet | 2019-08-19 |
MAST: A Memory-Augmented Self-supervised Tracker | ✓ Link | 65.5 | 63.3 | 73.2 | | 67.6 | 77.7 | | | | MAST | 2020-02-18 |
Online Adaptation of Convolutional Neural Networks for Video Object Segmentation | | 65.35 | 61.6 | 67.4 | 27.9 | 69.1 | 75.4 | 26.6 | | | OnAVOS | 2017-06-28 |
VideoMatch: Matching based Video Object Segmentation | | 62.4 | 56.5 | | | 68.2 | | | | | VideoMatch | 2018-09-04 |
Spatiotemporal CNN for Video Object Segmentation | ✓ Link | 61.65 | 58.7 | | | 64.6 | | | | | Spatiotemporal CNN | 2019-04-04 |
Video Object Segmentation with Language Referring Expressions | | 60.8 | 58.0 | | | | | | | | VOSwL (Language) | 2018-03-21 |
RVOS: End-to-End Recurrent Network for Video Object Segmentation | ✓ Link | 60.55 | 57.5 | 65.2 | 24.9 | 63.6 | 73.2 | 28.2 | | | RVOS | 2019-03-13 |
One-Shot Video Object Segmentation | ✓ Link | 60.25 | 56.6 | 63.8 | 26.1 | 63.9 | 73.8 | 27.0 | | | OSVOS | 2016-11-16 |
Joint-task Self-supervised Learning for Temporal Correspondence | ✓ Link | 59.5 | 57.7 | 68.3 | | 61.3 | 69.8 | | | | UVC | 2019-09-26 |
Fast and Accurate Online Video Object Segmentation via Tracking Parts | ✓ Link | 58.2 | 54.6 | 61.1 | 14.1 | 61.8 | 72.3 | 18.0 | | | FAVOS | 2018-06-06 |
Fast Online Object Tracking and Segmentation: A Unifying Approach | ✓ Link | 56.4 | 54.3 | 62.8 | 19.3 | 58.5 | 67.5 | 20.9 | | | SiamMask | 2018-12-12 |
Learning Video Object Segmentation from Unlabeled Videos | ✓ Link | 56.05 | 54.1 | 60.5 | 32.5 | 58.0 | 62.2 | 37.4 | | | MuG-W | 2020-03-10 |
Efficient Video Object Segmentation via Network Modulation | ✓ Link | 54.8 | 52.5 | 60.9 | 21.5 | 57.1 | 66.1 | 24.3 | | | OSMN | 2018-02-04 |
Self-supervised Learning for Video Correspondence Flow | ✓ Link | 50.3 | 48.4 | 53.2 | | 52.2 | 56.0 | | | | CorrFlow | 2019-05-02 |
Learning Correspondence from the Cycle-Consistency of Time | ✓ Link | 48.7 | 46.4 | 50.0 | | 50.0 | 48.0 | | | | CycleTime | 2019-03-18 |
Video Object Segmentation with Language Referring Expressions | | | | 66.1 | 22.4 | 63.5 | 70.4 | 24.5 | | | VOSwL | 2018-03-21 |