Look Before You Match: Instance Understanding Matters in Video Object Segmentation | | 93.4 | 92.5 | | | 94.2 | | | | ISVOS (BL30K, MS) | 2022-12-13 |
XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model | ✓ Link | 93.3 | 92.2 | | | 94.4 | | | | XMem (BL30K, MS) | 2022-07-14 |
Scalable Video Object Segmentation with Identification Mechanism | ✓ Link | 93.0 | 91.6 | | | 94.4 | | | 1.3 | SwinB-AOTv2-L (MS) | 2022-03-22 |
Scalable Video Object Segmentation with Identification Mechanism | ✓ Link | 93.0 | 91.5 | | | 94.5 | | | 1.3 | SwinB-AOST (L'=3, MS) | 2022-03-22 |
Decoupling Features in Hierarchical Propagation for Video Object Segmentation | ✓ Link | 92.9 | 91.1 | | | 94.7 | | | 15.4 | SwinB-DeAOT-L | 2022-10-18 |
XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model | ✓ Link | 92.7 | 92.0 | | | 93.5 | | | | XMem (MS) | 2022-07-14 |
Scalable Video Object Segmentation with Identification Mechanism | ✓ Link | 92.4 | 90.6 | | | 94.1 | | | 12.0 | SwinB-AOTv2-L | 2022-03-22 |
Scalable Video Object Segmentation with Identification Mechanism | ✓ Link | 92.4 | 90.5 | | | 94.2 | | | 12.0 | SwinB-AOST (L'=3) | 2022-03-22 |
Decoupling Features in Hierarchical Propagation for Video Object Segmentation | ✓ Link | 92.3 | 90.5 | | | 94.0 | | | 27.0 | R50-DeAOT-L | 2022-10-18 |
Scalable Video Object Segmentation with Identification Mechanism | ✓ Link | 92.1 | 90.6 | | | 93.6 | | | 17.5 | R50-AOST (L'=3) | 2022-03-22 |
Associating Objects with Transformers for Video Object Segmentation | ✓ Link | 92.0 | 90.7 | | | 93.3 | | | 12.1 | SwinB-AOT-L | 2021-06-04 |
XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model | ✓ Link | 92.0 | 90.7 | | | 93.2 | | | 29.6 | XMem (BL30K) | 2022-07-14 |
Learning Quality-aware Dynamic Memory for Video Object Segmentation | ✓ Link | 92.0 | 90.7 | | | 93.2 | | | | QDMN | 2022-07-16 |
Scalable Video Object Segmentation with Identification Mechanism | ✓ Link | 92.0 | 90.5 | | | 93.4 | | | 24.3 | R50-AOST (L'=2) | 2022-03-22 |
Decoupling Features in Hierarchical Propagation for Video Object Segmentation | ✓ Link | 92.0 | 90.3 | | | 93.7 | | | 28.5 | DeAOT-L | 2022-10-18 |
TrickVOS: A Bag of Tricks for Video Object Segmentation | | 91.8 | 90.5 | | | 93.1 | | | | STCN + TrickVOS (PT) | 2023-06-27 |
Region Aware Video Object Segmentation with Deep Motion Modeling | | 91.7 | 90.8 | | | 92.6 | | | 58 | RAVOS | 2022-07-21 |
Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation | ✓ Link | 91.7 | 90.4 | 98.1 | 4.1 | 93.0 | 97.1 | 4.3 | 26.9 | STCN | 2021-06-09 |
XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model | ✓ Link | 91.5 | 90.4 | | | 92.7 | | | 29.6 | XMem | 2022-07-14 |
MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation | | 91.4 | 90.3 | | | 92.6 | | | 100.1 | MobileVOS (BL30K) | 2023-03-14 |
Associating Objects with Transformers for Video Object Segmentation | ✓ Link | 91.1 | 90.1 | | | 92.1 | | | 18.0 | R50-AOT-L | 2021-06-04 |
Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion | ✓ Link | 91.0 | 89.7 | 97.5 | 6.6 | 92.4 | 96.4 | 5.1 | 16.9 | MiVOS | 2021-03-14 |
Decoupling Features in Hierarchical Propagation for Video Object Segmentation | ✓ Link | 91.0 | 89.4 | | | 92.5 | | | 40.9 | DeAOT-B | 2022-10-18 |
Hierarchical Memory Matching Network for Video Object Segmentation | ✓ Link | 90.8 | 89.6 | | | 92.0 | | | | HMMN | 2021-09-23 |
XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model | ✓ Link | 90.8 | 89.6 | | | 91.9 | | | 29.6 | XMem (DAVIS+YouTubeVOS only) | 2022-07-14 |
MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation | | 90.6 | 89.7 | | | 91.6 | | | 100.1 | MobileVOS | 2023-03-14 |
Reliable Propagation-Correction Modulation for Video Object Segmentation | ✓ Link | 90.6 | 87.1 | | | 94.0 | | | | RPCMVOS | 2021-12-06 |
Kernelized Memory Network for Video Object Segmentation | ✓ Link | 90.5 | 89.5 | | | 91.5 | | | | KMN | 2020-07-16 |
Associating Objects with Transformers for Video Object Segmentation | ✓ Link | 90.4 | 89.6 | | | 91.1 | | | 18.7 | AOT-L | 2021-06-04 |
Scalable Video Object Segmentation with Identification Mechanism | ✓ Link | 90.3 | 89.6 | | | 90.9 | | | 37.4 | R50-AOST (L'=1) | 2022-03-22 |
Associating Objects with Transformers for Video Object Segmentation | ✓ Link | 89.9 | 88.7 | | | 91.1 | | | 29.6 | AOT-B | 2021-06-04 |
Collaborative Video Object Segmentation by Multi-Scale Foreground-Background Integration | ✓ Link | 89.9 | 88.7 | | | 91.1 | | | | CFBI+ | 2020-10-13 |
Video Object Segmentation using Space-Time Memory Networks | ✓ Link | 89.4 | 88.7 | 97.4 | 5.0 | 90.1 | 95.2 | 4.2 | | STM | 2019-04-01 |
Associating Objects with Transformers for Video Object Segmentation | ✓ Link | 89.4 | 88.6 | | | 90.2 | | | 40.0 | AOT-S | 2021-06-04 |
Collaborative Video Object Segmentation by Foreground-Background Integration | ✓ Link | 89.4 | 88.3 | | | 90.5 | | | | CFBI | 2020-03-18 |
TrickVOS: A Bag of Tricks for Video Object Segmentation | | 89.3 | 88.7 | | | 89.9 | | | 86.4 | Lightweight TrickVOS (PT) | 2023-06-27 |
Decoupling Features in Hierarchical Propagation for Video Object Segmentation | ✓ Link | 89.3 | 87.6 | | | 90.9 | | | 49.2 | DeAOT-S | 2022-10-18 |
Decoupling Features in Hierarchical Propagation for Video Object Segmentation | ✓ Link | 88.9 | 87.8 | | | 89.9 | | | 63.5 | DeAOT-T | 2022-10-18 |
Efficient Regional Memory Network for Video Object Segmentation | ✓ Link | 88.8 | 88.9 | | | 88.7 | | | | RMNet | 2021-03-24 |
MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation | ✓ Link | 88.55 | 87.6 | 97.3 | 6.9 | 89.5 | 95.5 | 9.0 | | MHP-VOS | 2019-04-17 |
SWEM: Towards Real-Time Video Object Segmentation with Sequential Weighted Expectation-Maximization | ✓ Link | 88.1 | 87.3 | | | 89.0 | | | 36 | SWEM (val) | 2022-08-22 |
XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model | ✓ Link | 87.8 | 86.7 | | | 88.9 | | | 29.6 | XMem (DAVIS only) | 2022-07-14 |
RANet: Ranking Attention Network for Fast Video Object Segmentation | ✓ Link | 87.1 | 86.6 | 97.0 | 7.4 | 87.6 | 96.1 | 8.2 | | RANet+ (online learning) | 2019-08-19 |
Make One-Shot Video Object Segmentation Efficient Again | ✓ Link | 86.8 | 86.6 | | 4.5 | 87.0 | | | | e-OSVOS | 2020-12-03 |
Associating Objects with Transformers for Video Object Segmentation | ✓ Link | 86.8 | 86.1 | | | 87.4 | | | 51.4 | AOT-T | 2021-06-04 |
PReMVOS: Proposal-generation, Refinement and Merging for Video Object Segmentation | ✓ Link | 86.75 | 84.9 | 96.1 | 8.8 | 88.6 | 94.7 | 9.8 | | PReMVOS | 2018-07-24 |
Video Object Segmentation Without Temporal Information | | 86.55 | 85.6 | 96.8 | 5.5 | 87.5 | 95.9 | 8.2 | | OSVOS-S | 2017-09-18 |
LSMVOS: Long-Short-Term Similarity Matching for Video Object | ✓ Link | 86.5 | 85.7 | 97.1 | 5.1 | 87.3 | 96.1 | 4.9 | | LSMVOS | 2020-09-02 |
Separable Structure Modeling for Semi-supervised Video Object Segmentation | ✓ Link | 85.9 | 86.2 | 97.1 | 5.3 | 85.6 | 92.3 | 5.6 | 36.5 | SSM-VOS | 2021-02-18 |
Online Adaptation of Convolutional Neural Networks for Video Object Segmentation | | 85.5 | 86.1 | 96.1 | 5.2 | 84.9 | 89.7 | 5.8 | | OnAVOS | 2017-06-28 |
RANet: Ranking Attention Network for Fast Video Object Segmentation | ✓ Link | 85.45 | 85.5 | 97.2 | 6.2 | 85.4 | 94.9 | 5.1 | | RANet | 2019-08-19 |
CNN in MRF: Video Object Segmentation via Inference in A CNN-Based Higher-Order Spatio-Temporal MRF | | 84.2 | 83.4 | 94.9 | 12.3 | 85.0 | 92.1 | 14.7 | | CINM | 2018-03-26 |
Spatiotemporal CNN for Video Object Segmentation | ✓ Link | 83.8 | 83.8 | | | 83.8 | | | | Spatiotemporal CNN | 2019-04-04 |
Video Object Segmentation with Language Referring Expressions | | 83.65 | 83.1 | 95.7 | 6.9 | 84.2 | 93.9 | 8.6 | | VOSwL | 2018-03-21 |
Lucid Data Dreaming for Video Object Segmentation | ✓ Link | 82.95 | 83.9 | 95.0 | 9.1 | 82.0 | 88.1 | 9.7 | | Lucid | 2017-03-28 |
A Generative Appearance Model for End-to-end Video Object Segmentation | ✓ Link | 81.85 | 81.5 | 93.6 | 9.4 | 82.2 | 90.3 | 9.8 | | AGAME | 2018-11-28 |
Fast Video Object Segmentation by Reference-Guided Mask Propagation | ✓ Link | 81.75 | 81.5 | 91.7 | 10.9 | 82.0 | 90.8 | 10.1 | | RGMP | 2018-06-01 |
Learning Fast and Robust Target Models for Video Object Segmentation | ✓ Link | 81.7 | | | | | | | 21.9 | FRTM (val) | 2020-02-27 |
FEELVOS: Fast End-to-End Embedding Learning for Video Object Segmentation | ✓ Link | 81.65 | 81.1 | 90.5 | 13.7 | 82.2 | 86.6 | 14.1 | | FEELVOS | 2019-02-25 |
CRVOS: Clue Refining Network for Video Object Segmentation | ✓ Link | 81.6 | 82.2 | 93.9 | 10.0 | 81.0 | 90.3 | 8.8 | | CRVOS | 2020-02-10 |
Fast and Accurate Online Video Object Segmentation via Tracking Parts | ✓ Link | 80.95 | 82.4 | 96.5 | 4.5 | 79.5 | 89.4 | 5.5 | | FAVOS | 2018-06-06 |
One-Shot Video Object Segmentation | ✓ Link | 80.2 | 79.8 | 93.6 | 14.9 | 80.6 | 92.6 | 15.0 | | OSVOS | 2016-11-16 |
Siam R-CNN: Visual Tracking by Re-Detection | ✓ Link | 78.6 | 76.8 | 86.4 | 2.2 | 80.4 | 87.6 | 4.0 | | Siam R-CNN | 2019-11-28 |
An Efficient 3D CNN for Action/Object Segmentation in Video | | 77.75 | 78.3 | 91.1 | 2.3 | 77.2 | 84.7 | 4.9 | | Hou et al. | 2019-07-21 |
Learning Video Object Segmentation from Static Images | ✓ Link | 77.55 | 79.7 | 93.1 | 8.9 | 75.4 | 87.1 | 9.0 | | MSK | 2016-12-08 |
Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning | | 77.4 | 75.5 | 89.6 | 8.5 | 79.3 | 93.4 | 7.8 | | PML | 2018-04-09 |
SegFlow: Joint Learning for Video Object Segmentation and Optical Flow | ✓ Link | 76.05 | 76.1 | 90.6 | 12.1 | 76.0 | 85.5 | 10.4 | | SFL | 2017-09-20 |
Efficient Video Object Segmentation via Network Modulation | ✓ Link | 73.45 | 74.0 | 87.6 | 9.0 | 72.9 | 84.0 | 10.6 | | OSMN | 2018-02-04 |
Online Video Object Segmentation via Convolutional Trident Network | | 71.4 | 73.5 | 87.4 | 15.6 | 69.3 | 79.6 | 12.9 | | CTN | 2017-07-01 |
Fast Online Object Tracking and Segmentation: A Unifying Approach | ✓ Link | 69.75 | 71.7 | 86.8 | 3.0 | 67.8 | 79.8 | 2.1 | | SiamMask | 2018-12-12 |
Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching | | 68.8 | 68.6 | | | 68.9 | | | | RGMP (val) | 2020-07-11 |
Video Propagation Networks | | 67.9 | 70.2 | 82.3 | 12.4 | 65.6 | 69.0 | 14.4 | | VPN | 2016-12-16 |
Pixel-Level Matching for Video Object Segmentation using Convolutional Neural Networks | | 66.35 | 70.2 | 86.3 | 11.2 | 62.5 | 73.2 | 14.7 | | PLM | 2017-08-17 |
Video Segmentation via Object Flow | | 65.7 | 68.0 | 75.6 | 26.4 | 63.4 | 70.4 | 27.2 | | OFL | 2016-06-01 |
Learning Video Object Segmentation from Unlabeled Videos | ✓ Link | 64.65 | 65.7 | 77.7 | 26.4 | 63.6 | 67.7 | 27.2 | | MuG-W | 2020-03-10 |
Bilateral Space Video Segmentation | | 59.4 | 60.0 | 66.9 | 28.9 | 58.8 | 67.9 | 21.3 | | BVS | 2016-06-01 |
Fully Connected Object Proposals for Video Segmentation | | 53.8 | 58.4 | 71.5 | -2.0 | 49.2 | 49.5 | -1.1 | | FCP | 2015-12-01 |
A 3D Convolutional Approach to Spectral Object Segmentation in Space and Time | ✓ Link | | 86.3 | | | | | | | SFSeg over OnAVOS | 2019-07-05 |