Supervised Video Summarization via Multiple Feature Sets with Parallel Attention | ✓ Link | 67.5 | | | | MAVS [DBLP:conf/mm/FengLKZ18] | 2021-04-23 |
CLIP-It! Language-Guided Video Summarization | ✓ Link | 66.3 | 69.0 | 0.108 | 0.147 | CLIP-It | 2021-07-01 |
Supervised Video Summarization via Multiple Feature Sets with Parallel Attention | ✓ Link | 63.9 | | | | re-SEQ2SEQ [DBLP:conf/eccv/ZhangGS18] | 2021-04-23 |
Supervised Video Summarization via Multiple Feature Sets with Parallel Attention | ✓ Link | 63.7 | | | | MC-VSA [DBLP:journals/corr/abs-2006-01410] | 2021-04-23 |
Joint Video Summarization and Moment Localization by Cross-Task Sample Transfer | | 63.4 | 64.2 | 0.134 | 0.163 | iPTNet | 2022-01-01 |
Align and Attend: Multimodal Summarization with Dual Contrastive Losses | ✓ Link | 63.4 | | 0.137 | 0.165 | A2Summ | 2023-03-13 |
Relational Reasoning Over Spatial-Temporal Graphs for Video Summarization | | 63.0 | 63.6 | 0.162 | 0.212 | RR-STG | 2022-04-06 |
Combining Global and Local Attention with Positional Encoding for Video Summarization | ✓ Link | 62.7 | | | | PGL-SUM (maximum learning capacity) | 2021-12-01 |
DSNet: A Flexible Detect-to-Summarize Network for Video Summarization | ✓ Link | 62.1 | 63.9 | | | DSNet | 2020-12-01 |
Supervised Video Summarization via Multiple Feature Sets with Parallel Attention | ✓ Link | 61.5 | | 0.190 | 0.210 | MSVA | 2021-04-23 |
Query Twice: Dual Mixture Attention Meta Learning for Video Summarization | | 61.4 | | 0.203 | 0.267 | DMASum | 2020-08-19 |
Combining Global and Local Attention with Positional Encoding for Video Summarization | ✓ Link | 61.0 | | 0.157 | 0.206 | PGL-SUM | 2021-12-01 |
Supervised Video Summarization via Multiple Feature Sets with Parallel Attention | ✓ Link | 61.0 | | | | M-AVS [DBLP:journals/corr/abs-1708-09545] | 2021-04-23 |
Video Joint Modelling Based on Hierarchical Transformer for Co-summarization | ✓ Link | 60.9 | 61.9 | 0.097 | 0.105 | VJMHT | 2021-12-27 |
Progressive Video Summarization via Multimodal Self-supervised Learning | ✓ Link | 60.4 | | 0.181 | 0.238 | SSPVS(+Text) | 2022-01-07 |
Progressive Video Summarization via Multimodal Self-supervised Learning | ✓ Link | 60.3 | 61.8 | 0.177 | 0.233 | SSPVS | 2022-01-07 |
Hierarchical Multimodal Transformer to Summarize Videos | | 60.1 | 60.3 | 0.096 | 0.107 | HMT | 2021-09-22 |
Supervised Video Summarization via Multiple Feature Sets with Parallel Attention | ✓ Link | 59.8 | | | | VASNet [DBLP:conf/accv/FajtlSAMR18] | 2021-04-23 |
Discriminative Feature Learning for Unsupervised Video Summarization | ✓ Link | 58.5 | 57.1 | | | CSNet | 2018-11-24 |
Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward | ✓ Link | 58.1 | 59.8 | | | DR-DSN | 2017-12-29 |
CSTA: CNN-based Spatiotemporal Attention for Video Summarization | ✓ Link | | | 0.194 | 0.255 | CSTA | 2024-05-20 |