Dynamic Scene Understanding from Vision-Language Representations | | 46.49 | | | | | Ours (PViC+) | 2025-01-20 |
RLIPv2: Fast Scaling of Relational Language-Image Pre-training | ✓ Link | 45.09 | | | | | RLIPv2 (Swin-L) | 2023-08-18 |
Exploring Predicate Visual Context in Detecting Human-Object Interactions | ✓ Link | 44.32 | | | | | PViC-SwinL | 2023-08-11 |
Focusing on what to decode and what to train: SOV Decoding with Specific Target Guided DeNoising and Vision Language Advisor | ✓ Link | 43.35 | | | | | SOV-STG (Swin-L) | 2023-07-05 |
Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model | ✓ Link | 41.50 | | | | | DiffHOI | 2023-05-20 |
ViPLO: Vision Transformer based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection | ✓ Link | 37.22 | | | | | ViPLO | 2023-04-17 |
FGAHOI: Fine-Grained Anchors for Human-Object Interaction Detection | ✓ Link | 37.18 | | | | | FGAHOI | 2023-01-08 |
ERNet: Efficient and Reliable Human-Object Interaction Detection | ✓ Link | 36.89 | | | | | ERNet | 2023-01-26 |
Category Query Learning for Human-Object Interaction Classification | ✓ Link | 36.03 | | | | | CQL+GEN-VLKT-L | 2023-03-24 |
QAHOI: Query-Based Anchors for Human-Object Interaction Detection | ✓ Link | 35.78 | | | | | QAHOI (Swin-L) | 2021-12-16 |
Category Query Learning for Human-Object Interaction Classification | ✓ Link | 35.36 | | | | | CQL+GEN-VLKT-B | 2023-03-24 |
Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI Detection | ✓ Link | 35.15 | | | | | Body Part Interactiveness | 2022-07-28 |
GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection | ✓ Link | 34.95 | | | | | GEN-VLKT-R101 | 2022-03-26 |
Unseen No More: Unlocking the Potential of CLIP for Generative Zero-shot HOI Detection | ✓ Link | 34.84 | | 34.84 | 34.94 | 34.52 | HOIGen | 2024-08-12 |
Exploring Predicate Visual Context in Detecting Human-Object Interactions | ✓ Link | 34.69 | | | | | PViC-R50 | 2023-08-11 |
HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models | ✓ Link | 34.69 | | | | | HOICLIP | 2023-03-28 |
Relational Context Learning for Human-Object Interaction Detection | ✓ Link | 32.87 | | | | | MUREN | 2023-04-11 |
RLIP: Relational Language-Image Pre-training for Human-Object Interaction Detection | ✓ Link | 32.84 | | | | | RLIP-ParSe (ResNet-50) | 2022-09-05 |
RLIP: Relational Language-Image Pre-training for Human-Object Interaction Detection | ✓ Link | 32.76 | | | | | ParSe (ResNet-101) | 2022-09-05 |
Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer | ✓ Link | 32.62 | 124 | | | | UPT-R101-DC5 | 2021-12-03 |
The Overlooked Classifier in Human-Object Interaction Recognition | | 32.35 | | | | | DEFR | 2021-12-13 |
Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer | ✓ Link | 32.31 | 61 | | | | UPT-R101 | 2021-12-03 |
Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection | ✓ Link | 32.22 | 74 | | | | STIP (ResNet-50) | 2022-06-13 |
Mining the Benefits of Two-stage and One-stage HOI Detection | ✓ Link | 32.07 | | | | | CDN (ResNet101) | 2021-08-11 |
Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer | ✓ Link | 31.66 | 42 | | | | UPT-R50 | 2021-12-03 |
Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics | ✓ Link | 31.43 | | | | | OCN (ResNet101) | 2022-02-01 |
QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information | ✓ Link | 29.90 | 63 | | | | QPIC (ResNet101) | 2021-03-09 |
Consistency Learning via Decoding Path Augmentation for Transformers in Human Object Interaction Detection | ✓ Link | 29.63 | | | | | QPIC + CPC | 2022-04-11 |
Spatially Conditioned Graphs for Detecting Human-Object Interactions | ✓ Link | 29.26 | | | | | SCG (DETR-R101) | 2020-12-11 |
QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information | ✓ Link | 29.07 | 46 | | | | QPIC (ResNet50) | 2021-03-09 |
Reformulating HOI Detection as Adaptive Set Prediction | ✓ Link | 28.87 | 71 | | | | AS-Net (ResNet50) | 2021-03-10 |
End-to-End Human Object Interaction Detection with HOI Transformer | ✓ Link | 26.61 | | | | | HOITrans (ResNet101) | 2021-03-08 |
HOI Analysis: Integrating and Decomposing Human-Object Interaction | ✓ Link | 26.29 | | | | | IDN (finetuned detector) | 2020-10-30 |
Consistency Learning via Decoding Path Augmentation for Transformers in Human Object Interaction Detection | ✓ Link | 26.16 | | | | | HOTR + CPC | 2022-04-11 |
ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection | ✓ Link | 25.94 | | | | | ConsNet-F (ResNet-50) | 2020-08-14 |
DRG: Dual Relation Graph for Human-Object Interaction Detection | ✓ Link | 24.53 | | | | | DRG | 2020-08-26 |
End-to-End Human Object Interaction Detection with HOI Transformer | ✓ Link | 23.46 | | | | | HOITrans (ResNet50) | 2021-03-08 |
HOTR: End-to-End Human-Object Interaction Detection with Transformers | ✓ Link | 23.46 | | | | | HOTR | 2021-04-28 |
HOI Analysis: Integrating and Decomposing Human-Object Interaction | ✓ Link | 23.36 | | | | | IDN (COCO detector) | 2020-10-30 |
PaStaNet: Toward Human Activity Knowledge Engine | ✓ Link | 22.65 | | | | | PaStaNet | 2020-04-02 |
Polysemy Deciphering Network for Robust Human-Object Interaction Detection | ✓ Link | 22.37 | | | | | PD-Net | 2020-08-07 |
ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection | ✓ Link | 22.15 | | | | | ConsNet (ResNet-50) | 2020-08-14 |
ACP++: Action Co-occurrence Priors for Human-Object Interaction Detection | ✓ Link | 22.11 | | | | | ACP++ | 2021-09-09 |
PPDM: Parallel Point Detection and Matching for Real-time Human-Object Interaction Detection | ✓ Link | 21.92 | 71 | | | | PPDM | 2019-12-30 |
DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection | ✓ Link | 21.81 | 68 | | | | DIRV | 2020-10-02 |
Detailed 2D-3D Joint Representation for Human-Object Interaction | ✓ Link | 21.34 | | | | | DJ-RN | 2020-04-17 |
Pose-based Modular Network for Human-Object Interaction Detection | ✓ Link | 21.21 | | | | | PMN | 2020-08-05 |
Transferable Interactiveness Knowledge for Human-Object Interaction Detection | ✓ Link | 20.93 | | | | | TIN (TIPAMI) | 2021-01-25 |
Detecting Human-Object Interactions with Action Co-occurrence Priors | ✓ Link | 20.59 | | | | | ACP | |
VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions | ✓ Link | 19.8 | | | | | VSGNet | 2020-03-11 |
Transferable Interactiveness Knowledge for Human-Object Interaction Detection | ✓ Link | 17.54 | 512 | | | | TIN (Interactiveness) | 2018-11-20 |
Transferable Interactiveness Knowledge for Human-Object Interaction Detection | ✓ Link | 17.22 | | | | | TIN (CVPR) | 2018-11-20 |
iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection | ✓ Link | 14.84 | | | | | iCAN | 2018-08-30 |
Learning Human-Object Interactions by Graph Parsing Neural Networks | ✓ Link | 13.11 | | | | | GPNN | 2018-08-23 |
Detecting and Recognizing Human-Object Interactions | ✓ Link | 9.94 | 145 | | | | InteractNet | 2017-04-24 |