LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction | ✓ Link | 43.4 | | LaMI-DETR | 2024-07-16 |
Region-centric Image-Language Pretraining for Open-Vocabulary Detection | ✓ Link | 40.4 | 45.8 | DITO | 2023-09-29 |
OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision | ✓ Link | 39.3 | | OV-DQUO(ViT-L/14) | 2024-05-28 |
CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection | ✓ Link | 37.0 | | CoDet (EVA02-L) | 2023-10-25 |
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction | ✓ Link | 34.9 | | CLIPSelf | 2023-10-02 |
OVMR: Open-Vocabulary Recognition with Multi-Modal References | ✓ Link | 34.4 | | OVMR | 2024-06-07 |
Detect Everything with Few Examples | ✓ Link | 34.3 | | DE-ViT | 2023-09-22 |
Contrastive Feature Masking Open-Vocabulary Vision Transformer | | 33.9 | | CFM-ViT | 2023-09-02 |
CLIM: Contrastive Language-Image Mosaic for Region Representation | ✓ Link | 32.3 | | CLIM (RN50x64) | 2023-12-18 |
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers | ✓ Link | 32.1 | | RO-ViT | 2023-05-11 |
Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection | ✓ Link | 31.5 | | Prova (Swin-Base) | 2024-12-23 |
RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection | ✓ Link | 30.2 | | RTGen | 2024-05-30 |
OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision | ✓ Link | 29.7 | | OV-DQUO(ViT-B/16) | 2024-05-28 |
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation | ✓ Link | 26.3 | 27.0 | ViLD-ensemble w/ ALIGN (Eb7-FPN) | 2021-04-28 |
Simple Open-Vocabulary Object Detection with Vision Transformers | ✓ Link | 25.6 | 31.2 | OWL-ViT (CLIP-L/14) | 2022-05-12 |
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition | ✓ Link | 25.2 | | POMP | 2023-04-10 |
Aligning Bag of Regions for Open-Vocabulary Object Detection | ✓ Link | 22.6 | | BARON | 2023-02-27 |
Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization | ✓ Link | 22.4 | | MEDet | 2022-06-22 |
RegionCLIP: Region-based Language-Image Pretraining | ✓ Link | 22.0 | | Region-CLIP (RN50x4-C4) | 2021-12-16 |
Retrieval-Augmented Open-Vocabulary Object Detection | ✓ Link | 21.9 | | RALF | 2024-04-08 |
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection | ✓ Link | 21.7 | | OADP | 2023-03-10 |
X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion | ✓ Link | 21.4 | 22.8 | X-Paste | 2022-12-07 |
Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection | ✓ Link | 21.1 | | Object-Centric-OVD | 2022-07-07 |
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation | ✓ Link | 18.7 | 19.8 | ViLD-ensemble (R152-FPN) | 2021-04-28 |
Detecting Twenty-thousand Classes using Image-level Supervision | ✓ Link | 17.8 | | Detic | 2022-01-07 |
RegionCLIP: Region-based Language-Image Pretraining | ✓ Link | 17.1 | | Region-CLIP (RN50-C4) | 2021-12-16 |
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation | ✓ Link | 16.6 | 16.7 | ViLD-ensemble (R50-FPN) | 2021-04-28 |
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation | ✓ Link | 16.1 | 16.3 | ViLD (R50-FPN) | 2021-04-28 |