OpenCodePapers

Open-Vocabulary Object Detection on LVIS v1.0

Leaderboard
| Paper | Code | AP novel-LVIS base training | AP novel-Unrestricted open-vocabulary training | Model name | Release date |
|---|---|---|---|---|---|
| LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction | ✓ | 43.4 | - | LaMI-DETR | 2024-07-16 |
| Region-centric Image-Language Pretraining for Open-Vocabulary Detection | ✓ | 40.4 | 45.8 | DITO | 2023-09-29 |
| OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision | ✓ | 39.3 | - | OV-DQUO(ViT-L/14) | 2024-05-28 |
| CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection | ✓ | 37.0 | - | CoDet (EVA02-L) | 2023-10-25 |
| CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction | ✓ | 34.9 | - | CLIPSelf | 2023-10-02 |
| OVMR: Open-Vocabulary Recognition with Multi-Modal References | ✓ | 34.4 | - | OVMR | 2024-06-07 |
| Detect Everything with Few Examples | ✓ | 34.3 | - | DE-ViT | 2023-09-22 |
| Contrastive Feature Masking Open-Vocabulary Vision Transformer | - | 33.9 | - | CFM-ViT | 2023-09-02 |
| CLIM: Contrastive Language-Image Mosaic for Region Representation | ✓ | 32.3 | - | CLIM (RN50x64) | 2023-12-18 |
| Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers | ✓ | 32.1 | - | RO-ViT | 2023-05-11 |
| Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection | ✓ | 31.5 | - | Prova (Swin-Base) | 2024-12-23 |
| RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection | ✓ | 30.2 | - | RTGen | 2024-05-30 |
| OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision | ✓ | 29.7 | - | OV-DQUO(ViT-B/16) | 2024-05-28 |
| Open-vocabulary Object Detection via Vision and Language Knowledge Distillation | ✓ | 26.3 | 27.0 | ViLD-ensemble w/ ALIGN (Eb7-FPN) | 2021-04-28 |
| Simple Open-Vocabulary Object Detection with Vision Transformers | ✓ | 25.6 | 31.2 | OWL-ViT (CLIP-L/14) | 2022-05-12 |
| Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition | ✓ | 25.2 | - | POMP | 2023-04-10 |
| Aligning Bag of Regions for Open-Vocabulary Object Detection | ✓ | 22.6 | - | BARON | 2023-02-27 |
| Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization | ✓ | 22.4 | - | MEDet | 2022-06-22 |
| RegionCLIP: Region-based Language-Image Pretraining | ✓ | 22.0 | - | Region-CLIP (RN50x4-C4) | 2021-12-16 |
| Retrieval-Augmented Open-Vocabulary Object Detection | ✓ | 21.9 | - | RALF | 2024-04-08 |
| Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection | ✓ | 21.7 | - | OADP | 2023-03-10 |
| X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion | ✓ | 21.4 | 22.8 | X-Paste | 2022-12-07 |
| Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection | ✓ | 21.1 | - | Object-Centric-OVD | 2022-07-07 |
| Open-vocabulary Object Detection via Vision and Language Knowledge Distillation | ✓ | 18.7 | 19.8 | ViLD-ensemble (R152-FPN) | 2021-04-28 |
| Detecting Twenty-thousand Classes using Image-level Supervision | ✓ | 17.8 | - | Detic | 2022-01-07 |
| RegionCLIP: Region-based Language-Image Pretraining | ✓ | 17.1 | - | Region-CLIP (RN50-C4) | 2021-12-16 |
| Open-vocabulary Object Detection via Vision and Language Knowledge Distillation | ✓ | 16.6 | 16.7 | ViLD-ensemble (R50-FPN) | 2021-04-28 |
| Open-vocabulary Object Detection via Vision and Language Knowledge Distillation | ✓ | 16.1 | 16.3 | ViLD (R50-FPN) | 2021-04-28 |