Open-Vocabulary Object Detection on LVIS v1.0

Leaderboard

Rows are sorted by AP on novel categories under LVIS base training, highest first. A ✓ in the Code column means the paper has a linked code implementation. An empty AP cell means no result was reported for that setting.

Paper	Code	AP novel (LVIS base training)	AP novel (unrestricted open-vocabulary training)	Model	Release date
LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction	✓	43.4		LaMI-DETR	2024-07-16
Region-centric Image-Language Pretraining for Open-Vocabulary Detection	✓	40.4	45.8	DITO	2023-09-29
OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision	✓	39.3		OV-DQUO (ViT-L/14)	2024-05-28
CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection	✓	37.0		CoDet (EVA02-L)	2023-10-25
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction	✓	34.9		CLIPSelf	2023-10-02
OVMR: Open-Vocabulary Recognition with Multi-Modal References	✓	34.4		OVMR	2024-06-07
Detect Everything with Few Examples	✓	34.3		DE-ViT	2023-09-22
Contrastive Feature Masking Open-Vocabulary Vision Transformer		33.9		CFM-ViT	2023-09-02
CLIM: Contrastive Language-Image Mosaic for Region Representation	✓	32.3		CLIM (RN50x64)	2023-12-18
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers	✓	32.1		RO-ViT	2023-05-11
Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection	✓	31.5		Prova (Swin-Base)	2024-12-23
RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection	✓	30.2		RTGen	2024-05-30
OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision	✓	29.7		OV-DQUO (ViT-B/16)	2024-05-28
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation	✓	26.3	27.0	ViLD-ensemble w/ ALIGN (Eb7-FPN)	2021-04-28
Simple Open-Vocabulary Object Detection with Vision Transformers	✓	25.6	31.2	OWL-ViT (CLIP-L/14)	2022-05-12
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition	✓	25.2		POMP	2023-04-10
Aligning Bag of Regions for Open-Vocabulary Object Detection	✓	22.6		BARON	2023-02-27
Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization	✓	22.4		MEDet	2022-06-22
RegionCLIP: Region-based Language-Image Pretraining	✓	22.0		Region-CLIP (RN50x4-C4)	2021-12-16
Retrieval-Augmented Open-Vocabulary Object Detection	✓	21.9		RALF	2024-04-08
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection	✓	21.7		OADP	2023-03-10
X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion	✓	21.4	22.8	X-Paste	2022-12-07
Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection	✓	21.1		Object-Centric-OVD	2022-07-07
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation	✓	18.7	19.8	ViLD-ensemble (R152-FPN)	2021-04-28
Detecting Twenty-thousand Classes using Image-level Supervision	✓	17.8		Detic	2022-01-07
RegionCLIP: Region-based Language-Image Pretraining	✓	17.1		Region-CLIP (RN50-C4)	2021-12-16
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation	✓	16.6	16.7	ViLD-ensemble (R50-FPN)	2021-04-28
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation	✓	16.1	16.3	ViLD (R50-FPN)	2021-04-28
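The leaderboard rows can be handled as simple records. This makes it easy to reproduce the ordering by base-training AP, or to compare the two training settings for models that report both. The following is a minimal Python sketch, not an official tool of the leaderboard site. The `Entry` class and the `rank_by_base` and `unrestricted_gain` helpers are hypothetical names introduced for illustration. The five rows are copied from the table and are deliberately a partial list.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    """One leaderboard row (hypothetical record type, for illustration only)."""
    model: str
    ap_base: float                     # AP novel, LVIS base training
    ap_unrestricted: Optional[float]   # AP novel, unrestricted training; None if not reported


# A partial subset of rows copied from the leaderboard table.
ENTRIES: List[Entry] = [
    Entry("LaMI-DETR", 43.4, None),
    Entry("DITO", 40.4, 45.8),
    Entry("OWL-ViT (CLIP-L/14)", 25.6, 31.2),
    Entry("X-Paste", 21.4, 22.8),
    Entry("ViLD (R50-FPN)", 16.1, 16.3),
]


def rank_by_base(entries: List[Entry]) -> List[Entry]:
    """Sort rows by base-training AP, highest first (the leaderboard's ordering)."""
    return sorted(entries, key=lambda e: e.ap_base, reverse=True)


def unrestricted_gain(entries: List[Entry]) -> List[Tuple[str, float]]:
    """For rows reporting both settings, return (model, unrestricted minus base AP).

    The difference is rounded to one decimal, matching the table's precision
    and hiding float artifacts such as 5.3999999.
    """
    return [
        (e.model, round(e.ap_unrestricted - e.ap_base, 1))
        for e in entries
        if e.ap_unrestricted is not None
    ]


if __name__ == "__main__":
    for e in rank_by_base(ENTRIES):
        print(f"{e.model}: {e.ap_base}")
    for model, gain in unrestricted_gain(ENTRIES):
        print(f"{model}: +{gain}")
```

For example, DITO reports 40.4 under LVIS base training and 45.8 under unrestricted training, so `unrestricted_gain` gives 5.4 for it. LaMI-DETR is skipped because it reports only the base-training setting. These are differences between reported numbers, not controlled comparisons.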

OpenCodePapers