Text-to-Image Generation on GenEval

[Figure: Results over time]

Leaderboard

Paper	Code	Overall	Single Obj.	Two Obj.	Color Attr.	Colors	Counting	Position	Model Name	Release Date
Flow-GRPO: Training Flow Matching Models via Online RL	✓ Link	0.95							SD3.5-Medium+Flow-GRPO	2025-05-08
UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation	✓ Link	0.84	0.98	0.93	0.71	0.90	0.81	0.74	UniWorld-V1 (Rewrite)	2025-06-03
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO	✓ Link	0.83	0.99	0.94	0.71	0.90	0.71	0.71	MindOmni	2025-05-19
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling	✓ Link	0.80							Janus-Pro-7B	2025-01-29
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer	✓ Link	0.80							SANA-1.5 4.8B (+ Inference Scaling)	2025-01-30
Transfer between Modalities with MetaQueries		0.80							MetaQuery-XL (Rewrite)	2025-04-08
UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation	✓ Link	0.80	0.99	0.93	0.70	0.89	0.79	0.49	UniWorld-V1	2025-06-03
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step	✓ Link	0.77							Show-o + PARM + It. DPO (PARM)	2025-01-23
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step	✓ Link	0.75							Show-o + Ft. ORM + It. DPO (Ft. ORM)	2025-01-23
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling	✓ Link	0.73							Janus-Pro-1B	2025-01-29
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework	✓ Link	0.73							Lumina-Image 2.0	2025-03-27
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer	✓ Link	0.72	0.99	0.85					SANA-1.5 4.8B	2025-01-30
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens	✓ Link	0.69	0.96	0.83	0.51	0.80	0.63	0.39	Fluid (10.5B)	2024-10-17
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation	✓ Link	0.68							Show-o	2024-08-22
Emu3: Next-Token Prediction is All You Need	✓ Link	0.66							Emu3	2024-09-27
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training		0.66							SnapGen	2024-12-12
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation	✓ Link	0.63							JanusFlow	2024-11-12
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation	✓ Link	0.53							PixArt-Σ	2024-03-07
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers		0.51							DiffMoE-E16-T2I-Flow (w/ SFT)	2025-03-18
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models	✓ Link	0							PIXART-δ	2024-01-10
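For the rows that report all six category scores, the Overall column appears to be the unweighted mean of the six categories (this matches GenEval's usual definition, but it is an assumption here, not something the leaderboard states). The sketch below recomputes Overall for those four rows and checks it against the reported value. The names `ROWS`, `overall`, and `check` are hypothetical, and the 0.01 tolerance is an assumed allowance for the two-decimal rounding of the published numbers.

```python
from statistics import mean

# Per-category scores copied from the leaderboard rows that report all six
# categories, in column order: Single Obj., Two Obj., Color Attr., Colors,
# Counting, Position. The second tuple element is the reported Overall.
ROWS = {
    "UniWorld-V1 (Rewrite)": ([0.98, 0.93, 0.71, 0.90, 0.81, 0.74], 0.84),
    "MindOmni": ([0.99, 0.94, 0.71, 0.90, 0.71, 0.71], 0.83),
    "UniWorld-V1": ([0.99, 0.93, 0.70, 0.89, 0.79, 0.49], 0.80),
    "Fluid (10.5B)": ([0.96, 0.83, 0.51, 0.80, 0.63, 0.39], 0.69),
}


def overall(scores):
    """Assumed Overall score: unweighted mean of the six category scores."""
    if len(scores) != 6:
        raise ValueError("expected six category scores")
    return mean(scores)


def check(rows, tol=0.01):
    """Return {model: (recomputed, reported, matches)}.

    tol absorbs rounding: each input and the reported Overall were
    published to two decimals, so they can differ by up to about 0.01.
    """
    out = {}
    for name, (scores, reported) in rows.items():
        recomputed = overall(scores)
        out[name] = (recomputed, reported, abs(recomputed - reported) <= tol)
    return out


if __name__ == "__main__":
    for name, (rec, rep, ok) in check(ROWS).items():
        print(f"{name:24s} recomputed={rec:.3f} reported={rep:.2f} match={ok}")
```

Rows with a partial breakdown, such as SANA-1.5 4.8B (only Single Obj. and Two Obj.), cannot be checked this way, and rows listing only an Overall score give no breakdown at all.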

OpenCodePapers
