OpenCodePapers

Text-to-Image Generation on GenEval

Text-to-Image Generation
Leaderboard
| Paper | Code | Overall | Single Obj. | Two Obj. | Color Attri. | Colors | Counting | Position | Model Name | Release Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Flow-GRPO: Training Flow Matching Models via Online RL | ✓ Link | 0.95 | | | | | | | SD3.5-Medium+Flow-GRPO | 2025-05-08 |
| UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation | ✓ Link | 0.84 | 0.98 | 0.93 | 0.71 | 0.90 | 0.81 | 0.74 | UniWorld-V1 (Rewrite) | 2025-06-03 |
| MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO | ✓ Link | 0.83 | 0.99 | 0.94 | 0.71 | 0.90 | 0.71 | 0.71 | MindOmni | 2025-05-19 |
| UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation | ✓ Link | 0.80 | 0.99 | 0.93 | 0.70 | 0.89 | 0.79 | 0.49 | UniWorld-V1 | 2025-06-03 |
| SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer | ✓ Link | 0.80 | | | | | | | SANA-1.5 4.8B (+ Inference Scaling) | 2025-01-30 |
| Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling | ✓ Link | 0.80 | | | | | | | Janus-Pro-7B | 2025-01-29 |
| Transfer between Modalities with MetaQueries | | 0.80 | | | | | | | MetaQuery-XL (Rewrite) | 2025-04-08 |
| Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step | ✓ Link | 0.77 | | | | | | | Show-o [xie2024show] PARM It. DPO PARM | 2025-01-23 |
| Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step | ✓ Link | 0.75 | | | | | | | Show-o [xie2024show] Ft. ORM It. DPO Ft. ORM | 2025-01-23 |
| Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling | ✓ Link | 0.73 | | | | | | | Janus-Pro-1B | 2025-01-29 |
| Lumina-Image 2.0: A Unified and Efficient Image Generative Framework | ✓ Link | 0.73 | | | | | | | Lumina-Image 2.0 | 2025-03-27 |
| SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer | ✓ Link | 0.72 | 0.99 | 0.85 | | | | | SANA-1.5 4.8B | 2025-01-30 |
| Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens | ✓ Link | 0.69 | 0.96 | 0.83 | 0.51 | 0.80 | 0.63 | 0.39 | Fluid (10.5B) | 2024-10-17 |
| Show-o: One Single Transformer to Unify Multimodal Understanding and Generation | ✓ Link | 0.68 | | | | | | | Und. and Gen. Show-o (Ours) | 2024-08-22 |
| Emu3: Next-Token Prediction is All You Need | ✓ Link | 0.66 | | | | | | | Emu3 | 2024-09-27 |
| SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training | | 0.66 | | | | | | | SnapGen | 2024-12-12 |
| JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation | ✓ Link | 0.63 | | | | | | | JanusFlow | 2024-11-12 |
| PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation | ✓ Link | 0.53 | | | | | | | PixArt-Σ | 2024-03-07 |
| DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers | | 0.51 | | | | | | | DiffMoE-E16-T2I-Flow (w SFT) | 2025-03-18 |
| PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models | ✓ Link | 0 | | | | | | | PIXART-δ | 2024-01-10 |
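
In the rows above that report all six per-task scores, the Overall value appears to match the unweighted mean of those scores to within rounding. The page itself does not state this relationship, so treat it as an assumption. The sketch below checks it for those four rows; the helper names `overall_from_tasks` and `check_rows` and the tolerance of 0.01 are illustrative choices, not part of the leaderboard.

```python
from statistics import mean

# (reported Overall, per-task scores) copied from the leaderboard rows that
# list all six tasks, in column order:
# Single Obj., Two Obj., Color Attri., Colors, Counting, Position.
ROWS = {
    "UniWorld-V1 (Rewrite)": (0.84, [0.98, 0.93, 0.71, 0.90, 0.81, 0.74]),
    "MindOmni": (0.83, [0.99, 0.94, 0.71, 0.90, 0.71, 0.71]),
    "UniWorld-V1": (0.80, [0.99, 0.93, 0.70, 0.89, 0.79, 0.49]),
    "Fluid (10.5B)": (0.69, [0.96, 0.83, 0.51, 0.80, 0.63, 0.39]),
}


def overall_from_tasks(task_scores):
    """Assumed aggregation: unweighted mean of the six per-task scores."""
    return mean(task_scores)


def check_rows(rows, tol=0.01):
    """Return, per model, whether the mean is within `tol` of the reported Overall."""
    return {
        name: abs(overall_from_tasks(tasks) - reported) <= tol
        for name, (reported, tasks) in rows.items()
    }


if __name__ == "__main__":
    for name, ok in check_rows(ROWS).items():
        status = "consistent" if ok else "inconsistent"
        print(f"{name}: {status}")
```

Rows that list only an Overall score cannot be checked this way, since their per-task values are not given on the page.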