MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO | ✓ Link | 0.71 | 0.75 | 0.70 | 0.76 | 0.76 | 0.72 | 0.52 | MindOmni (w/ cot) | 2025-05-19 |
Emerging Properties in Unified Multimodal Pretraining | ✓ Link | 0.70 | 0.76 | 0.69 | 0.75 | 0.65 | 0.75 | 0.58 | Bagel (w/ cot) | 2025-05-20 |
| | | 0.55 | 0.56 | 0.55 | 0.62 | 0.49 | 0.63 | 0.41 | MetaQuery-XL | |
UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation | ✓ Link | 0.55 | 0.53 | 0.55 | 0.73 | 0.45 | 0.59 | 0.41 | UniWorld-V1 | 2025-06-03 |
Emerging Properties in Unified Multimodal Pretraining | ✓ Link | 0.52 | 0.44 | 0.55 | 0.68 | 0.44 | 0.60 | 0.39 | Bagel | 2025-05-20 |
Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation | | 0.49 | 0.49 | 0.58 | 0.55 | 0.43 | 0.48 | 0.33 | Playground-v2.5-1024px-aesthetic | 2024-02-27 |
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis | ✓ Link | 0.47 | 0.45 | 0.50 | 0.48 | 0.49 | 0.56 | 0.34 | PixArt-XL-2-1024-MS | 2023-09-30 |
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis | ✓ Link | 0.46 | 0.44 | 0.50 | 0.58 | 0.44 | 0.52 | 0.31 | stable-diffusion-3.5-large | 2024-03-05 |
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis | ✓ Link | 0.43 | 0.43 | 0.48 | 0.47 | 0.44 | 0.45 | 0.27 | stable-diffusion-xl-base-0.9 | 2023-07-04 |
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO | ✓ Link | 0.43 | 0.40 | 0.38 | 0.62 | 0.36 | 0.52 | 0.32 | MindOmni (w/o cot) | 2025-05-19 |
Emu3: Next-Token Prediction is All You Need | ✓ Link | 0.39 | 0.34 | 0.45 | 0.48 | 0.41 | 0.45 | 0.27 | Emu3-gen | 2024-09-27 |
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling | ✓ Link | 0.35 | 0.30 | 0.37 | 0.49 | 0.36 | 0.42 | 0.26 | Janus-pro | 2025-01-29 |
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation | ✓ Link | 0.35 | 0.28 | 0.40 | 0.48 | 0.30 | 0.46 | 0.30 | Show-o | 2024-08-22 |
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling | ✓ Link | 0.23 | 0.16 | 0.26 | 0.35 | 0.28 | 0.30 | 0.14 | Janus | 2025-01-29 |