Paper | Code | TextVisionBlend OCR F1 (%) | TextVisionBlend OCR Accuracy (%) | TextVisionBlend OCR CER | TextVisionBlend FID | TextVisionBlend CLIP Score | StyledTextSynth OCR F1 (%) | StyledTextSynth OCR Accuracy (%) | StyledTextSynth OCR CER | StyledTextSynth FID | StyledTextSynth CLIP Score | TextScenesHQ OCR F1 (%) | TextScenesHQ OCR Accuracy (%) | TextScenesHQ OCR CER | TextScenesHQ FID | TextScenesHQ CLIP Score | Model Name | Release Date |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
- | - | 44.22 | 41.54 | 0.57 | - | 0.1697 | 21.40 | 15.82 | 0.73 | 80.33 | 0.2938 | 37.94 | 35.07 | 0.57 | - | 0.3197 | Grok3 | - |
- | - | 16.25 | 14.55 | 0.88 | 118.85 | 0.1846 | 33.86 | 27.21 | 0.73 | 71.09 | 0.2849 | 24.45 | 19.03 | 0.73 | 64.44 | 0.2363 | SD3.5 Large | - |
- | - | 7.94 | 8.38 | 0.93 | 153.21 | 0.1938 | 38.25 | 30.58 | 0.78 | 90.70 | 0.2938 | 51.63 | 69.26 | - | 86.73 | 0.3367 | Dalle3 | - |
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data | ✓ Link | 3.44 | 2.98 | 0.83 | 95.69 | 0.1979 | 1.42 | 0.80 | 0.93 | 84.95 | 0.2727 | 1.74 | 1.06 | 0.88 | 71.59 | 0.2346 | Infinity-2B | 2024-10-24 |
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation | ✓ Link | 1.57 | 2.40 | 0.83 | 81.29 | 0.1891 | 0.62 | 0.42 | 0.90 | 82.83 | 0.2764 | 0.53 | 0.34 | 0.91 | 72.62 | 0.2347 | PixArt-Sigma | 2024-03-07 |
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering | - | - | - | - | - | - | 1.46 | 0.76 | 0.99 | 114.31 | 0.2510 | 1.25 | 0.66 | 0.96 | 84.10 | 0.2252 | TextDiffuser2 | 2023-11-28 |
AnyText: Multilingual Visual Text Generation And Editing | ✓ Link | - | - | - | - | - | 0.66 | 0.35 | 0.98 | 117.71 | 0.2501 | 0.8 | 0.42 | 0.95 | 101.32 | 0.2174 | Anytext | 2023-11-06 |
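The OCR CER columns report a character error rate, where lower is better. The leaderboard does not state its exact normalization, so the sketch below assumes the common definition: Levenshtein edit distance between the OCR output and the reference text, divided by the reference length. The function names `levenshtein` and `cer` are illustrative only, not part of any benchmark's code.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn hyp into ref (classic two-row DP)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # skip a reference character
                curr[j - 1] + 1,     # skip a hypothesis character
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length.

    Assumption: an empty reference scores 0.0 if the hypothesis is also
    empty and 1.0 otherwise; the benchmark's own convention is unknown.
    """
    if not ref:
        return 0.0 if not hyp else 1.0
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(cer("kitten", "sitting"))  # 3 edits / 6 reference chars
```

Under this definition CER can exceed 1 when the OCR output is much longer than the reference; whether the benchmark clips it to [0, 1] is not stated in the table.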