Vision Grid Transformer for Document Layout Analysis | ✓ Link | 0.962 | 0.950 | 0.939 | 0.968 | 0.981 | 0.971 | VGT | 2023-08-29 |
Transformer-based Approach for Document Layout Understanding | | 0.959 | 0.958 | 0.921 | 0.975 | 0.976 | 0.966 | TRDLU | 2022-10-16 |
VSR: A Unified Framework for Document Layout Analysis combining Vision, Semantics and Relations | ✓ Link | 0.957 | 0.967 | 0.931 | 0.947 | 0.974 | 0.964 | VSR | 2021-05-13 |
Bridging the Performance Gap between DETR and R-CNN for Graphical Object Detection in Document Images | | 0.957 | 0.947 | 0.918 | 0.964 | 0.981 | 0.975 | DETR | 2023-06-23 |
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | ✓ Link | 0.951 | 0.945 | 0.906 | 0.955 | 0.979 | 0.970 | LayoutLMv3-B | 2022-04-18 |
DoPTA: Improving Document Layout Analysis using Patch-Text Alignment | | 0.949 | 0.944 | 0.895 | 0.957 | 0.977 | 0.970 | DoPTA | 2024-12-17 |
DiT: Self-supervised Pre-training for Document Image Transformer | ✓ Link | 0.949 | 0.944 | 0.893 | 0.960 | 0.978 | 0.972 | DiT-L | 2022-03-04 |
Unified Pretraining Framework for Document Understanding | | 0.939 | 0.939 | 0.885 | 0.937 | 0.973 | 0.964 | UDoc | 2022-04-22 |
Vision Grid Transformer for Document Layout Analysis | ✓ Link | 0.935 | 0.930 | 0.862 | 0.940 | 0.976 | 0.968 | ResNeXt-101-32×8d | 2023-08-29 |
Training data-efficient image transformers & distillation through attention | ✓ Link | 0.932 | 0.934 | 0.874 | 0.921 | 0.972 | 0.957 | DeiT-B | 2020-12-23 |
BEiT: BERT Pre-Training of Image Transformers | ✓ Link | 0.931 | 0.934 | 0.866 | 0.924 | 0.973 | 0.957 | BEiT-B | 2021-06-15 |
PubLayNet: largest dataset ever for document layout analysis | ✓ Link | 0.910 | 0.916 | 0.840 | 0.886 | 0.960 | 0.949 | Mask RCNN | 2019-08-16 |
PubLayNet: largest dataset ever for document layout analysis | ✓ Link | 0.902 | 0.910 | 0.826 | 0.883 | 0.954 | 0.937 | Faster RCNN | 2019-08-16 |
A Graphical Approach to Document Layout Analysis | ✓ Link | 0.722 | 0.878 | 0.800 | 0.862 | 0.868 | 0.206 | GLAM | 2023-08-03 |
CDeC-Net: Composite Deformable Cascade Network for Table Detection in Document Images | ✓ Link | | | | | 0.978 | | CDeC-Net | 2020-08-25 |