| Paper | Code | Accuracy | Params | Model | Release Date |
|---|---|---|---|---|---|
| EAML: Ensemble Self-Attention-based Mutual Learning Network for Document Image Classification | | 97.70% | | EAML | 2023-05-11 |
| Visual and Textual Deep Feature Fusion for Document Image Classification | | 97.05% | 197M | Cross-Modal | 2020-06-16 |
| DocFormer: End-to-End Transformer for Document Understanding | ✓ | 96.17% | 183M | DocFormer-Base | 2021-06-22 |
| LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | ✓ | 95.93% | 368M | LayoutLMv3-Large | 2022-04-18 |
| LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding | ✓ | 95.68% | | LiLT[EN-R]-Base | 2022-02-28 |
| LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | ✓ | 95.64% | | LayoutLMv2-Large | 2020-12-29 |
| Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | ✓ | 95.52% | | TILT-Large | 2021-02-18 |
| DocFormer: End-to-End Transformer for Document Understanding | ✓ | 95.50% | 536M | DocFormer-Large | 2021-06-22 |
| LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | ✓ | 95.44% | 133M | LayoutLMv3-Base | 2022-04-18 |
| OCR-free Document Understanding Transformer | ✓ | 95.30% | | Donut | 2021-11-30 |
| Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | ✓ | 95.25% | | TILT-Base | 2021-02-18 |
| LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | ✓ | 95.25% | 200M | LayoutLMv2-Base | 2020-12-29 |
| LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding | ✓ | 95.21% | | LayoutXLM | 2021-04-18 |
| StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training | ✓ | 94.62% | 238M | StrucTexTv2 (large) | 2023-03-01 |
| LayoutLM: Pre-training of Text and Layout for Document Image Understanding | ✓ | 94.42% | 160M | Pre-trained LayoutLM | 2019-12-31 |
| DoPTA: Improving Document Layout Analysis using Patch-Text Alignment | | 94.12% | 85M | DoPTA | 2024-12-17 |
| DocXClassifier: High Performance Explainable Deep Network for Document Image Classification | ✓ | 94.00% | 95.4M | DocXClassifier-B | 2022-03-17 |
| StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training | ✓ | 93.40% | 28M | StrucTexTv2 (small) | 2023-03-01 |
| VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification | | 93.19% | 217M | VLCDoC | 2022-05-24 |
| GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification | | 93.18% | 221M | TransferDoc | 2023-09-11 |
| Multimodal Side-Tuning for Document Classification | ✓ | 92.70% | 57M | Multimodal (ResNet50) | 2023-01-16 |
| DiT: Self-supervised Pre-training for Document Image Transformer | ✓ | 92.69% | 304M | DiT-L | 2022-03-04 |
| Improving accuracy and speeding up Document Image Classification through parallel systems | ✓ | 92.31% | | Pre-trained EfficientNet | 2020-06-16 |
| Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks | ✓ | 92.21% | | Transfer learning from VGG-16 (ImageNet) | 2018-01-29 |
| Multimodal Side-Tuning for Document Classification | ✓ | 92.20% | 12M | Multimodal (MobileNetV2) | 2023-01-16 |
| DiT: Self-supervised Pre-training for Document Image Transformer | ✓ | 92.11% | 87M | DiT-B | 2022-03-04 |
| BEiT: BERT Pre-Training of Image Transformers | ✓ | 91.09% | 87M | BEiT-B | 2021-06-15 |
| Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification | ✓ | 90.97% | | Transfer learning (AlexNet, VGG-16, GoogLeNet, ResNet-50) | 2017-04-11 |
| Analysis of Convolutional Neural Networks for Document Image Classification | | 90.94% | | AlexNet + spatial pyramid pooling + image resizing | 2017-08-10 |
| Training data-efficient image transformers & distillation through attention | ✓ | 90.32% | 87M | DeiT-B | 2020-12-23 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ | 90.06% | 125M | RoBERTa-Base | 2019-07-26 |