OpenCodePapers

Document Image Classification on RVL-CDIP

Leaderboard
| Paper | Code | Accuracy | Parameters | Model Name | Release Date |
|---|---|---|---|---|---|
| EAML: Ensemble Self-Attention-based Mutual Learning Network for Document Image Classification | | 97.70% | | EAML | 2023-05-11 |
| Visual and Textual Deep Feature Fusion for Document Image Classification | | 97.05% | 197M | Cross-Modal | 2020-06-16 |
| DocFormer: End-to-End Transformer for Document Understanding | ✓ | 96.17% | 183M | DocFormerBASE | 2021-06-22 |
| LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | ✓ | 95.93% | 368M | LayoutLMV3Large | 2022-04-18 |
| LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding | ✓ | 95.68% | | LiLT[EN-R]BASE | 2022-02-28 |
| LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | ✓ | 95.64% | | LayoutLMv2LARGE | 2020-12-29 |
| Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | ✓ | 95.52% | | TILT-Large | 2021-02-18 |
| DocFormer: End-to-End Transformer for Document Understanding | ✓ | 95.50% | 536M | DocFormer large | 2021-06-22 |
| LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | ✓ | 95.44% | 133M | LayoutLMv3BASE | 2022-04-18 |
| OCR-free Document Understanding Transformer | ✓ | 95.3% | | Donut | 2021-11-30 |
| Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | ✓ | 95.25% | | TILT-Base | 2021-02-18 |
| LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | ✓ | 95.25% | 200M | LayoutLMv2BASE | 2020-12-29 |
| LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding | ✓ | 95.21% | | LayoutXLM | 2021-04-18 |
| StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training | ✓ | 94.62% | 238M | StrucTexTv2 (large) | 2023-03-01 |
| LayoutLM: Pre-training of Text and Layout for Document Image Understanding | ✓ | 94.42% | 160M | Pre-trained LayoutLM | 2019-12-31 |
| DoPTA: Improving Document Layout Analysis using Patch-Text Alignment | | 94.12% | 85M | DoPTA | 2024-12-17 |
| DocXClassifier: High Performance Explainable Deep Network for Document Image Classification | ✓ | 94.00% | 95.4M | DocXClassifier-B | 2022-03-17 |
| StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training | ✓ | 93.4% | 28M | StrucTexTv2 (small) | 2023-03-01 |
| VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification | | 93.19% | 217M | VLCDoC | 2022-05-24 |
| GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification | | 93.18% | 221M | TransferDoc | 2023-09-11 |
| Multimodal Side-Tuning for Document Classification | ✓ | 92.7% | 57M | Multimodal (ResNet50) | 2023-01-16 |
| DiT: Self-supervised Pre-training for Document Image Transformer | ✓ | 92.69% | 304M | DiT-L | 2022-03-04 |
| Improving accuracy and speeding up Document Image Classification through parallel systems | ✓ | 92.31% | | Pre-trained EfficientNet | 2020-06-16 |
| Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks | ✓ | 92.21% | | Transfer Learning from VGG16 trained on Imagenet | 2018-01-29 |
| Multimodal Side-Tuning for Document Classification | ✓ | 92.2% | 12M | Multimodal (MobileNetV2) | 2023-01-16 |
| DiT: Self-supervised Pre-training for Document Image Transformer | ✓ | 92.11% | 87M | DiT-B | 2022-03-04 |
| BEiT: BERT Pre-Training of Image Transformers | ✓ | 91.09% | 87M | BEiT-B | 2021-06-15 |
| Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification | ✓ | 90.97% | | Transfer Learning from AlexNet, VGG-16, GoogLeNet and ResNet50 | 2017-04-11 |
| Analysis of Convolutional Neural Networks for Document Image Classification | | 90.94% | | AlexNet + spatial pyramidal pooling + image resizing | 2017-08-10 |
| Training data-efficient image transformers & distillation through attention | ✓ | 90.32% | 87M | DeiT-B | 2020-12-23 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ | 90.06% | 125M | Roberta base | 2019-07-26 |