| Paper | Code | Accuracy | Params | Model | Release Date |
|---|---|---|---|---|---|
| EAML: Ensemble Self-Attention-based Mutual Learning Network for Document Image Classification | | 97.70% | | EAML | 2023-05-11 |
| Visual and Textual Deep Feature Fusion for Document Image Classification | | 97.05% | 197M | Cross-Modal | 2020-06-16 |
| DocFormer: End-to-End Transformer for Document Understanding | ✓ | 96.17% | 183M | DocFormer-Base | 2021-06-22 |
| LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | ✓ | 95.93% | 368M | LayoutLMv3-Large | 2022-04-18 |
| LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding | ✓ | 95.68% | | LiLT[EN-R]-Base | 2022-02-28 |
| LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | ✓ | 95.64% | | LayoutLMv2-Large | 2020-12-29 |
| Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | ✓ | 95.52% | | TILT-Large | 2021-02-18 |
| DocFormer: End-to-End Transformer for Document Understanding | ✓ | 95.50% | 536M | DocFormer-Large | 2021-06-22 |
| LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | ✓ | 95.44% | 133M | LayoutLMv3-Base | 2022-04-18 |
| OCR-free Document Understanding Transformer | ✓ | 95.30% | | Donut | 2021-11-30 |
| Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | ✓ | 95.25% | | TILT-Base | 2021-02-18 |
| LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | ✓ | 95.25% | 200M | LayoutLMv2-Base | 2020-12-29 |
| LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding | ✓ | 95.21% | | LayoutXLM | 2021-04-18 |
| StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training | ✓ | 94.62% | 238M | StrucTexTv2 (large) | 2023-03-01 |
| LayoutLM: Pre-training of Text and Layout for Document Image Understanding | ✓ | 94.42% | 160M | Pre-trained LayoutLM | 2019-12-31 |
| DoPTA: Improving Document Layout Analysis using Patch-Text Alignment | | 94.12% | 85M | DoPTA | 2024-12-17 |
| DocXClassifier: High Performance Explainable Deep Network for Document Image Classification | ✓ | 94.00% | 95.4M | DocXClassifier-B | 2022-03-17 |
| StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training | ✓ | 93.40% | 28M | StrucTexTv2 (small) | 2023-03-01 |
| VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification | | 93.19% | 217M | VLCDoC | 2022-05-24 |
| GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification | | 93.18% | 221M | TransferDoc | 2023-09-11 |
| Multimodal Side-Tuning for Document Classification | ✓ | 92.70% | 57M | Multimodal (ResNet50) | 2023-01-16 |
| DiT: Self-supervised Pre-training for Document Image Transformer | ✓ | 92.69% | 304M | DiT-L | 2022-03-04 |
| Improving accuracy and speeding up Document Image Classification through parallel systems | ✓ | 92.31% | | Pre-trained EfficientNet | 2020-06-16 |
| Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks | ✓ | 92.21% | | Transfer learning from VGG-16 (ImageNet) | 2018-01-29 |
| Multimodal Side-Tuning for Document Classification | ✓ | 92.20% | 12M | Multimodal (MobileNetV2) | 2023-01-16 |
| DiT: Self-supervised Pre-training for Document Image Transformer | ✓ | 92.11% | 87M | DiT-B | 2022-03-04 |
| BEiT: BERT Pre-Training of Image Transformers | ✓ | 91.09% | 87M | BEiT-B | 2021-06-15 |
| Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification | ✓ | 90.97% | | Transfer learning (AlexNet, VGG-16, GoogLeNet, ResNet-50) | 2017-04-11 |
| Analysis of Convolutional Neural Networks for Document Image Classification | | 90.94% | | AlexNet + spatial pyramid pooling + image resizing | 2017-08-10 |
| Training data-efficient image transformers & distillation through attention | ✓ | 90.32% | 87M | DeiT-B | 2020-12-23 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ | 90.06% | 125M | RoBERTa-Base | 2019-07-26 |