HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors | ✓ Link | 0.026 | 0.128 | | 0.988 | 1.000 | 1.000 | HybridDepth | 2024-07-26 |
Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator | ✓ Link | 0.043 | | | 0.981 | | | Distill Any Depth | 2025-02-26 |
UniK3D: Universal Camera Monocular 3D Estimation | ✓ Link | 0.044 | 0.173 | 0.019 | 0.989 | 0.998 | 1.000 | UniK3D (FT, metric) | 2025-03-20 |
UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler | ✓ Link | 0.046 | 0.180 | 0.020 | 0.988 | 0.998 | 1.000 | UniDepthV2 (FT, metric) | 2025-02-27 |
PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage | ✓ Link | 0.046 | | | 0.977 | | | PrimeDepth + Depth Anything | 2024-09-13 |
Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation | ✓ Link | 0.047 | 0.183 | 0.020 | 0.989 | 0.998 | 1.000 | Metric3Dv2(L, FT) | 2024-03-22 |
DepthMaster: Taming Diffusion Models for Monocular Depth Estimation | ✓ Link | 0.050 | | | 0.972 | | | DepthMaster | 2025-01-05 |
GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion | | 0.051 | 0.251 | | | | | GRIN | 2024-09-15 |
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think | ✓ Link | 0.052 | | | 0.966 | | | Marigold + E2E FT(zero-shot) | 2024-09-17 |
Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation | ✓ Link | 0.055 | 0.224 | 0.024 | 0.964 | 0.991 | 0.998 | Marigold | 2023-12-04 |
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data | ✓ Link | 0.056 | 0.206 | 0.024 | 0.984 | 0.998 | 1.000 | Depth Anything | 2024-01-19 |
UniDepth: Universal Monocular Metric Depth Estimation | ✓ Link | 0.058 | 0.201 | 0.024 | 0.984 | 0.997 | 0.999 | UniDepth (Zero-shot) | 2024-03-27 |
PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage | ✓ Link | 0.058 | | | 0.966 | | | PrimeDepth | 2024-09-13 |
ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation | ✓ Link | 0.059 | 0.218 | 0.026 | 0.978 | 0.997 | 0.999 | ECoDepth | 2024-03-27 |
Harnessing Diffusion Models for Visual Perception with Meta Prompts | ✓ Link | 0.061 | 0.223 | 0.027 | 0.976 | 0.997 | 0.999 | MetaPrompt-SD | 2023-12-22 |
EVP: Enhanced Visual Perception using Inverse Multi-Attentive Feature Refinement and Regularized Image-Text Alignment | ✓ Link | 0.061 | 0.224 | 0.027 | 0.976 | 0.997 | 0.999 | EVP | 2023-12-13 |
Text-image Alignment for Diffusion-based Perception | ✓ Link | 0.062 | 0.225 | 0.027 | 0.976 | 0.997 | 0.999 | TADP | 2023-09-29 |
FutureDepth: Learning to Predict the Future Improves Video Depth Estimation | | 0.063 | 0.233 | 0.027 | 0.981 | 0.996 | 0.999 | FutureDepth | 2024-03-19 |
MeSa: Masked, Geometric, and Supervised Pre-training for Monocular Depth Estimation | | 0.066 | 0.238 | 0.029 | 0.964 | 0.995 | 0.999 | MeSa | 2023-10-06 |
PolyMaX: General Dense Prediction with Mask Transformer | ✓ Link | 0.067 | 0.25 | 0.029 | 0.969 | 0.9958 | 0.999 | PolyMaX(ConvNeXt-L) | 2023-11-09 |
Unleashing Text-to-Image Diffusion Models for Visual Perception | ✓ Link | 0.069 | 0.254 | 0.030 | 0.964 | 0.995 | 0.999 | VPD | 2023-03-03 |
NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation | ✓ Link | 0.072 | 0.282 | 0.031 | 0.9493 | 0.991 | 0.997 | NVDS(DPT-L) | 2023-07-17 |
Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model | | 0.072 | 0.296 | 0.031 | 0.953 | 0.989 | 0.996 | DMD | 2023-12-20 |
ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation | ✓ Link | 0.074 | 0.267 | 0.032 | 0.957 | 0.994 | 0.999 | ScaleDepth-N | 2024-07-11 |
Monocular Depth Estimation using Diffusion Models | | 0.074 | 0.314 | 0.032 | 0.946 | 0.987 | 0.996 | DepthGen | 2023-02-28 |
ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth | ✓ Link | 0.075 | 0.270 | 0.032 | 0.955 | 0.995 | 0.999 | ZoeD-M12-N | 2023-02-23 |
All in Tokens: Unifying Output Space of Visual Tasks via Soft Token | ✓ Link | 0.076 | 0.275 | 0.033 | 0.954 | 0.994 | 0.999 | AiT-P(SwinV2-L) | 2023-01-05 |
Large-scale Monocular Depth Estimation in the Wild | | 0.080 | 0.364 | 0.033 | 0.931 | 0.986 | 0.996 | Gaming for Depth (GfD) | 2023-09-18 |
Revealing the Dark Secrets of Masked Image Modeling | ✓ Link | 0.083 | 0.287 | 0.035 | 0.949 | 0.994 | 0.999 | SwinV2-L 1K-MIM | 2022-05-26 |
Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image | ✓ Link | 0.083 | 0.310 | 0.035 | 0.944 | 0.986 | 0.995 | Metric3D (ConvNeXt-Large, Zero-shot testing) | 2023-07-20 |
VA-DepthNet: A Variational Approach to Single Image Depth Prediction | ✓ Link | 0.086 | 0.304 | 0.037 | 0.937 | 0.992 | 0.999 | VA-DepthNet(SwinV1-L) | 2023-02-13 |
iDisc: Internal Discretization for Monocular Depth Estimation | ✓ Link | 0.086 | | | | 0.993 | 0.999 | iDisc | 2023-04-13 |
Analysis of NaN Divergence in Training Monocular Depth Estimation Model | | 0.0864 | 0.3046 | 0.0365 | 0.9361 | 0.9916 | 0.9981 | MIM-Swin-V2 | 2023-11-07 |
NDDepth: Normal-Distance Assisted Monocular Depth Estimation | ✓ Link | 0.087 | 0.311 | 0.038 | 0.936 | 0.991 | 0.998 | NDDepth | 2023-09-19 |
IEBins: Iterative Elastic Bins for Monocular Depth Estimation | ✓ Link | 0.087 | 0.314 | 0.038 | 0.936 | 0.992 | 0.998 | IEBins | 2023-09-25 |
URCDC-Depth: Uncertainty Rectified Cross-Distillation with CutFlip for Monocular Depth Estimation | ✓ Link | 0.088 | 0.316 | 0.038 | 0.933 | 0.992 | 0.998 | URCDC-Depth | 2023-02-16 |
Improving Deep Regression with Ordinal Entropy | ✓ Link | 0.089 | 0.321 | 0.039 | 0.932 | | | OrdinalEntropy | 2023-01-21 |
Attention Attention Everywhere: Monocular Depth Prediction with Skip Attention | ✓ Link | 0.090 | 0.322 | 0.039 | 0.929 | 0.991 | 0.998 | PixelFormer | 2022-10-17 |
Learning to Recover 3D Scene Shape from a Single Image | ✓ Link | 0.09 | | | 0.916 | | | LeReS | 2020-12-17 |
DINOv2: Learning Robust Visual Features without Supervision | ✓ Link | 0.0907 | 0.279 | 0.0371 | 0.9497 | 0.996 | 0.9994 | DINOv2 (ViT-g/14 frozen, w/ DPT decoder) | 2023-04-14 |
DDP: Diffusion Model for Dense Visual Prediction | ✓ Link | 0.094 | 0.329 | 0.040 | 0.921 | 0.990 | 0.998 | DDP (step3) | 2023-03-30 |
BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation | ✓ Link | 0.094 | 0.330 | 0.040 | 0.925 | 0.989 | 0.997 | BinsFormer | 2022-04-03 |
NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation | ✓ Link | 0.095 | 0.334 | 0.041 | 0.922 | 0.992 | 0.998 | NeWCRFs | 2022-03-03 |
D-Net: A Generalised and Optimised Deep Network for Monocular Depth Estimation | ✓ Link | 0.095 | 0.354 | 0.041 | 0.919 | 0.988 | 0.997 | D-Net | 2021-09-29 |
DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation | ✓ Link | 0.096 | 0.339 | 0.041 | 0.921 | 0.989 | 0.998 | DepthFormer | 2022-03-27 |
Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth | ✓ Link | 0.098 | 0.344 | 0.042 | 0.915 | 0.988 | 0.997 | GLPDepth | 2022-01-19 |
LocalBins: Improving Depth Estimation by Learning Local Distributions | ✓ Link | 0.098 | 0.351 | 0.042 | 0.91 | 0.986 | 0.997 | LocalBins | 2022-03-28 |
Depth Map Decomposition for Monocular Depth Estimation | ✓ Link | 0.098 | 0.355 | 0.042 | 0.913 | 0.987 | 0.998 | Depth-Map-Decomposition-HRWSI | 2022-08-23 |
Depthformer: Multiscale Vision Transformer For Monocular Depth Estimation With Local Global Information Fusion | ✓ Link | 0.100 | 0.345 | 0.042 | 0.913 | 0.988 | 0.997 | Depthformer | 2022-07-10 |
Depth Map Decomposition for Monocular Depth Estimation | ✓ Link | 0.100 | 0.362 | 0.043 | 0.907 | 0.986 | 0.997 | Depth-Map-Decomposition | 2022-08-23 |
IronDepth: Iterative Refinement of Single-View Depth using Surface Normal and its Uncertainty | ✓ Link | 0.101 | 0.352 | 0.043 | 0.910 | 0.985 | 0.997 | IronDepth | 2022-10-07 |
AdaBins: Depth Estimation using Adaptive Bins | ✓ Link | 0.103 | 0.364 | 0.044 | 0.903 | 0.984 | 0.997 | AdaBins | 2020-11-28 |
P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior | ✓ Link | 0.104 | 0.356 | 0.043 | 0.898 | 0.981 | 0.996 | P3Depth | 2022-04-05 |
CutDepth: Edge-aware Data Augmentation in Depth Estimation | ✓ Link | 0.104 | 0.375 | 0.044 | 0.899 | 0.985 | 0.997 | CutDepth | 2021-07-16 |
Monocular Depth Estimation Using Laplacian Pyramid-Based Depth Residuals | ✓ Link | 0.105 | 0.384 | 0.045 | 0.895 | 0.983 | 0.996 | LapDepth | 2021-01-08 |
Vision Transformers for Dense Prediction | ✓ Link | 0.110 | 0.357 | 0.045 | 0.904 | 0.988 | 0.994 | DPT-Hybrid | 2021-03-24 |
Enforcing geometric constraints of virtual normal for depth prediction | ✓ Link | 0.111 | 0.416 | 0.048 | 0.875 | 0.976 | 0.989 | VNL | 2019-07-29 |
Focal-WNet: An Architecture Unifying Convolution and Attention for Depth Estimation | ✓ Link | 0.116 | 0.398 | 0.048 | 0.875 | 0.980 | 0.995 | Focal-WNet | 2022-07-18 |
Auto-Rectify Network for Unsupervised Indoor Depth Estimation | ✓ Link | 0.138 | 0.532 | 0.059 | 0.820 | 0.956 | | SC-DepthV2 | 2020-06-04 |
NVS-MonoDepth: Improving Monocular Depth Prediction with Novel View Synthesis | | | 0.331 | | | | | NVS-MonoDepth | 2021-12-22 |
From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation | ✓ Link | | 0.392 | | | | 0.995 | BTS | 2019-07-24 |
On Deep Learning Techniques to Boost Monocular Depth Estimation for Autonomous Navigation | | | 0.429 | | | | | DSN | 2020-10-13 |
High Quality Monocular Depth Estimation via Transfer Learning | ✓ Link | | 0.465 | | | | | DenseDepth | 2018-12-31 |
Attention-based Context Aggregation Network for Monocular Depth Estimation | ✓ Link | | 0.496 | | | | | ACAN | 2019-01-29 |
SharpNet: Fast and Accurate Recovery of Occluding Contours in Monocular Depth Estimation | ✓ Link | | 0.496 | | | | | SharpNet | 2019-05-21 |
Pattern-Affinitive Propagation across Depth, Surface Normal and Semantic Segmentation | | | 0.497 | | | | | PAP-Depth | 2019-06-08 |
SDC-Depth: Semantic Divide-and-Conquer Network for Monocular Depth Estimation | | | 0.497 | | | | | SDC-Depth | 2020-06-01 |
Deep Ordinal Regression Network for Monocular Depth Estimation | ✓ Link | | 0.509 | | | | | DORN | 2018-06-06 |
Structure-Aware Residual Pyramid Network for Monocular Depth Estimation | ✓ Link | | 0.514 | | | | | SARPN | 2019-07-13 |
InvPT: Inverted Pyramid Multi-task Transformer for Dense Scene Understanding | ✓ Link | | 0.5183 | | | | | InvPT | 2022-03-15 |
Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells | ✓ Link | | 0.523 | | | | | FastDenseNas-arch0 | 2018-10-25 |
Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells | ✓ Link | | 0.525 | | | | | FastDenseNas-arch2 | 2018-10-25 |
Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells | ✓ Link | | 0.526 | | | | | FastDenseNas-arch1 | 2018-10-25 |
Revisiting Single Image Depth Estimation: Toward Higher Resolution Maps with Accurate Object Boundaries | ✓ Link | | 0.530 | | | | | SENet-154 | 2018-03-23 |
Generating and Exploiting Probabilistic Monocular Depth Estimates | ✓ Link | | 0.536 | | | | | ProbMonoDepth | 2019-06-13 |
Monocular Depth Estimation Using Relative Depth Maps | | | 0.538 | | | | | RelativeDepth | 2019-06-01 |
Prompt Guided Transformer for Multi-Task Dense Prediction | ✓ Link | | 0.5468 | | | | | PGT (Swin-S) | 2023-07-28 |
Index Network | ✓ Link | | 0.565 | | | | | Index Network | 2019-08-11 |
Real-Time Joint Semantic Segmentation and Depth Estimation Using Asymmetric Annotations | ✓ Link | | 0.565 | | | | | Multi-Task Light-Weight-RefineNet | 2018-09-13 |
Single Image Depth Estimation Trained via Depth from Defocus Cues | ✓ Link | | 0.575 | | | | | DeepLabV3+ (F10) | 2020-01-14 |
Multi-Scale Continuous CRFs as Sequential Deep Networks for Monocular Depth Estimation | ✓ Link | | 0.586 | | | | | Xu et al. | 2017-04-07 |
Prompt Guided Transformer for Multi-Task Dense Prediction | ✓ Link | | 0.59 | | | | | PGT (Swin-T) | 2023-07-28 |
Structure-Attentioned Memory Network for Monocular Depth Estimation | | | 0.604 | | | | | SOM | 2019-09-10 |
A Two-Streamed Network for Estimating Fine-Scaled Depth Maps from Single RGB Images | | | 0.635 | | | | | Li et al. | 2016-07-04 |
Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture | ✓ Link | | 0.641 | | | | | Eigen et al. | 2014-11-18 |