Boosting Monocular Depth Estimation to High Resolution

Seyed Mahdi Hosseini Miangoleh
MSc Thesis
Simon Fraser University, 2022

Depth maps generated by our method for The School of Athens, a fresco by Raphael in Vatican City. This result was featured by Nature in its "Best Science Images of the Month" selection for August 2021.

Abstract

Convolutional neural networks have shown a remarkable ability to estimate depth from a single image. However, due to network structure and hardware limitations, the estimated depth maps are low-resolution: they capture only the overall scene structure and lack fine details, which limits their applicability. We demonstrate that there is a trade-off between the consistency of the scene structure and the high-frequency details, depending on the input content and resolution. Building upon this duality, we present a double estimation framework that improves the depth estimation of the whole image and a patch selection step that adds local details. Our approach obtains multi-megapixel depth estimations with sharp details by merging estimations at different resolutions based on image content. A key strength of our approach is that it can employ any off-the-shelf, pre-trained CNN-based monocular depth estimation model without further fine-tuning.
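
The double estimation idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the method described on this page merges estimations with a depth merging network, whereas this sketch swaps in a fixed frequency split (low frequencies from the low-resolution pass, high frequencies from the high-resolution pass). The callable `estimate_depth` stands in for any pre-trained monocular depth model, and `resize_nearest`, `box_blur`, `align_scale_shift`, `double_estimate`, the blur radius `k`, and the two input resolutions are hypothetical choices made for illustration.

```python
import numpy as np


def resize_nearest(x: np.ndarray, shape: tuple) -> np.ndarray:
    """Nearest-neighbour resize of the first two axes (stand-in for a real resampler)."""
    rows = np.arange(shape[0]) * x.shape[0] // shape[0]
    cols = np.arange(shape[1]) * x.shape[1] // shape[1]
    return x[rows[:, None], cols]


def box_blur(x: np.ndarray, k: int) -> np.ndarray:
    """Mean filter over a (2k+1) x (2k+1) window, using edge padding and a summed-area table."""
    w = 2 * k + 1
    h, wd = x.shape
    padded = np.pad(x.astype(float), k, mode="edge")
    sat = np.pad(padded.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    s = sat[w:w + h, w:w + wd] - sat[:h, w:w + wd] - sat[w:w + h, :wd] + sat[:h, :wd]
    return s / (w * w)


def align_scale_shift(src: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Least-squares scale and shift that maps src onto ref."""
    a = np.stack([src.ravel(), np.ones(src.size)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(a, ref.ravel(), rcond=None)
    return scale * src + shift


def double_estimate(image, estimate_depth, low_shape, high_shape, k=8):
    """Hypothetical double estimation: structure from low-res, details from high-res."""
    low = estimate_depth(resize_nearest(image, low_shape))
    high = estimate_depth(resize_nearest(image, high_shape))
    low_up = resize_nearest(low, high.shape)
    high = align_scale_shift(high, low_up)
    # Low frequencies (consistent structure) come from the low-resolution pass,
    # high frequencies (fine detail) come from the high-resolution pass.
    return box_blur(low_up, k) + (high - box_blur(high, k))


def fake_model(im: np.ndarray) -> np.ndarray:
    """Placeholder for a real pre-trained depth network: returns the channel mean."""
    return im.mean(axis=2)


rng = np.random.default_rng(0)
image = rng.random((64, 48, 3))
depth = double_estimate(image, fake_model, (32, 24), (128, 96), k=4)
print(depth.shape)  # (128, 96)
```

The scale-and-shift alignment is included because many monocular models predict relative depth, so two passes over the same image need not agree in scale; a learned merging network would handle this implicitly.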

BibTeX

@MASTERSTHESIS{bmd-msc,
author={Seyed Mahdi Hosseini Miangoleh},
title={Boosting Monocular Depth Estimation to High Resolution},
year={2022},
school={Simon Fraser University},
}

Publications associated with this thesis


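Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging
S. Mahdi H. Miangoleh*, Sebastian Dille*, Long Mai, Sylvain Paris, and Yağız Aksoy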
CVPR, 2021
Neural networks have shown great abilities in estimating depth from a single image. However, the inferred depth maps are well below one-megapixel resolution and often lack fine-grained details, which limits their practicality. Our method builds on our analysis of how the input resolution and the scene structure affect depth estimation performance. We demonstrate that there is a trade-off between a consistent scene structure and high-frequency details, and we merge low- and high-resolution estimations using a simple depth merging network to take advantage of this duality. We present a double estimation method that improves the whole-image depth estimation and a patch selection method that adds local details to the final result. We demonstrate that by merging estimations at different resolutions with changing context, we can generate multi-megapixel depth maps with a high level of detail using a pre-trained model.
@INPROCEEDINGS{Miangoleh2021Boosting,
author={S. Mahdi H. Miangoleh and Sebastian Dille and Long Mai and Sylvain Paris and Ya\u{g}{\i}z Aksoy},
title={Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging},
booktitle={Proc. CVPR},
year={2021},
}
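
The patch selection step mentioned in the abstract above can be sketched in the same hedged spirit. The gradient-magnitude edge detector, the non-overlapping square tiles, and the rule of keeping tiles whose edge density exceeds that of the whole image are simplifying assumptions for illustration, not the published procedure; `edge_map` and `select_patches` are hypothetical names.

```python
import numpy as np


def edge_map(gray: np.ndarray, thresh: float) -> np.ndarray:
    """Binary edge map from gradient magnitude (a simple stand-in for a real edge detector)."""
    gy, gx = np.gradient(gray.astype(float))
    return np.hypot(gx, gy) > thresh


def select_patches(gray: np.ndarray, tile: int, thresh: float = 0.1) -> list:
    """Hypothetical patch selection: keep tiles whose edge density exceeds the whole image's."""
    edges = edge_map(gray, thresh)
    global_density = edges.mean()
    h, w = gray.shape
    selected = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            if edges[y:y + tile, x:x + tile].mean() > global_density:
                selected.append((y, x))  # top-left corner of a detail-rich tile
    return selected


gray = np.zeros((64, 64))
gray[:, 32:] = 1.0  # a single vertical step edge
print(select_patches(gray, tile=16))
# [(0, 16), (0, 32), (16, 16), (16, 32), (32, 16), (32, 32), (48, 16), (48, 32)]
```

In a full pipeline, each selected tile would presumably be passed through the depth model at a higher resolution and merged back into the whole-image estimate; that step is omitted here.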