Scholar - SciOpen

Well-designed network architectures and objective functions have driven great progress in self-supervised monocular depth estimation. However, existing methods lack a specific mechanism that makes the network learn more about regions containing moving objects or occlusions, so they are likely to produce poor results there. We therefore propose an uncertainty quantification method that improves the performance of existing depth estimation networks without changing their architectures. The method consists of three parts: uncertainty measurement, uncertainty-guided learning, and a final adaptive determination. First, using Snapshot and Siam learning strategies, we measure the degree of uncertainty by calculating the variance across pre-converged epochs or twins during training. Second, we use the uncertainty to guide the network to strengthen its learning in regions with higher uncertainty. Finally, we use the uncertainty to adaptively produce the final depth estimate, balancing accuracy and robustness. To demonstrate its effectiveness, we apply the method to two state-of-the-art models, Monodepth2 and Hints. Experimental results show that our method improves depth estimation performance on seven evaluation metrics relative to both baseline models and outperforms the existing uncertainty method.
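The variance-based measurement and the uncertainty-guided learning described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `strength` parameter, and the particular weighting rule (one plus the normalized variance) are assumptions made here for illustration, and depth maps are represented as plain nested lists rather than tensors.

```python
from statistics import fmean, pvariance


def pixelwise_uncertainty(snapshots):
    """Per-pixel mean and variance over several depth predictions.

    `snapshots` is a list of depth maps (lists of rows of floats) of equal
    shape, e.g. predictions from several pre-converged epochs or twins.
    """
    height, width = len(snapshots[0]), len(snapshots[0][0])
    mean_map = [[0.0] * width for _ in range(height)]
    var_map = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            values = [depth[y][x] for depth in snapshots]
            mean_map[y][x] = fmean(values)
            var_map[y][x] = pvariance(values)  # population variance
    return mean_map, var_map


def uncertainty_weights(var_map, strength=1.0):
    """Hypothetical weighting rule: 1 + strength * (variance / max variance)."""
    max_var = max(max(row) for row in var_map)
    if max_var == 0:
        return [[1.0] * len(row) for row in var_map]
    return [[1.0 + strength * v / max_var for v in row] for row in var_map]


def weighted_loss(per_pixel_loss, weights):
    """Weighted average of a per-pixel loss map, emphasizing uncertain pixels."""
    total = sum(l * w
                for loss_row, w_row in zip(per_pixel_loss, weights)
                for l, w in zip(loss_row, w_row))
    norm = sum(w for w_row in weights for w in w_row)
    return total / norm
```

In this sketch, pixels where the predictions disagree most receive up to `1 + strength` times the weight of pixels where they agree, which is one simple way to make training focus on uncertain regions. How the paper itself converts uncertainty into guidance, and how it makes the final adaptive determination, is not specified in the abstract.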

Open Access Research Article Issue

Self-supervised coarse-to-fine monocular depth estimation using a lightweight attention module

Yuanzhen Li, Fei Luo, Chunxia Xiao

Computational Visual Media 2022, 8 (4): 631-647

Published: 16 June 2022

Self-supervised monocular depth estimation has been widely investigated and applied. However, existing methods suffer from texture-copy, depth drift, and incomplete structure. Standard CNNs have difficulty fully capturing the relationship between an object and its surrounding environment. Moreover, it is hard to design a depth smoothness loss that balances smoothness against sharpness. To address these issues, we propose a coarse-to-fine method with a normalized convolutional block attention module (NCBAM). In the coarse estimation stage, we incorporate the NCBAM into the depth and pose networks to overcome the texture-copy and depth drift problems. In the refinement stage, we use a new network to refine the coarse depth, guided by the color image, and produce a structure-preserving depth result. Our method produces results competitive with state-of-the-art methods. Comprehensive experiments demonstrate the effectiveness of our two-stage method using the NCBAM.

Regular Paper Issue

A Comprehensive Pipeline for Complex Text-to-Image Synthesis

Fei Fang, Fei Luo, Hong-Pan Zhang, Hua-Jian Zhou, Alix L. H. Chow, Chun-Xia Xiao

Journal of Computer Science and Technology 2020, 35 (3): 522-537

Published: 29 May 2020

Synthesizing a complex scene image with multiple objects and a background from a text description is a challenging problem. It requires solving several difficult tasks across natural language processing and computer vision. We model it as a combination of semantic entity recognition, object retrieval and recombination, and objects' status optimization. To reach a satisfactory result, we propose a comprehensive pipeline that converts the input text to its visual counterpart. The pipeline includes text processing, foreground object and background scene retrieval, image synthesis using constrained MCMC, and post-processing. First, we roughly divide the objects parsed from the input text into foreground objects and background scenes. Second, we retrieve the required foreground objects from a foreground object dataset segmented from the Microsoft COCO dataset, and retrieve an appropriate background scene image from a background image dataset collected from the Internet. Third, to ensure that the positions and sizes of foreground objects are plausible in the image synthesis step, we design a cost function and use the Markov Chain Monte Carlo (MCMC) method as the optimizer to solve this constrained layout problem. Finally, to make the image look natural and harmonious, we use Poisson-based and relighting-based methods to blend the foreground objects with the background scene image in the post-processing step. The synthesized results and comparisons on the Microsoft COCO dataset show that our method outperforms some state-of-the-art methods based on generative adversarial networks (GANs) in the visual quality of the generated scene images.
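The idea of using MCMC as the optimizer for a layout cost function can be sketched roughly as follows. The abstract does not give the paper's cost function or constraints, so everything here is an assumption for illustration: boxes are `(x, y, width, height)` tuples, the hypothetical cost penalizes overlap between objects and area falling outside the canvas, and the sampler is a plain Metropolis scheme with Gaussian position proposals (sizes are kept fixed for brevity).

```python
import math
import random


def overlap_area(a, b):
    """Intersection area of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = min(ax + aw, bx + bw) - max(ax, bx)
    dy = min(ay + ah, by + bh) - max(ay, by)
    return dx * dy if dx > 0 and dy > 0 else 0.0


def layout_cost(boxes, canvas_w, canvas_h):
    """Hypothetical cost: pairwise overlap plus area outside the canvas."""
    canvas = (0.0, 0.0, canvas_w, canvas_h)
    cost = 0.0
    for i, box in enumerate(boxes):
        _, _, w, h = box
        cost += w * h - overlap_area(box, canvas)  # out-of-canvas penalty
        for other in boxes[i + 1:]:
            cost += overlap_area(box, other)       # object-object overlap
    return cost


def mcmc_layout(boxes, canvas_w, canvas_h, steps=5000,
                step_size=10.0, temperature=50.0, seed=0):
    """Metropolis sampling over box positions; returns the best layout seen."""
    rng = random.Random(seed)
    current = list(boxes)
    current_cost = layout_cost(current, canvas_w, canvas_h)
    best, best_cost = list(current), current_cost
    for _ in range(steps):
        i = rng.randrange(len(current))
        x, y, w, h = current[i]
        proposal = list(current)
        proposal[i] = (x + rng.gauss(0, step_size),
                       y + rng.gauss(0, step_size), w, h)
        proposal_cost = layout_cost(proposal, canvas_w, canvas_h)
        delta = proposal_cost - current_cost
        # Always accept improvements; accept worse moves with prob. exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = proposal, proposal_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return best, best_cost
```

A call such as `mcmc_layout([(40, 40, 30, 30), (50, 50, 30, 30)], 200, 200)` starts from two overlapping boxes and returns the lowest-cost layout visited. Occasionally accepting worse moves is what lets the sampler escape poor local arrangements; a real system would also propose size changes and encode semantic constraints in the cost.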
