Open Access Issue
MMAR-Net: A Multi-Stride and Multi-Resolution Affine Registration Network for CT Images
Big Data Mining and Analytics 2024, 7(4): 1287-1300
Published: 04 December 2024

The evolution of lung lesions can be assessed by examining multiple CT screenings, which requires accurate alignment of two CT images. In this study, we propose a multi-stride and multi-resolution affine registration network, called MMAR-Net, for 3D affine registration of medical images. It works in an unsupervised way by optimizing a similarity loss. To extract more extensive image features, we replace the conventional convolution module with a multi-stride module. Furthermore, we exploit image features at multiple scales by taking the dot product between two feature vectors, which could enhance the robustness of the image representation. We conduct comprehensive comparison experiments between our model and existing affine registration methods on two publicly available datasets, DIR-Lab and Learn2Reg, both of which concern lung CT image registration. Quantitative and qualitative comparison results demonstrate that our model outperforms existing single-step affine registration networks. Our method improves the key metric, the Dice similarity coefficient, on DIR-Lab and Learn2Reg to 90.57% and 95.51%, respectively.
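The Dice similarity coefficient reported above measures the overlap between two segmentation masks, here presumably the fixed-image mask and the warped moving-image mask. The following is a minimal sketch of the standard formula, Dice = 2|A ∩ B| / (|A| + |B|), on flattened binary masks. The function name, the toy masks, and the convention for two empty masks are assumptions made for illustration and are not taken from the paper.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flat binary masks of equal length."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    # Count voxels that are foreground in both masks.
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    # Sum of the foreground sizes of the two masks.
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example (hypothetical data): two 6-voxel masks that share 2 foreground voxels.
fixed_mask = [0, 1, 1, 1, 0, 0]
warped_mask = [0, 0, 1, 1, 1, 0]
score = dice_coefficient(fixed_mask, warped_mask)
print(f"Dice = {score:.4f}")
```

A value of 1 means the masks coincide exactly, and 0 means they do not overlap at all.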

Regular Paper Issue
Self-Supervised Monocular Depth Estimation by Digging into Uncertainty Quantification
Journal of Computer Science and Technology 2023, 38(3): 510-525
Published: 30 May 2023

Based on well-designed network architectures and objective functions, self-supervised monocular depth estimation has made great progress. However, existing depth estimation methods lack a specific mechanism that makes the network learn more about regions containing moving objects or occlusions, so they are likely to produce poor results in such regions. Therefore, we propose an uncertainty quantification method to improve the performance of existing depth estimation networks without changing their architectures. Our uncertainty quantification method consists of uncertainty measurement, learning guidance by uncertainty, and the ultimate adaptive determination. Firstly, with the Snapshot and Siam learning strategies, we measure the degree of uncertainty by calculating the variance across pre-converged epochs or twins during training. Secondly, we use the uncertainty to guide the network to strengthen learning in regions with higher uncertainty. Finally, we use the uncertainty to adaptively produce the final depth estimation results with a balance of accuracy and robustness. To demonstrate the effectiveness of our uncertainty quantification method, we apply it to two state-of-the-art models, Monodepth2 and Hints. Experimental results show that our method improves the depth estimation performance on seven evaluation metrics compared with the two baseline models and outperforms the existing uncertainty method.
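The measurement step described above, variance across several predictions of the same scene, can be sketched as follows. This is an illustrative assumption about the computation, not the authors' implementation: depth maps are flattened to plain lists, the function name and toy values are hypothetical, and the population variance is used.

```python
from statistics import pvariance


def pixelwise_uncertainty(depth_maps):
    """Per-pixel population variance across several flat depth predictions."""
    if len(depth_maps) < 2:
        raise ValueError("need at least two predictions to measure variance")
    n_pixels = len(depth_maps[0])
    if any(len(d) != n_pixels for d in depth_maps):
        raise ValueError("all depth maps must have the same number of pixels")
    # zip(*depth_maps) groups the predictions made for the same pixel.
    return [pvariance(values) for values in zip(*depth_maps)]


# Hypothetical predictions from three snapshots (or twins) for a 3-pixel image.
snapshots = [
    [2.0, 5.0, 10.0],
    [2.0, 6.0, 14.0],
    [2.0, 7.0, 12.0],
]
uncertainty = pixelwise_uncertainty(snapshots)
print(uncertainty)
```

Pixels on which the predictions agree receive zero uncertainty, while pixels with diverging predictions receive larger values and could then be weighted more heavily during training.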

Open Access Research Article Issue
Self-supervised coarse-to-fine monocular depth estimation using a lightweight attention module
Computational Visual Media 2022, 8(4): 631-647
Published: 16 June 2022

Self-supervised monocular depth estimation has been widely investigated and applied in previous works. However, existing methods suffer from texture-copy, depth drift, and incomplete structure. It is difficult for standard CNNs to fully capture the relationship between an object and its surrounding environment. Moreover, it is hard to design a depth smoothness loss that balances smoothness and sharpness. To address these issues, we propose a coarse-to-fine method with a normalized convolutional block attention module (NCBAM). In the coarse estimation stage, we incorporate the NCBAM into the depth and pose networks to overcome the texture-copy and depth drift problems. Then, in the refinement stage, we use a new network to refine the coarse depth guided by the color image and produce a structure-preserving depth result. Our method produces results competitive with state-of-the-art methods. Comprehensive experiments demonstrate the effectiveness of our two-stage method using the NCBAM.

Regular Paper Issue
A Comprehensive Pipeline for Complex Text-to-Image Synthesis
Journal of Computer Science and Technology 2020, 35(3): 522-537
Published: 29 May 2020

Synthesizing a complex scene image with multiple objects and a background according to a text description is a challenging problem. It requires solving several difficult tasks across the fields of natural language processing and computer vision. We model it as a combination of semantic entity recognition, object retrieval and recombination, and objects’ status optimization. To reach a satisfactory result, we propose a comprehensive pipeline to convert the input text to its visual counterpart. The pipeline includes text processing, foreground object and background scene retrieval, image synthesis using constrained MCMC, and post-processing. Firstly, we roughly divide the objects parsed from the input text into foreground objects and background scenes. Secondly, we retrieve the required foreground objects from the foreground object dataset segmented from the Microsoft COCO dataset, and retrieve an appropriate background scene image from the background image dataset extracted from the Internet. Thirdly, to ensure that the positions and sizes of foreground objects are reasonable in the image synthesis step, we design a cost function and use the Markov Chain Monte Carlo (MCMC) method as the optimizer to solve this constrained layout problem. Finally, to make the image look natural and harmonious, we further use Poisson-based and relighting-based methods to blend the foreground objects and the background scene image in the post-processing step. The synthesized results and comparison results based on the Microsoft COCO dataset show that our method outperforms some of the state-of-the-art methods based on generative adversarial networks (GANs) in the visual quality of generated scene images.
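The layout step above, a cost function minimized with MCMC, can be illustrated with a small Metropolis sampler. The sketch below is not the paper's cost function or sampler. The cost (pairwise overlap area plus area outside the canvas), the Gaussian position proposal, the temperature, and all names are assumptions chosen for illustration, and only positions are perturbed while sizes stay fixed.

```python
import math
import random


def layout_cost(boxes, canvas_w, canvas_h):
    """Hypothetical cost: pairwise overlap area plus area lying outside the canvas."""
    cost = 0.0
    for i, (x, y, w, h) in enumerate(boxes):
        # Area of this box that falls outside the canvas.
        inside_w = max(0.0, min(x + w, canvas_w) - max(x, 0.0))
        inside_h = max(0.0, min(y + h, canvas_h) - max(y, 0.0))
        cost += w * h - inside_w * inside_h
        # Overlap area with every later box (each pair counted once).
        for x2, y2, w2, h2 in boxes[i + 1:]:
            overlap_w = max(0.0, min(x + w, x2 + w2) - max(x, x2))
            overlap_h = max(0.0, min(y + h, y2 + h2) - max(y, y2))
            cost += overlap_w * overlap_h
    return cost


def mcmc_layout(boxes, canvas_w, canvas_h, steps=2000, temperature=5.0, seed=0):
    """Metropolis sampling over box positions; returns the lowest-cost layout seen."""
    rng = random.Random(seed)
    current = list(boxes)
    current_cost = layout_cost(current, canvas_w, canvas_h)
    best, best_cost = list(current), current_cost
    for _ in range(steps):
        # Symmetric proposal: jitter the position of one randomly chosen box.
        i = rng.randrange(len(current))
        x, y, w, h = current[i]
        proposal = list(current)
        proposal[i] = (x + rng.gauss(0.0, 5.0), y + rng.gauss(0.0, 5.0), w, h)
        proposal_cost = layout_cost(proposal, canvas_w, canvas_h)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        accept_prob = math.exp(min(0.0, (current_cost - proposal_cost) / temperature))
        if rng.random() < accept_prob:
            current, current_cost = proposal, proposal_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return best, best_cost


# Two overlapping 30x30 boxes (x, y, width, height) on a 100x100 canvas.
initial = [(10.0, 10.0, 30.0, 30.0), (20.0, 20.0, 30.0, 30.0)]
initial_cost = layout_cost(initial, 100.0, 100.0)
best_layout, best_cost = mcmc_layout(initial, 100.0, 100.0)
print(initial_cost, best_cost)
```

Occasionally accepting worse layouts lets the sampler escape local minima, which a purely greedy search over positions would not do.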
