Regular Paper
PuzzleNet: Boundary-Aware Feature Matching for Non-Overlapping 3D Point Clouds Assembly
Journal of Computer Science and Technology 2023, 38 (3): 492-509
Published: 30 May 2023
Abstract

We address the 3D shape assembly of multiple geometric pieces without overlaps, a scenario often encountered in 3D shape design, field archeology, and robotics. Existing methods depend on strong assumptions about the number of shape pieces and on coherent geometry or semantics among them. Although 3D registration with complex or low-overlap patterns has drawn increasing attention, few methods consider shape assembly with hardly any overlap. To address this problem, we present a novel framework inspired by puzzle solving, named PuzzleNet, which conducts multi-task learning by leveraging both 3D alignment and boundary information. Specifically, we design an end-to-end neural network based on a point cloud transformer with two branches that estimate the rigid transformation and predict boundaries simultaneously. The framework naturally extends to reassembling multiple pieces into a full shape via an iterative greedy approach based on the distance between each pair of candidate-matched pieces. To train and evaluate PuzzleNet, we construct two datasets, DublinPuzzle and ModelPuzzle, based on a real-world urban scan dataset (DublinCity) and a synthetic CAD dataset (ModelNet40), respectively. Experiments demonstrate our effectiveness in solving 3D shape assembly for multiple pieces with arbitrary geometry and inconsistent semantics. Our method surpasses state-of-the-art algorithms by more than 10 times on rotation metrics and 4 times on translation metrics.
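For illustration, the iterative greedy multi-piece strategy described in this abstract can be sketched independently of the network. The `pairwise_distance` callable below is a hypothetical stand-in for PuzzleNet's learned matching score, and pieces are represented as plain sets of point indices; this is a minimal sketch of the greedy loop only, not the paper's implementation.

```python
from itertools import combinations

def greedy_assemble(pieces, pairwise_distance):
    """Greedily merge the closest pair of pieces until one assembly remains.

    `pairwise_distance(a, b)` stands in for a learned matching distance
    between two candidate pieces (a hypothetical interface); pieces are
    modeled here as sets of point ids.
    """
    pieces = list(pieces)
    order = []  # record of merge operations, in the order they happen
    while len(pieces) > 1:
        # pick the pair of pieces with the smallest distance
        i, j = min(combinations(range(len(pieces)), 2),
                   key=lambda ij: pairwise_distance(pieces[ij[0]], pieces[ij[1]]))
        order.append((pieces[i], pieces[j]))
        merged = pieces[i] | pieces[j]
        pieces = [p for k, p in enumerate(pieces) if k not in (i, j)] + [merged]
    return pieces[0], order
```

With three toy "pieces" and an artificial distance, the two closest pieces are merged first, then the remainder is attached, mirroring the paper's pairwise candidate-matching loop.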

Open Access Research Article
An anisotropic Chebyshev descriptor and its optimization for deformable shape correspondence
Computational Visual Media 2023, 9 (3): 461-477
Published: 21 March 2023
Abstract

Shape descriptors have recently gained popularity in shape matching, statistical shape modeling, etc. Their discriminative ability and efficiency play a decisive role in these tasks. In this paper, we first propose a novel handcrafted anisotropic spectral descriptor using Chebyshev polynomials, called the anisotropic Chebyshev descriptor (ACD); it can effectively capture shape features in multiple directions. The ACD inherits many good characteristics of spectral descriptors, such as being intrinsic, robust to changes in surface discretization, etc. Furthermore, due to the orthogonality of Chebyshev polynomials, the ACD is compact and can disambiguate intrinsic symmetry since several directions are considered. To improve the ACD’s discrimination ability, we construct a Chebyshev spectral manifold convolutional neural network (CSMCNN) that optimizes the ACD and produces a learned ACD. Our experimental results show that the ACD outperforms existing state-of-the-art handcrafted descriptors. The combination of the ACD and the CSMCNN is better than other state-of-the-art learned descriptors in terms of discrimination, efficiency, and robustness to changes in shape resolution and discretization.
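The Chebyshev polynomials underlying such spectral descriptors are cheap to evaluate via their three-term recurrence. The sketch below shows only this building block; how the polynomials are applied to (rescaled) Laplace–Beltrami eigenvalues and anisotropic directions is the paper's construction and is not reproduced here.

```python
def chebyshev_basis(x, order):
    """Evaluate Chebyshev polynomials T_0..T_order at x, for x in [-1, 1],
    using the recurrence T_{k+1}(x) = 2x*T_k(x) - T_{k-1}(x).

    Spectral descriptors evaluate such polynomials on eigenvalues rescaled
    into [-1, 1]; that rescaling step is assumed, not shown.
    """
    t = [1.0, x]               # T_0 = 1, T_1 = x
    for _ in range(2, order + 1):
        t.append(2.0 * x * t[-1] - t[-2])
    return t[: order + 1]
```

Orthogonality of this basis is what makes the resulting descriptor compact: higher-order terms add non-redundant frequency content rather than re-encoding lower orders.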

Open Access Research Article
Deep unfolding multi-scale regularizer network for image denoising
Computational Visual Media 2023, 9 (2): 335-350
Published: 03 January 2023
Abstract

Existing deep unfolding methods unroll an optimization algorithm with a fixed number of steps, and utilize convolutional neural networks (CNNs) to learn data-driven priors. However, their performance is limited for two main reasons. Firstly, priors learned in the deep feature space need to be converted to the image space at each iteration step, which limits the depth of the CNNs and prevents them from exploiting contextual information. Secondly, existing methods learn deep priors only at the single full-resolution scale, and thus ignore the benefits of multi-scale context in dealing with heavy noise. To address these issues, we explicitly consider the image denoising process in the deep feature space and propose the deep unfolding multi-scale regularizer network (DUMRN) for image denoising. The core of DUMRN is the feature-based denoising module (FDM), which directly removes noise in the deep feature space. In each FDM, we construct a multi-scale regularizer block to learn deep prior information from multi-resolution features. We build the DUMRN by stacking a sequence of FDMs and train it in an end-to-end manner. Experimental results on synthetic and real-world benchmarks demonstrate that DUMRN performs favorably compared to state-of-the-art methods.
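The unrolled structure this abstract builds on can be sketched in a few lines: a fixed number of iterations, each pairing a data-fidelity gradient step with a regularizer step. Here the learned multi-scale FDM is replaced by simple soft-thresholding, a stand-in assumption only; DUMRN applies its regularizer in deep feature space, not on pixels.

```python
import numpy as np

def soft_threshold(x, tau):
    """Stand-in 'regularizer': shrinkage toward zero (not the learned FDM)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def unrolled_denoise(y, steps=10, step_size=0.5, tau=0.05):
    """Unroll a fixed number of proximal-gradient iterations.

    Each step = gradient step on 0.5*||x - y||^2, then a prior step,
    mirroring the stacked-module structure of deep unfolding networks.
    """
    x = y.copy()
    for _ in range(steps):
        x = x - step_size * (x - y)   # data-fidelity gradient step
        x = soft_threshold(x, tau)    # regularizer / prior step
    return x
```

In a trained unfolding network, `step_size` and the regularizer are learned per step; keeping the iteration entirely in feature space is what lets DUMRN use deeper, context-aware regularizers.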

Open Access Research Article
Joint specular highlight detection and removal in single images via Unet-Transformer
Computational Visual Media 2023, 9 (1): 141-154
Published: 18 October 2022
Abstract

Specular highlight detection and removal is a fundamental problem in computer vision and image processing. In this paper, we present an efficient end-to-end deep learning model for automatically detecting and removing specular highlights in a single image. In particular, an encoder–decoder network is utilized to detect specular highlights, and then a novel Unet-Transformer network performs highlight removal; we append transformer modules instead of feature maps in the Unet architecture. We also introduce a highlight detection module as a mask to guide the removal task. Thus, these two networks can be jointly trained in an effective manner. Thanks to the hierarchical and global properties of the transformer mechanism, our framework is able to establish relationships between continuous self-attention layers, making it possible to directly model the mapping between the diffuse area and the specular highlight area, and reduce indeterminacy within areas containing strong specular highlight reflection. Experiments on public benchmark and real-world images demonstrate that our approach outperforms state-of-the-art methods for both highlight detection and removal tasks.

Open Access Research Article
Scene text removal via cascaded text stroke detection and erasing
Computational Visual Media 2022, 8 (2): 273-287
Published: 06 December 2021
Abstract

Recent learning-based approaches show promising performance improvement for the scene text removal task but usually leave several remnants of text and provide visually unpleasant results. In this work, a novel end-to-end framework is proposed based on accurate text stroke detection. Specifically, the text removal problem is decoupled into text stroke detection and stroke removal; we design separate networks to solve these two subproblems, the latter being a generative network. These two networks are combined as a processing unit, which is cascaded to obtain our final model for text removal. Experimental results demonstrate that the proposed method substantially outperforms the state-of-the-art for locating and erasing scene text. A new large-scale real-world dataset with 12,120 images has been constructed and is being made available to facilitate research, as current publicly available datasets are mainly synthetic so cannot properly measure the performance of different methods.

Regular Paper
Distinguishing Computer-Generated Images from Natural Images Using Channel and Pixel Correlation
Journal of Computer Science and Technology 2020, 35 (3): 592-602
Published: 29 May 2020
Abstract

With the recent tremendous advances of computer graphics rendering and image editing technologies, computer-generated fake images, which in general do not reflect what happens in reality, can now easily deceive the human visual system. In this work, we propose a convolutional neural network (CNN)-based model to distinguish computer-generated (CG) images from natural images (NIs) using channel and pixel correlation. The key component of the proposed CNN architecture is a self-coding module that takes color images as input and explicitly extracts the correlation between color channels. Unlike previous approaches that directly apply a CNN to solve this problem, we consider the generality of the network (or subnetwork): the newly introduced hybrid correlation module can be directly combined with existing CNN models to enhance their discrimination capacity. Experimental results demonstrate that the proposed network outperforms state-of-the-art methods in terms of classification performance. We also show that the newly introduced hybrid correlation module can improve the classification accuracy of different CNN architectures.
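The intuition behind exploiting inter-channel correlation can be illustrated with a hand-crafted statistic: the 3x3 Pearson correlation matrix between the R, G, and B channels. This is only an illustrative analogue; the paper's self-coding module learns much richer correlation features than this single matrix.

```python
import numpy as np

def channel_correlation(img):
    """Correlation matrix between the color channels of an H x W x 3 image.

    A hand-crafted analogue of what a correlation-aware module consumes:
    each row/column corresponds to one of the R, G, B channels.
    """
    flat = img.reshape(-1, 3).astype(np.float64).T  # shape 3 x (H*W)
    return np.corrcoef(flat)
```

For a natural photo the off-diagonal entries reflect sensor and demosaicing statistics; rendered images tend to exhibit different inter-channel behavior, which is the signal such a module amplifies.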

Open Access Research Article
Learning local shape descriptors for computing non-rigid dense correspondence
Computational Visual Media 2020, 6 (1): 95-112
Published: 23 March 2020
Abstract

A discriminative local shape descriptor plays an important role in various applications. In this paper, we present a novel deep learning framework that derives discriminative local descriptors for deformable 3D shapes. We use local "geometry images" to encode the multi-scale local features of a point, via an intrinsic parameterization method based on geodesic polar coordinates. This new parameterization provides robust geometry images even for badly-shaped triangular meshes. Then a triplet network with shared architecture and parameters is used to perform deep metric learning; its aim is to distinguish between similar and dissimilar pairs of points. Additionally, a newly designed triplet loss function is minimized for improved, accurate training of the triplet network. To solve the dense correspondence problem, an efficient sampling approach is utilized to achieve a good compromise between training performance and descriptor quality. During testing, given a geometry image of a point of interest, our network outputs a discriminative local descriptor for it. Extensive testing of non-rigid dense shape matching on a variety of benchmarks demonstrates the superiority of the proposed descriptors over the state-of-the-art alternatives.
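The metric-learning objective described above can be sketched with the classic margin-based triplet loss, which pulls an anchor's descriptor toward a similar point and pushes it away from a dissimilar one. The paper proposes its own redesigned triplet loss; the standard formulation below is shown for illustration only.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard margin-based triplet loss on descriptor vectors.

    Encourages d(anchor, positive) + margin <= d(anchor, negative),
    where d is squared Euclidean distance.
    """
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)
```

During training, the three branches of the triplet network share weights, so minimizing this loss shapes a single descriptor function rather than three separate ones.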

Open Access
Real-Time Facial Pose Estimation and Tracking by Coarse-to-Fine Iterative Optimization
Tsinghua Science and Technology 2020, 25 (5): 690-700
Published: 16 March 2020
Abstract

We present a novel and efficient method for real-time estimation and tracking of multiple facial poses in a single frame or video. First, we combine two standard convolutional neural network models, for face detection and mean shape learning, to generate initial estimates of alignment and pose. Then, we design a bi-objective optimization strategy to iteratively refine these estimates, achieving faster speed and more accurate outputs. Finally, we apply algebraic filtering, including a Gaussian filter for background removal and an extended Kalman filter for target prediction, to maintain real-time tracking performance. Only ordinary RGB photos or videos captured by a commodity monocular camera are required, without any prior information or labels. We demonstrate the advantages of our approach by comparing it with the most recent work in terms of performance and accuracy.
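The target-prediction step can be illustrated with a plain linear Kalman filter under a constant-velocity model. The paper uses an extended Kalman filter on facial pose; this 1D linear sketch (with assumed noise parameters `q` and `r`) shows only the predict/update cycle that such a tracker runs each frame.

```python
import numpy as np

def kalman_step(x, P, z, dt=1.0, q=1e-3, r=1e-1):
    """One predict/update cycle of a 1D constant-velocity Kalman filter.

    x = [position, velocity], P = state covariance, z = measured position.
    q and r are assumed process/measurement noise levels.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity motion model
    H = np.array([[1.0, 0.0]])              # we observe position only
    Q = q * np.eye(2)
    R = np.array([[r]])
    # predict: propagate state and covariance through the motion model
    x = F @ x
    P = F @ P @ F.T + Q
    # update: blend the prediction with the new measurement
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (np.array([z]) - H @ x)
    P = (np.eye(2) - K @ H) @ P
    return x, P
```

Fed consecutive detections, the filter smooths jitter and its predict step supplies a pose estimate even before the next frame's detection arrives, which is what keeps tracking real-time.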

Regular Paper
Fast and Error-Bounded Space-Variant Bilateral Filtering
Journal of Computer Science and Technology 2019, 34 (3): 550-568
Published: 10 May 2019
Abstract

The traditional space-invariant isotropic kernel utilized by a bilateral filter (BF) frequently leads to blurry edges and gradient reversal artifacts due to the large number of outliers in the local averaging window. However, the efficient and accurate estimation of space-variant kernels which adapt to image structures, and the fast realization of the corresponding space-variant bilateral filtering, are challenging problems. To address these problems, we present a space-variant BF (SVBF) and its linear-time, error-bounded acceleration method. First, we accurately estimate space-variant anisotropic kernels that vary with image structures in linear time through the structure tensor and a minimum spanning tree. Second, we perform SVBF in linear time using two error-bounded approximation methods, namely, low-rank tensor approximation via higher-order singular value decomposition and exponential sum approximation. Therefore, the proposed SVBF can efficiently achieve good edge-preserving results. We validate the advantages of the proposed filter in applications including image denoising, image enhancement, and image focus editing. Experimental results demonstrate that our fast and error-bounded SVBF is superior to state-of-the-art methods.
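For reference, the brute-force space-invariant bilateral filter that serves as the baseline here can be written directly, shown in 1D for brevity. The space-variant anisotropic kernels and the linear-time low-rank/exponential-sum acceleration are the paper's contributions and are not reproduced in this sketch.

```python
import numpy as np

def bilateral_filter_1d(signal, sigma_s=2.0, sigma_r=0.1, radius=5):
    """Brute-force 1D bilateral filter (the space-invariant baseline).

    Each output sample is a weighted average over a window, with weights
    combining spatial distance (sigma_s) and intensity difference
    (sigma_r); the range term is what preserves edges.
    """
    out = np.empty_like(signal, dtype=np.float64)
    n = len(signal)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = signal[lo:hi].astype(np.float64)
        spatial = np.exp(-((np.arange(lo, hi) - i) ** 2) / (2 * sigma_s ** 2))
        rng = np.exp(-((window - signal[i]) ** 2) / (2 * sigma_r ** 2))
        w = spatial * rng
        out[i] = np.sum(w * window) / np.sum(w)
    return out
```

On a step edge this filter averages within each flat region but barely mixes across the step, which is exactly the edge-preserving behavior the space-variant version further strengthens near sharp structures.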

Open Access Research Article
Surface remeshing with robust user-guided segmentation
Computational Visual Media 2018, 4 (2): 113-122
Published: 16 March 2018
Abstract

Surface remeshing is widely required in modeling, animation, simulation, and many other computer graphics applications. Improving the elements’ quality is a challenging task in surface remeshing. Existing methods often fail to efficiently remove poor-quality elements especially in regions with sharp features. In this paper, we propose and use a robust segmentation method followed by remeshing the segmented mesh. Mesh segmentation is initiated using an existing Live-wire interaction approach and is further refined using local mesh operations. The refined segmented mesh is finally sent to the remeshing pipeline, in which each mesh segment is remeshed independently. An experimental study compares our mesh segmentation method as well as remeshing results with representative existing methods. We demonstrate that the proposed segmentation method is robust and suitable for remeshing.
