Open Access Research Article
THP: Tensor-field-driven hierarchical path planning for autonomous scene exploration with depth sensors
Computational Visual Media 2024, 10(6): 1121-1135
Published: 18 May 2024

It is challenging to automatically explore an unknown 3D environment with a robot equipped only with depth sensors, due to their limited field of view. We introduce THP, a tensor-field-based framework for efficient environment exploration which better utilizes the encoded depth information through the geometric characteristics of tensor fields. Specifically, a corresponding tensor field is constructed incrementally and guides the robot to formulate optimal global exploration paths and a collision-free local movement strategy. Degenerate points generated during exploration are adopted as anchors to formulate a hierarchical traveling salesman problem (TSP) for global path optimization. This novel strategy helps the robot avoid long-distance round trips more effectively while maintaining scanning completeness. Furthermore, the tensor field also enables a local movement strategy for collision avoidance based on particle advection. As a result, the framework eliminates massive, time-consuming recalculations of local movement paths. We have experimentally evaluated our method with a ground robot in 8 complex indoor scenes. On average, our method achieves 14% better exploration efficiency and 21% better exploration completeness than state-of-the-art alternatives using LiDAR scans. Moreover, compared to similar methods, it makes path decisions 39% faster thanks to our hierarchical exploration strategy.
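
To make the particle-advection idea concrete, the following minimal sketch traces a local path by repeatedly stepping along the major eigenvector of a 2D tensor field. It illustrates the general technique only, not the paper's implementation; field_at (a callback returning the 2x2 symmetric tensor at a position) and the step parameters are hypothetical.

import numpy as np

def major_eigvec(T):
    # Unit eigenvector of the larger eigenvalue of a 2x2 symmetric tensor
    w, v = np.linalg.eigh(T)
    return v[:, np.argmax(w)]

def advect(field_at, start, steps=200, h=0.05):
    # field_at(p) -> 2x2 symmetric tensor at 2D position p (hypothetical interface)
    path = [np.asarray(start, dtype=float)]
    prev = None
    for _ in range(steps):
        d = major_eigvec(field_at(path[-1]))
        if prev is not None and np.dot(d, prev) < 0.0:
            d = -d  # eigenvectors are sign-ambiguous; keep heading consistent
        path.append(path[-1] + h * d)
        prev = d
    return np.array(path)

The sign check matters because eigenvectors are only defined up to sign, so consecutive steps could otherwise flip direction and stall the particle.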

Open Access Research Article
Learning accurate template matching with differentiable coarse-to-fine correspondence refinement
Computational Visual Media 2024, 10(2): 309-330
Published: 03 January 2024

Template matching is a fundamental task in computer vision and has been studied for decades. It plays an essential role in the manufacturing industry for estimating the poses of different parts, facilitating downstream tasks such as robotic grasping. Existing methods fail when the template and source images have different modalities, cluttered backgrounds, or weak textures. They also rarely consider geometric transformations via homographies, which commonly arise even for planar industrial parts. To tackle these challenges, we propose an accurate template matching method based on differentiable coarse-to-fine correspondence refinement. We use an edge-aware module to overcome the domain gap between the mask template and the grayscale image, allowing robust matching. An initial warp is estimated from coarse correspondences based on novel structure-aware information provided by transformers. This initial alignment is passed to a refinement network, which uses the reference and aligned images to obtain sub-pixel-level correspondences that yield the final geometric transformation. Extensive evaluation shows our method to be significantly better than state-of-the-art methods and baselines, with good generalization ability and visually plausible results even on unseen real data.
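
A minimal sketch of the coarse-to-fine estimation loop, using OpenCV's robust homography fitting in place of the paper's learned components; refine_fn, standing in for the refinement network that returns sub-pixel correspondences, is a hypothetical placeholder.

import cv2
import numpy as np

def coarse_to_fine_homography(coarse_tpl, coarse_img, refine_fn=None):
    # Coarse stage: robust initial warp from coarse correspondences
    # (Nx2 float arrays of matched points in template and image)
    H, _ = cv2.findHomography(coarse_tpl, coarse_img, cv2.RANSAC, 3.0)
    if refine_fn is not None:
        # Fine stage: hypothetical refiner returns sub-pixel correspondences
        # conditioned on the initial alignment H
        fine_tpl, fine_img = refine_fn(H)
        H, _ = cv2.findHomography(fine_tpl, fine_img, 0)  # least-squares fit
    return H

The coarse RANSAC fit tolerates outliers among the coarse matches, while the final least-squares fit exploits the (assumed clean) sub-pixel correspondences for accuracy.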

Open Access Research Article
6DOF pose estimation of a 3D rigid object based on edge-enhanced point pair features
Computational Visual Media 2024, 10(1): 61-77
Published: 30 November 2023

The point pair feature (PPF) is widely used for 6D pose estimation. In this paper, we propose an efficient 6D pose estimation method based on the PPF framework. We introduce a well-targeted down-sampling strategy that focuses on edge areas, enabling efficient feature extraction for complex geometry. A pose hypothesis validation approach is proposed to resolve the ambiguity caused by symmetry, by calculating the degree of edge matching. We perform evaluations on two challenging datasets and one real-world collected dataset, demonstrating the superiority of our method for pose estimation of geometrically complex, occluded, and symmetrical objects. We further validate our method by applying it to simulated punctures.
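
For reference, the classic four-dimensional PPF descriptor (Drost et al.) that such frameworks build on can be sketched as follows; the edge-focused down-sampling and edge-matching validation contributed by this paper are separate steps not shown here.

import numpy as np

def point_pair_feature(p1, n1, p2, n2):
    # F = (||d||, angle(n1, d), angle(n2, d), angle(n1, n2))
    # for oriented points (p1, n1) and (p2, n2) with unit normals
    d = p2 - p1
    dist = np.linalg.norm(d)
    du = d / dist
    ang = lambda a, b: np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    return np.array([dist, ang(n1, du), ang(n2, du), ang(n1, n2)])

In the standard pipeline these features are quantized and used as hash keys so that scene pairs can vote for model poses.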

Open Access Research Article
EFECL: Feature encoding enhancement with contrastive learning for indoor 3D object detection
Computational Visual Media 2023, 9(4): 875-892
Published: 03 August 2023

Good initial proposals are critical for 3D object detection applications. However, due to the significant geometric variation of indoor scenes, incomplete and noisy proposals are inevitable in most cases. Mining feature information among these "bad" proposals may mislead detection. Contrastive learning provides a feasible way to represent proposals, as it can align complete and incomplete/noisy proposals in feature space. The aligned feature space helps build robust 3D representations even when bad proposals are given. Therefore, we devise a new contrastive learning framework for indoor 3D object detection, called EFECL, that learns robust 3D representations through contrastive learning of proposals at two different levels. Specifically, we optimize both instance-level and category-level contrasts to align features by capturing instance-specific characteristics and semantic-aware common patterns. Furthermore, we propose an enhanced feature aggregation module to extract more general and informative features for contrastive learning. Evaluations on the ScanNet V2 and SUN RGB-D benchmarks demonstrate the generalizability and effectiveness of our method, which achieves improvements of 12.3% and 7.3% on the two datasets, respectively, over benchmark alternatives. The code and models are publicly available at https://github.com/YaraDuan/EFECL.
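
A minimal sketch of the InfoNCE-style objective that instance- and category-level contrasts of this kind typically use; the function names and the choice of loss are assumptions for illustration, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, tau=0.07):
    # anchor, positive: (B, C) paired proposal embeddings; negatives: (K, C)
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    n = F.normalize(negatives, dim=-1)
    l_pos = (a * p).sum(dim=-1, keepdim=True) / tau  # (B, 1) similarity to positive
    l_neg = (a @ n.t()) / tau                        # (B, K) similarity to negatives
    logits = torch.cat([l_pos, l_neg], dim=1)
    labels = torch.zeros(a.size(0), dtype=torch.long)  # positive sits at index 0
    return F.cross_entropy(logits, labels)

# Two-level training signal (hypothetical pairings): one contrast over
# instance pairs, one over category prototypes, summed into a single loss.
# loss = info_nce(inst_a, inst_p, inst_n) + info_nce(cat_a, cat_p, cat_n)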

Open Access Research Article
ARM3D: Attention-based relation module for indoor 3D object detection
Computational Visual Media 2022, 8(3): 395-414
Published: 08 March 2022

Relation contexts have proved useful for many challenging vision tasks. In the field of 3D object detection, previous methods have taken advantage of context encoding, graph embedding, or explicit relation reasoning to extract relation contexts. However, redundant relation contexts inevitably arise from noisy or low-quality proposals. In fact, invalid relation contexts usually indicate underlying scene misunderstanding and ambiguity, which may, on the contrary, reduce performance in complex scenes. Inspired by recent attention mechanisms such as the Transformer, we propose a novel 3D attention-based relation module (ARM3D). It encompasses object-aware relation reasoning to extract pairwise relation contexts among qualified proposals, and an attention module to distribute attention weights over the different relation contexts. In this way, ARM3D can take full advantage of useful relation contexts while filtering out those that are less relevant or even confusing, mitigating ambiguity in detection. We have evaluated the effectiveness of ARM3D by plugging it into several state-of-the-art 3D object detectors, yielding more accurate and robust detection results. Extensive experiments show the capability and generalization of ARM3D for 3D object detection. Our source code is available at https://github.com/lanlan96/ARM3D.
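
The following sketch shows one plausible shape of an attention-weighted relation module: pairwise relation contexts are computed for all proposal pairs and aggregated with learned attention weights. All layer sizes and names are hypothetical; this is a sketch of the general idea, not ARM3D's actual architecture.

import torch
import torch.nn as nn

class RelationAttention(nn.Module):
    def __init__(self, c):
        super().__init__()
        # Maps a concatenated feature pair to a relation context
        self.rel = nn.Sequential(nn.Linear(2 * c, c), nn.ReLU(), nn.Linear(c, c))
        # Scores each relation context for attention weighting
        self.attn = nn.Linear(c, 1)

    def forward(self, feats):  # feats: (N, C) proposal features
        n, c = feats.shape
        a = feats.unsqueeze(1).expand(n, n, c)
        b = feats.unsqueeze(0).expand(n, n, c)
        rel = self.rel(torch.cat([a, b], dim=-1))             # (N, N, C) pairwise contexts
        w = torch.softmax(self.attn(rel).squeeze(-1), dim=1)  # (N, N) weights per proposal
        ctx = (w.unsqueeze(-1) * rel).sum(dim=1)              # (N, C) attended context
        return feats + ctx  # proposals enriched with relation contexts

The softmax over partners is what lets the module down-weight relation contexts from noisy or irrelevant proposals rather than averaging them in uniformly.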

Open Access Research Article
Recurrent 3D attentional networks for end-to-end active object recognition
Computational Visual Media 2019, 5(1): 91-104
Published: 08 April 2019

Active vision is inherently attention-driven: an agent actively selects views to attend to, in order to perform a vision task rapidly while improving its internal representation of the scene being observed. Inspired by the recent success of attention-based models in 2D vision tasks on single RGB images, we address multi-view, depth-based active object recognition using an attention mechanism, realized as an end-to-end recurrent 3D attentional network. The architecture takes advantage of a recurrent neural network to store and update an internal representation. Our model, trained on 3D shape datasets, iteratively attends to the best views of a target object in order to recognize it. To realize 3D view selection, we derive a 3D spatial transformer network. It is differentiable, allowing training with backpropagation and thus achieving much faster convergence than the reinforcement learning employed by most existing attention-based models. Experiments show that our method, using only depth input, achieves state-of-the-art next-best-view performance in terms of both time taken and recognition accuracy.
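
As a loose caricature of the recurrent attention loop, the sketch below soft-selects among precomputed candidate-view features with differentiable attention and updates a GRU state; it substitutes simple soft attention for the paper's 3D spatial transformer, and all dimensions and names are illustrative.

import torch
import torch.nn as nn

class RecurrentViewAttention(nn.Module):
    def __init__(self, feat_dim=256, hid=256, n_views=20, n_classes=40):
        super().__init__()
        self.gru = nn.GRUCell(feat_dim, hid)    # stores the internal representation
        self.score = nn.Linear(hid, n_views)    # scores candidate views from state
        self.cls = nn.Linear(hid, n_classes)    # final object classifier
        self.hid = hid

    def forward(self, view_feats, steps=3):  # view_feats: (B, n_views, feat_dim)
        h = view_feats.new_zeros(view_feats.size(0), self.hid)
        for _ in range(steps):
            w = torch.softmax(self.score(h), dim=-1)       # differentiable view choice
            x = (w.unsqueeze(-1) * view_feats).sum(dim=1)  # soft-selected view feature
            h = self.gru(x, h)                             # update internal state
        return self.cls(h)

Because the view selection is a softmax rather than a hard sample, the whole loop is differentiable and trainable with plain backpropagation, which is the property the paper's spatial transformer provides.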
