Scholar - SciOpen

Novel space–time view synthesis for monocular video is a highly challenging task: both static and dynamic objects usually appear in the video, but only a single view of the current scene is available, resulting in inaccurate synthesis results. To address this challenge, we propose FRNeRF, a novel space–time view synthesis method with a fusion regularization field. Specifically, we design a 2D–3D fusion regularization field for the original dynamic neural field, which helps reduce blurring of dynamic objects in the scene. In addition, we add image prior features to the hierarchical sampling to solve the problem that the traditional hierarchical sampling strategy cannot obtain sufficient sampling points during training. We evaluate our method extensively on multiple datasets and show the results of dynamic space–time view synthesis. Our method achieves state-of-the-art performance both qualitatively and quantitatively. Code is available for research purposes at https://cic.tju.edu.cn/faculty/likun/projects/FRNerf.

Open Access Research Article Issue

Benchmarking visual SLAM methods in mirror environments

Peter Herbert, Jing Wu, Ze Ji, Yu-Kun Lai

Computational Visual Media 2024, 10(2): 215-241

Published: 03 January 2024

Abstract

Downloads：187

Visual simultaneous localisation and mapping (vSLAM) finds applications for indoor and outdoor navigation that routinely subjects it to visual complexities, particularly mirror reflections. The effect of mirror presence (time visible and its average size in the frame) was hypothesised to impact localisation and mapping performance, with systems using direct techniques expected to perform worse. Thus, a dataset, MirrEnv, of image sequences recorded in mirror environments, was collected, and used to evaluate the performance of existing representative methods. RGBD ORB-SLAM3 and BundleFusion appear to show moderate degradation of absolute trajectory error with increasing mirror duration, whilst the remaining results did not show significantly degraded localisation performance. The mesh maps generated proved to be very inaccurate, with real and virtual reflections colliding in the reconstructions. A discussion is given of the likely sources of error and robustness in mirror environments, outlining future directions for validating and improving vSLAM performance in the presence of planar mirrors. The MirrEnv dataset is available at https://doi.org/10.17035/d.2023.0292477898.
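
As a rough illustration of the absolute trajectory error referred to in this abstract, the sketch below computes the root-mean-square of position differences between an estimated and a ground-truth trajectory. It assumes the two trajectories are already time-associated and expressed in the same coordinate frame; a full evaluation would normally align them first. The function name `ate_rmse` and the sample positions are hypothetical and are not taken from the paper or the MirrEnv dataset.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of translational error between two pre-aligned trajectories.

    Each trajectory is a list of (x, y, z) positions; index i in one
    list is assumed to correspond to index i in the other.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and equal in length")
    # Squared Euclidean distance between each pair of matched positions.
    squared = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical positions, for illustration only.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est = [(0.0, 0.0, 0.0), (1.0, 0.3, 0.4), (2.0, 0.0, 0.0)]
print(ate_rmse(est, gt))
```

A lower value means the estimated camera path stays closer to the reference path, which is the sense in which the abstract speaks of degradation as mirror duration increases.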

Open Access Research Article Issue

STATE: Learning structure and texture representations for novel view synthesis

Xinyi Jing, Qiao Feng, Yu-Kun Lai, Jinsong Zhang, Yuanqiang Yu, Kun Li

Computational Visual Media 2023, 9(4): 767-786

Published: 11 July 2023

Abstract

Downloads：50

Novel viewpoint image synthesis is very challenging, especially from sparse views, due to large changes in viewpoint and occlusion. Existing image-based methods fail to generate reasonable results for invisible regions, while geometry-based methods have difficulties in synthesizing detailed textures. In this paper, we propose STATE, an end-to-end deep neural network, for sparse view synthesis by learning structure and texture representations. Structure is encoded as a hybrid feature field to predict reasonable structures for invisible regions while maintaining original structures for visible regions, and texture is encoded as a deformed feature map to preserve detailed textures. We propose a hierarchical fusion scheme with intra-branch and inter-branch aggregation, in which spatio-view attention allows multi-view fusion at the feature level to adaptively select important information by regressing pixel-wise or voxel-wise confidence maps. By decoding the aggregated features, STATE is able to generate realistic images with reasonable structures and detailed textures. Experimental results demonstrate that our method achieves qualitatively and quantitatively better results than state-of-the-art methods. Our method also enables texture and structure editing applications benefiting from implicit disentanglement of structure and texture. Our code is available at http://cic.tju.edu.cn/faculty/likun/projects/STATE.

Survey Issue

A Revisit of Shape Editing Techniques: From the Geometric to the Neural Viewpoint

Yu-Jie Yuan, Yu-Kun Lai, Tong Wu, Lin Gao, Ligang Liu

Journal of Computer Science and Technology 2021, 36(3): 520-554

Published: 05 May 2021

Abstract

3D shape editing is widely used in a range of applications such as movie production, computer games and computer aided design. It is also a popular research topic in computer graphics and computer vision. In past decades, researchers have developed a series of editing methods to make the editing process faster, more robust, and more reliable. Traditionally, the deformed shape is determined by the optimal transformation and weights for an energy formulation. With increasing availability of 3D shapes on the Internet, data-driven methods were proposed to improve the editing results. More recently, as deep neural networks became popular, many deep learning based editing methods have been developed in this field, which are naturally data-driven. We mainly survey recent research studies from the geometric viewpoint to those emerging neural deformation techniques and categorize them into organic shape editing methods and man-made model editing methods. Both traditional methods and recent neural network based methods are reviewed.

Open Access Research Article Issue

ClusterSLAM: A SLAM backend for simultaneous rigid body clustering and motion estimation

Jiahui Huang, Sheng Yang, Zishuo Zhao, Yu-Kun Lai, Shi-Min Hu

Computational Visual Media 2021, 7(1): 87-101

Published: 07 January 2021

Abstract

Downloads：113

We present a practical backend for stereo visual SLAM which can simultaneously discover individual rigid bodies and compute their motions in dynamic environments. While recent factor graph based state optimization algorithms have shown their ability to robustly solve SLAM problems by treating dynamic objects as outliers, their dynamic motions are rarely considered. In this paper, we exploit the consensus of 3D motions for landmarks extracted from the same rigid body for clustering, and to identify static and dynamic objects in a unified manner. Specifically, our algorithm builds a noise-aware motion affinity matrix from landmarks, and uses agglomerative clustering to distinguish rigid bodies. Using decoupled factor graph optimization to revise their shapes and trajectories, we obtain an iterative scheme to update both cluster assignments and motion estimation reciprocally. Evaluations on both synthetic scenes and KITTI demonstrate the capability of our approach, and further experiments considering online efficiency also show the effectiveness of our method for simultaneously tracking ego-motion and multiple objects.

Open Access Review Article Issue

A survey on deep geometry learning: From a representation perspective

Yun-Peng Xiao, Yu-Kun Lai, Fang-Lue Zhang, Chunpeng Li, Lin Gao

Computational Visual Media 2020, 6(2): 113-133

Published: 10 June 2020

Abstract

Downloads：170

Researchers have achieved great success in dealing with 2D images using deep learning. In recent years, 3D computer vision and geometry deep learning have gained ever more attention. Many advanced techniques for 3D shapes have been proposed for different applications. Unlike 2D images, which can be uniformly represented by a regular grid of pixels, 3D shapes have various representations, such as depth images, multi-view images, voxels, point clouds, meshes, implicit surfaces, etc. The performance achieved in different applications largely depends on the representation used, and there is no unique representation that works well for all applications. Therefore, in this survey, we review recent developments in deep learning for 3D geometry from a representation perspective, summarizing the advantages and disadvantages of different representations for different applications. We also present existing datasets in these representations and further discuss future research directions.

Open Access Research Article Issue

Saliency guided local and global descriptors for effective action recognition

Ashwan Abdulmunem, Yu-Kun Lai, Xianfang Sun

Computational Visual Media 2016, 2(1): 97-106

Published: 29 January 2016

Abstract

Downloads：57

This paper presents a novel framework for human action recognition based on salient object detection and a new combination of local and global descriptors. We first detect salient objects in video frames and only extract features for such objects. We then use a simple strategy to identify and process only those video frames that contain salient objects. Processing salient objects instead of all frames not only makes the algorithm more efficient, but more importantly also suppresses the interference of background pixels. We combine this approach with a new combination of local and global descriptors, namely 3D-SIFT and histograms of oriented optical flow (HOOF), respectively. The resulting saliency guided 3D-SIFT-HOOF (SGSH) feature is used along with a multi-class support vector machine (SVM) classifier for human action recognition. Experiments conducted on the standard KTH and UCF-Sports action benchmarks show that our new method outperforms the competing state-of-the-art spatiotemporal feature-based human action recognition methods.
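
To give a feel for the kind of global descriptor named in this abstract, the sketch below builds a magnitude-weighted histogram of optical-flow directions and normalises it to sum to one. It is a simplified stand-in, not the authors' SGSH feature: it omits the saliency step, 3D-SIFT, and any refinements found in published HOOF formulations. The function name `orientation_histogram`, the bin count, and the sample flow vectors are all assumptions made for illustration.

```python
import math


def orientation_histogram(flow, num_bins=8):
    """Magnitude-weighted histogram of flow directions, normalised to sum to 1.

    `flow` is an iterable of (u, v) displacement vectors, one per pixel.
    """
    hist = [0.0] * num_bins
    bin_width = 2.0 * math.pi / num_bins
    for u, v in flow:
        magnitude = math.hypot(u, v)
        if magnitude == 0.0:
            continue  # a zero vector has no direction
        angle = math.atan2(v, u) % (2.0 * math.pi)  # map to [0, 2*pi)
        # Clamp guards against rounding that lands exactly on 2*pi.
        index = min(int(angle / bin_width), num_bins - 1)
        hist[index] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


# Hypothetical flow vectors, for illustration only.
sample_flow = [(1.0, 1.0), (1.0, 1.0), (-1.0, 1.0), (1.0, -1.0)]
print(orientation_histogram(sample_flow, num_bins=4))
```

Because the histogram is normalised, it depends on the distribution of motion directions rather than on how many pixels are moving, which is one reason such descriptors are commonly paired with a classifier such as an SVM.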
