Scholar - SciOpen

While recent Gaussian-based SLAM methods achieve photorealistic reconstruction from RGB-D data, their computational performance remains a critical bottleneck. State-of-the-art techniques operate at less than 20 fps, significantly lagging behind geometry-based approaches like KinectFusion (hundreds of fps). This limitation stems from the heavy computational burden: modeling scenes requires numerous Gaussians and complex iterative optimization to fit RGB-D data, and insufficient Gaussian counts or optimization iterations cause severe quality degradation. To address this, we propose a Gaussian-SDF hybrid representation, combining a colorized signed distance field (SDF) for smooth geometry and appearance with 3D Gaussians to capture underrepresented details. The SDF is efficiently constructed via RGB-D fusion (as in geometry-based methods), while Gaussians undergo iterative optimization. Our representation enables significant Gaussian reduction (50% fewer) by avoiding full-scene Gaussian modeling, and efficient Gaussian optimization (75% fewer iterations) through targeted appearance refinement. Building upon this representation, we develop GPS-SLAM (Gaussian-plus-SDF SLAM), a real-time 3D reconstruction system achieving over 150 fps on real-world Azure Kinect sequences, an order of magnitude faster than state-of-the-art techniques while maintaining comparable reconstruction quality. The source code and data are available at https://gapszju.github.io/GPS-SLAM.

Open Access Research Article Issue

Emotion amplification of facial videos using a fine-tuned StyleGAN

Yukun Xu, Justin N. M. Pinkney, Yong-Liang Yang, Tianjia Shao, Kun Zhou

Computational Visual Media 2025, 11(3): 587-601

Published: 19 May 2025

The ability to exhibit appropriate emotions is crucial for the expressiveness and attractiveness of facial videos. However, it is difficult to control the level of emotion, for experienced actors and amateur podcasters on social networks alike. In this study, we aim to solve the novel problem of semantically amplifying the emotions of a facial video. This poses new challenges for effectively editing a sequence of video frames in terms of face semantics, emotion adaptiveness, and temporal coherence. Our approach is based on semantic face editing in the disentangled latent space of a state-of-the-art StyleGAN model. We present a new face dataset with diverse emotions to fine-tune the pretrained StyleGAN and improve the expressiveness of its original emotion-biased latent space. We construct an emotion-editing subspace to allow adaptive emotion amplification while preserving other facial attributes. We further propose an effective stitching-tuning technique to ensure temporally coherent video frames. Our work results in plausible emotion amplification for a wide range of facial videos. Qualitative and quantitative evaluations demonstrate the advantages of our method over baseline methods. The proposed dataset and research code will be made publicly available.

Open Access Research Article Issue

Unsupervised image translation with distributional semantics awareness

Zhexi Peng, He Wang, Yanlin Weng, Yin Yang, Tianjia Shao

Computational Visual Media 2023, 9(3): 619-631

Published: 18 April 2023

Unsupervised image translation (UIT) studies the mapping between two image domains. Since such mappings are under-constrained, existing research has pursued various desirable properties such as distributional matching or two-way consistency. In this paper, we re-examine UIT from a new perspective: distributional semantics consistency, based on the observation that data variations contain semantics, e.g., shoes varying in color. Further, the semantics can be multi-dimensional, e.g., shoes also varying in style, functionality, etc. Given two image domains, matching these semantic dimensions during UIT will produce mappings with explicable correspondences, which has not been investigated previously. We propose distributional semantics mapping (DSM), the first UIT method that explicitly matches semantics between two domains. We show that distributional semantics has rarely been considered within and beyond UIT, even though it is a common problem in deep learning. We evaluate DSM on several benchmark datasets, demonstrating its general ability to capture distributional semantics. Extensive comparisons show that DSM not only produces explicable mappings, but also improves image quality in general.

Regular Paper Issue

Preface

Shi-Min Hu, Paul L. Rosin, Tian-Jia Shao

Journal of Computer Science and Technology 2022, 37(3): 559-560

Published: 31 May 2022
