Neural implicit function-based representations have recently attracted growing attention and are widely used to represent surfaces with differentiable neural networks. However, surface reconstruction from point clouds or multi-view images using existing neural geometry representations still suffers from slow computation and poor accuracy. To alleviate these issues, we propose a multi-scale hash encoding-based neural geometry representation that effectively and efficiently represents the surface as a signed distance field. Our novel neural network structure carefully combines low-frequency Fourier position encoding with multi-scale hash encoding. The initialization of the geometry network and the geometry features of the rendering module are redesigned accordingly. Our experiments demonstrate that the proposed representation is at least 10 times faster for reconstructing point clouds with millions of points. It also significantly improves the speed and accuracy of multi-view reconstruction. Our code and models are available at
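As a rough illustration of the two encodings named above, the following minimal sketch computes low-frequency Fourier features and the per-level hash-table slots of a multi-scale grid for a 3D point assumed to lie in [0, 1)^3. It is not the authors' implementation: the function names, the hashing primes, the number of levels, and the table size are all assumptions chosen for illustration, and the sketch omits the learned feature vectors and the interpolation between grid corners that a real hash encoding would use.

```python
import math

# Assumed hashing primes (a common choice for spatial hashing);
# the abstract does not specify a hash function.
PRIMES = (1, 2654435761, 805459861)


def fourier_encode(p, num_freqs=2):
    """Low-frequency Fourier features: sin/cos of 2^k * pi * x for k < num_freqs."""
    features = []
    for k in range(num_freqs):
        scale = (2 ** k) * math.pi
        for x in p:
            features.append(math.sin(scale * x))
            features.append(math.cos(scale * x))
    return features


def hash_index(cell, table_size):
    """Map an integer grid cell to a slot in a fixed-size hash table."""
    h = 0
    for c, prime in zip(cell, PRIMES):
        h ^= c * prime
    return h % table_size


def multiscale_slots(p, base_res=16, num_levels=4, growth=2.0, table_size=2 ** 14):
    """Return, for each resolution level, the hash slot of the cell containing p."""
    slots = []
    for level in range(num_levels):
        res = int(base_res * growth ** level)  # grid resolution at this level
        cell = tuple(int(math.floor(x * res)) for x in p)
        slots.append(hash_index(cell, table_size))
    return slots


point = (0.25, 0.5, 0.75)
low_freq = fourier_encode(point)   # 2 frequencies * 3 coords * (sin, cos) = 12 values
slots = multiscale_slots(point)    # one table slot per level
```

In a full network, each slot would index a learned feature vector, and the per-level features would be concatenated with the Fourier features before being passed to the signed distance network.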
Optical flow estimation in human facial videos, which provides 2D correspondences between adjacent frames, is a fundamental pre-processing step for many applications, such as facial expression capture and recognition. However, it is quite challenging because human facial images contain large areas of similar texture, rich expressions, and large rotations. These characteristics also make large, annotated real-world datasets scarce. We propose a robust and accurate method to learn facial optical flow in a self-supervised manner. Specifically, we utilize various shape priors, including face depth, landmarks, and parsing, to guide the self-supervised learning task via a differentiable non-rigid registration framework. Extensive experiments demonstrate that our method achieves remarkable improvements in facial optical flow estimation in the presence of significant expressions and large rotations.
Face views are particularly important in person-to-person communication. Differences between the camera location and the face orientation can result in undesirable facial appearances of the participants during video conferencing. This phenomenon is particularly noticeable when using devices where the front-facing camera is placed in unconventional locations such as below the display or within the keyboard. In this paper, we take a video stream from a single RGB camera as input, and generate a video stream that emulates the view from a virtual camera at a designated location. The most challenging issue in this problem is that the corrected view often requires out-of-plane head rotations. To address this challenge, we reconstruct the 3D face shape and re-render it into synthesized frames according to the virtual camera location. To output the corrected video stream with a natural appearance in real time, we propose several novel techniques including accurate eyebrow reconstruction, high-quality blending between the corrected face image and background, and template-based 3D reconstruction of glasses. Our system works well for different lighting conditions and skin tones, and can handle users wearing glasses. Extensive experiments and user studies demonstrate that our method provides high-quality results.
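To make the re-rendering step concrete, the following minimal sketch projects a reconstructed 3D point into the image of a virtual camera using a simple pinhole model. It is only an illustration under stated assumptions, not the paper's renderer: the function name, the intrinsics (`focal`, `cx`, `cy`), the axis convention (camera looking along +z, no rotation), and the example coordinates are all hypothetical.

```python
def project(point, cam_pos, focal=500.0, cx=320.0, cy=240.0):
    """Project a 3D point into a pinhole camera located at cam_pos.

    Assumes the camera looks along +z with no rotation; all defaults
    are illustrative values, not taken from the paper.
    """
    x = point[0] - cam_pos[0]
    y = point[1] - cam_pos[1]
    z = point[2] - cam_pos[2]
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (focal * x / z + cx, focal * y / z + cy)


nose_tip = (0.0, 0.0, 2.0)                    # hypothetical reconstructed 3D point
real_view = project(nose_tip, (0.0, 0.5, 0.0))     # camera offset from eye level
virtual_view = project(nose_tip, (0.0, 0.0, 0.0))  # virtual camera at eye level
```

Projecting the same 3D point from both positions shows how moving the virtual camera shifts the point in the image, which is why a 3D reconstruction is needed rather than a 2D warp.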
This paper presents a joint head pose and facial landmark regression method that takes depth images as input and targets real-time applications. Our main contributions are as follows. First, we propose a joint optimization method to estimate head pose and facial landmarks: the pose regression result provides supervised initialization for cascaded facial landmark regression, while the landmark regression result in turn helps refine the head pose at each stage. Second, we classify the head pose space into 9 sub-spaces, and then use a cascaded random forest with a global shape constraint to train facial landmark regression in each specific sub-space. This classification-guided method effectively handles large pose changes and occlusion. Lastly, we have built a 3D face database containing 73 subjects, each with 14 expressions in various head poses. Experiments on challenging databases show that our method achieves state-of-the-art performance on both head pose estimation and facial landmark regression.
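One way to picture the division of the head pose space into 9 sub-spaces is a 3 x 3 grid over yaw and pitch, as in the minimal sketch below. The abstract does not state how the sub-spaces are defined, so the use of yaw and pitch, the 20-degree threshold, and the function name are assumptions made purely for illustration.

```python
def pose_subspace(yaw_deg, pitch_deg, threshold_deg=20.0):
    """Assign a head pose to one of 9 sub-spaces (3 yaw bins x 3 pitch bins).

    The binning scheme and threshold are hypothetical; the paper may
    partition the pose space differently.
    """
    def bin3(angle):
        if angle < -threshold_deg:
            return 0
        if angle > threshold_deg:
            return 2
        return 1

    return 3 * bin3(pitch_deg) + bin3(yaw_deg)


frontal = pose_subspace(0.0, 0.0)       # centre cell of the grid
turned_up = pose_subspace(-45.0, 30.0)  # negative yaw, positive pitch
```

A landmark regressor trained for the selected sub-space would then be applied, so each model only has to cover a limited range of poses.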