Volume 25, Issue 5




Real-Time Facial Pose Estimation and Tracking by Coarse-to-Fine Iterative Optimization

Xiaolong Yang, Xiaohong Jia, Mengke Yuan, Dong-Ming Yan
Key Laboratory of Mathematics Mechanization, Academy of Mathematics and Systems Sciences, Chinese Academy of Sciences, Beijing 100190, China.
University of Chinese Academy of Sciences, Beijing 100049, China.
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China.

Abstract

We present a novel and efficient method for real-time estimation and tracking of multiple facial poses in a single frame or video. First, we combine two standard convolutional neural network models, one for face detection and one for mean shape learning, to generate initial estimates of alignment and pose. Then, we design a bi-objective optimization strategy that iteratively refines these estimates and yields faster and more accurate outputs. Finally, we apply algebraic filtering, including a Gaussian filter for background removal and an extended Kalman filter for target prediction, to maintain real-time tracking performance. The method requires only ordinary RGB photos or videos captured by a commodity monocular camera, without any prior or label. We demonstrate the advantages of our approach by comparing it with the most recent work in terms of performance and accuracy.

Keywords: facial pose recognition, facial pose estimation, real-time tracking
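
To make the target prediction step mentioned in the abstract more concrete, the following is a minimal sketch of Kalman-style prediction for a tracked face center. It is not the authors' implementation: it assumes a linear constant-velocity motion model, under which the extended Kalman filter reduces to the standard Kalman filter, and it filters the x and y image axes independently. The names `AxisKalman` and `predict_next_center` and the noise values `q` and `r` are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one image axis (hypothetical helper)."""

    pos: float            # estimated position in pixels
    vel: float = 0.0      # estimated velocity in pixels per frame
    p00: float = 1.0      # state covariance P, stored entry by entry
    p01: float = 0.0
    p10: float = 0.0
    p11: float = 1.0
    q: float = 1e-2       # process noise variance (assumed value)
    r: float = 1.0        # measurement noise variance (assumed value)

    def predict(self, dt: float = 1.0) -> float:
        """Propagate the state by dt frames and return the predicted position."""
        self.pos += self.vel * dt
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and Q = diag(q, q)
        p00 = self.p00 + dt * (self.p01 + self.p10) + dt * dt * self.p11 + self.q
        p01 = self.p01 + dt * self.p11
        p10 = self.p10 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11
        return self.pos

    def update(self, z: float) -> None:
        """Correct the state with a measured position z (H = [1, 0])."""
        innovation = z - self.pos
        s = self.p00 + self.r                  # innovation variance
        k0, k1 = self.p00 / s, self.p10 / s    # Kalman gain
        self.pos += k0 * innovation
        self.vel += k1 * innovation
        # P <- (I - K H) P
        p00 = (1.0 - k0) * self.p00
        p01 = (1.0 - k0) * self.p01
        p10 = self.p10 - k1 * self.p00
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11


def predict_next_center(
    centers: Sequence[Tuple[float, float]], dt: float = 1.0
) -> Tuple[float, float]:
    """Filter a sequence of detected (x, y) face centers and predict the next one."""
    x0, y0 = centers[0]
    fx, fy = AxisKalman(pos=x0), AxisKalman(pos=y0)
    for x, y in centers[1:]:
        fx.predict(dt)
        fx.update(x)
        fy.predict(dt)
        fy.update(y)
    return fx.predict(dt), fy.predict(dt)
```

In a tracking loop, a prediction of this kind could be used to narrow the search region for the detector in the next frame. A nonlinear motion or measurement model would require the Jacobian-based linearization of a true extended Kalman filter, which this sketch does not cover.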


Publication history

Received: 31 December 2019
Accepted: 02 January 2020
Published: 16 March 2020
Issue date: October 2020

Copyright

© The author(s) 2020

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61872354, 61772523, 61620106003, and 61802406), the National Key R&D Program of China (No. 2019YFB2204104), the Beijing Natural Science Foundation (Nos. L182059 and Z190004), the Intelligent Science and Technology Advanced Subject Project of University of Chinese Academy of Sciences (No. 115200S001), and the Alibaba Group through the Alibaba Innovative Research Program.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
