Real-Time Facial Pose Estimation and Tracking by Coarse-to-Fine Iterative Optimization

Xiaolong Yang; Xiaohong Jia; Mengke Yuan; Dong-Ming Yan

doi:10.26599/TST.2020.9010001

Tsinghua Science and Technology 2020, 25(5): 690-700 https://doi.org/10.26599/TST.2020.9010001

Open Access | Published: 16 March 2020


Key Laboratory of Mathematics Mechanization, Academy of Mathematics and Systems Sciences, Chinese Academy of Sciences, Beijing 100190, China

University of Chinese Academy of Sciences, Beijing 100049, China

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

Keywords:

facial pose recognition, facial pose estimation, real-time tracking

Cite this article:

Yang X, Jia X, Yuan M, et al. Real-Time Facial Pose Estimation and Tracking by Coarse-to-Fine Iterative Optimization. Tsinghua Science and Technology, 2020, 25(5): 690-700. https://doi.org/10.26599/TST.2020.9010001


Abstract

We present a novel and efficient method for real-time estimation and tracking of multiple facial poses in single images or videos. First, we combine two standard convolutional neural network models, one for face detection and one for mean shape learning, to generate initial estimates of facial alignment and pose. Then, we design a bi-objective optimization strategy that iteratively refines these estimates, improving both speed and accuracy. Finally, we apply algebraic filtering, namely a Gaussian filter for background removal and an extended Kalman filter for target prediction, to sustain real-time tracking. The method requires only ordinary RGB images or videos captured by a commodity monocular camera, with no prior knowledge or labels. We demonstrate the advantages of our approach by comparing it with recent work in terms of performance and accuracy.
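The abstract names an extended Kalman filter (EKF) for target prediction during tracking. The following Python sketch illustrates only that predict-and-correct idea and is not the authors' filter. It assumes a constant-velocity motion model for each of the three head-pose angles (yaw, pitch, roll), treated independently, with each frame's pose estimate used directly as the measurement. Because both models are then linear, the EKF reduces to the ordinary linear Kalman filter shown here. The class names `ConstantVelocityKalman1D` and `PoseTracker` and the noise parameters `q` and `r` are hypothetical.

```python
class ConstantVelocityKalman1D:
    """Linear Kalman filter for one pose angle with state [angle, angular velocity].

    Hypothetical illustration: q and r are assumed values, not taken from the paper.
    """

    def __init__(self, q: float = 1e-2, r: float = 1.0) -> None:
        self.q = q                           # process-noise intensity (assumed)
        self.r = r                           # measurement-noise variance in deg^2 (assumed)
        self.x = [0.0, 0.0]                  # state: [angle (deg), velocity (deg/s)]
        self.P = [[1e3, 0.0], [0.0, 1e3]]    # large initial covariance = weak prior

    def predict(self, dt: float) -> float:
        """Propagate the state by dt seconds; returns the predicted angle."""
        a, v = self.x
        self.x = [a + dt * v, v]             # x <- F x, with F = [[1, dt], [0, 1]]
        (p00, p01), (p10, p11) = self.P
        q = self.q
        # P <- F P F^T + Q, with Q from a white-noise acceleration model
        self.P = [
            [p00 + dt * (p01 + p10) + dt * dt * p11 + q * dt**4 / 4,
             p01 + dt * p11 + q * dt**3 / 2],
            [p10 + dt * p11 + q * dt**3 / 2,
             p11 + q * dt**2],
        ]
        return self.x[0]

    def update(self, z: float) -> float:
        """Correct the state with a measured angle z (H = [1, 0]); returns the filtered angle."""
        (p00, p01), (p10, p11) = self.P
        s = p00 + self.r                     # innovation variance
        k0, k1 = p00 / s, p10 / s            # Kalman gain
        y = z - self.x[0]                    # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]
        return self.x[0]


class PoseTracker:
    """Smooths per-frame (yaw, pitch, roll) estimates with one independent filter per angle."""

    def __init__(self, q: float = 1e-2, r: float = 1.0) -> None:
        self.filters = [ConstantVelocityKalman1D(q, r) for _ in range(3)]

    def step(self, measured, dt: float):
        """Predict to the current frame, then correct with that frame's pose estimate."""
        out = []
        for f, z in zip(self.filters, measured):
            f.predict(dt)
            out.append(f.update(z))
        return tuple(out)

    def predict_ahead(self, dt: float):
        """Extrapolate the pose dt seconds ahead (advances the filter state)."""
        return tuple(f.predict(dt) for f in self.filters)
```

Filtering each Euler angle separately keeps the sketch short but ignores coupling between the angles and wrap-around at ±180°; a nonlinear motion or measurement model, for example one defined on a rotation parametrization, is what would make an extended Kalman filter necessary. Calling `predict_ahead` between detections lets a tracker extrapolate the pose when a frame yields no estimate.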




Publication history

Received: 31 December 2019

Accepted: 02 January 2020

Published: 16 March 2020

Issue date: October 2020


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61872354, 61772523, 61620106003, and 61802406), the National Key R&D Program of China (No. 2019YFB2204104), the Beijing Natural Science Foundation (Nos. L182059 and Z190004), the Intelligent Science and Technology Advanced Subject Project of University of Chinese Academy of Sciences (No. 115200S001), and the Alibaba Group through Alibaba Innovative Research Program.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).