Open Access

Objective Class-Based Micro-Expression Recognition Through Simultaneous Action Unit Detection and Feature Aggregation

School of Computer Science and Engineering, Macau University of Science and Technology, Macau 999078, China
School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China
Department of Computer Science, Wayne State University, Detroit, MI 48202, USA

Abstract

Micro-Expression Recognition (MER) is a challenging task as the subtle changes occur over different action regions of a face. Changes in facial action regions are formed as Action Units (AUs), and AUs in micro-expressions can be seen as the actors in cooperative group activities. In this paper, we propose a novel deep neural network model for objective class-based MER, which simultaneously detects AUs and aggregates AU-level features into a micro-expression-level representation through Graph Convolutional Networks (GCN). Specifically, we propose two new strategies in our AU detection module for more effective AU feature learning: the attention mechanism and the balanced detection loss function. With these two strategies, features are learned for all the AUs in a unified model, eliminating the error-prone landmark detection process and tedious separate training for each AU. Moreover, our model incorporates a tailored objective class-based AU knowledge graph, which facilitates the GCN to aggregate the AU-level features into a micro-expression-level feature representation. Extensive experiments on two tasks in MEGC 2018 show that our approach outperforms the current state-of-the-art methods in MER. Additionally, we report our single model-based micro-expression AU detection results.
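The aggregation step described above — propagating AU-level features over a knowledge graph with a GCN, then pooling them into one micro-expression-level vector — can be sketched as follows. This is a minimal illustration using the standard GCN propagation rule of Kipf and Welling, not the paper's actual implementation; the adjacency matrix, feature dimensions, and all function names (`aggregate_au_features`, `gcn_layer`, etc.) are hypothetical toy choices.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 (Kipf & Welling)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / (deg[i] ** 0.5 * deg[j] ** 0.5)
             for j in range(n)] for i in range(n)]

def gcn_layer(h, a_norm, w):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]

def aggregate_au_features(h, adj, w):
    """Propagate AU node features over the AU knowledge graph,
    then mean-pool nodes into one micro-expression-level vector."""
    h_out = gcn_layer(h, normalize_adjacency(adj), w)
    n = len(h_out)
    return [sum(row[j] for row in h_out) / n for j in range(len(h_out[0]))]

# Toy example: 3 AU nodes with 2-D features; AU0 and AU1 are linked
# in the (hypothetical) knowledge graph, AU2 is isolated.
adj = [[0, 1, 0],
       [1, 0, 0],
       [0, 0, 0]]
h = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
w = [[1.0, 0.0],
     [0.0, 1.0]]  # identity weights keep the arithmetic easy to check
vec = aggregate_au_features(h, adj, w)
print(vec)  # one micro-expression-level feature vector
```

In the paper, the edges of this graph come from the tailored objective class-based AU knowledge graph rather than a hand-made adjacency, and the node features come from the attention-based AU detection module, but the propagate-then-pool structure is the same.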

Tsinghua Science and Technology
Pages 2114-2132
Cite this article:
Zhou L, Mao Q, Dong M. Objective Class-Based Micro-Expression Recognition Through Simultaneous Action Unit Detection and Feature Aggregation. Tsinghua Science and Technology, 2025, 30(5): 2114-2132. https://doi.org/10.26599/TST.2024.9010095


Received: 15 November 2023
Revised: 22 April 2024
Accepted: 17 May 2024
Published: 29 April 2025
© The Author(s) 2025.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
