A Survey of Human Action Recognition and Posture Prediction

Authors: Nan Ma, Zhixuan Wu, Yiu-ming Cheung, Yuchen Guo, Yue Gao, Jiahong Li, and Beijyan Jiang
Beijing Key Laboratory of Information Service Engineering, the College of Robotics, Beijing Union University, Beijing 100101, China
Department of Computer Science, Hong Kong Baptist University, Hong Kong 999077, China
Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
School of Software, Tsinghua University, Beijing 100084, China
College of Robotics, Beijing Union University, Beijing 100101, China

Abstract

Human action recognition and posture prediction aim to recognize and predict, respectively, the actions and postures of persons in videos. Both are active research topics in the computer vision community and have attracted considerable attention from academia and industry. They are also preconditions for intelligent interaction and human-computer cooperation, and they help machines perceive the external environment. In the past decade, tremendous progress has been made in this field, especially after the emergence of deep learning technologies, which makes a comprehensive review of recent developments necessary. In this paper, we first present the background and discuss research progress. We then introduce datasets and typical feature representation methods, and examine advanced human action recognition and posture prediction algorithms. Finally, in view of the challenges in the field, we put forward future research focuses and illustrate the importance of action recognition and posture prediction by taking interactive cognition in self-driving vehicles as an example.

Keywords: computer vision, human action recognition, posture prediction, human-computer cooperation, interactive cognition

Publication history

Received: 31 May 2021
Accepted: 06 September 2021
Published: 21 June 2022
Issue date: December 2022

Copyright

© The author(s) 2022.

Acknowledgements

The authors wish to thank Dian’en Zhang and Wenjuan Li from Beijing Union University, Beijing, China. We also thank the anonymous reviewers for their constructive suggestions. This work was supported by the National Natural Science Foundation of China (Nos. 61871038 and 61931012), the Premium Funding Project for Academic Human Resources Development of Beijing Union University (No. BPHR2020AZ02), and the Generic Pre-research Program of the Equipment Development Department of the Military Commission (No. 41412040302).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
