Research Article | Open Access

A causal convolutional neural network for multi-subject motion modeling and generation

State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310058, China
Xmov, Shanghai 200030, China
School of Automation, Southeast University, Nanjing 210096, China

Abstract

Inspired by the success of WaveNet in multi-subject speech synthesis, we propose a novel neural network based on causal convolutions for multi-subject motion modeling and generation. The network can capture the intrinsic characteristics of the motion of different subjects, such as the influence of skeleton scale variation on motion style. Moreover, after being fine-tuned on a small motion dataset for a novel skeleton absent from the training data, the network can synthesize high-quality motions with a personalized style for that skeleton. Experimental results demonstrate that our network models the intrinsic characteristics of motion well and can be applied to a variety of motion modeling and synthesis tasks.
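Since the paper body is not reproduced here, the following is only a minimal sketch of the general technique the abstract names: a WaveNet-style stack of causal, dilated 1D convolutions applied autoregressively to pose sequences. It is not the authors' implementation; all names (`CausalConvBlock`, `MotionModel`, `pose_dim`, the channel and layer counts) are hypothetical, and the paper's multi-subject conditioning and skeleton-scale handling are omitted.

```python
# Minimal sketch (assumptions, not the paper's code): a WaveNet-style causal
# dilated convolution stack for autoregressive motion modeling in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalConvBlock(nn.Module):
    """1D convolution that only looks at past frames (left-padded input)."""

    def __init__(self, channels: int, kernel_size: int, dilation: int):
        super().__init__()
        # Pad only on the left so output frame t depends on frames <= t.
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames)
        y = self.conv(F.pad(x, (self.left_pad, 0)))
        return torch.relu(y) + x  # residual connection, as in WaveNet


class MotionModel(nn.Module):
    """Stack of causal blocks mapping past poses to the next-frame pose."""

    def __init__(self, pose_dim: int, channels: int = 128, num_layers: int = 6):
        super().__init__()
        self.inp = nn.Conv1d(pose_dim, channels, kernel_size=1)
        # Doubling dilations grow the receptive field exponentially
        # (64 past frames here, with kernel size 2 and 6 layers).
        self.blocks = nn.ModuleList(
            CausalConvBlock(channels, kernel_size=2, dilation=2 ** i)
            for i in range(num_layers)
        )
        self.out = nn.Conv1d(channels, pose_dim, kernel_size=1)

    def forward(self, poses: torch.Tensor) -> torch.Tensor:
        # poses: (batch, pose_dim, frames); index t of the output is the
        # prediction for frame t+1, conditioned only on frames <= t.
        h = self.inp(poses)
        for block in self.blocks:
            h = block(h)
        return self.out(h)


if __name__ == "__main__":
    model = MotionModel(pose_dim=72)   # e.g., 24 joints x 3 rotation params
    clip = torch.randn(1, 72, 120)     # a 120-frame motion clip
    pred = model(clip)                 # same length; each frame sees only the past
    print(pred.shape)                  # torch.Size([1, 72, 120])
```

Under the fine-tuning scheme the abstract describes, one would presumably continue training such a model on the small motion dataset captured for the novel skeleton, typically with a reduced learning rate, so that the learned dynamics adapt to the new skeleton's scale and style.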

Electronic Supplementary Material

41095_0307_ESM.zip (22.8 MB)

Computational Visual Media
Pages 45-59
Cite this article:
Hou S, Wang C, Zhuang W, et al. A causal convolutional neural network for multi-subject motion modeling and generation. Computational Visual Media, 2024, 10(1): 45-59. https://doi.org/10.1007/s41095-022-0307-3

Received: 25 May 2022
Accepted: 02 August 2022
Published: 30 November 2023
© The Author(s) 2023.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
