
PCT: Point cloud transformer

Meng-Hao Guo1, Jun-Xiong Cai1, Zheng-Ning Liu1, Tai-Jiang Mu1, Ralph R. Martin2, Shi-Min Hu1 (corresponding author)
1 BNRist, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
2 Cardiff University, Cardiff CF24 3AA, UK

Abstract

The irregular domain and lack of ordering make it challenging to design deep neural networks for point cloud processing. This paper presents a novel framework named Point Cloud Transformer (PCT) for point cloud learning. PCT is based on the Transformer, which has achieved great success in natural language processing and displays great potential in image processing. It is inherently permutation invariant when processing a sequence of points, making it well suited to point cloud learning. To better capture local context within the point cloud, we enhance input embedding with the support of farthest point sampling and nearest neighbor search. Extensive experiments demonstrate that PCT achieves state-of-the-art performance on shape classification, part segmentation, semantic segmentation, and normal estimation tasks.

Keywords: deep learning, 3D computer vision, point cloud processing, Transformer
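To make the abstract's two technical points concrete, here is a minimal NumPy sketch. It is not the authors' implementation; every function name, shape, and the use of plain dot-product attention is an illustrative assumption. Farthest point sampling plus nearest neighbor search builds the local patches that enrich the input embedding, and self-attention applies the same computation to each point regardless of input order:

    import numpy as np

    def farthest_point_sampling(points, n_samples):
        # Iteratively pick the point farthest from everything chosen so
        # far, giving even coverage of the cloud.
        n = points.shape[0]
        selected = np.zeros(n_samples, dtype=np.int64)  # first pick: index 0 (arbitrary)
        min_dist = np.full(n, np.inf)
        for i in range(1, n_samples):
            d = np.sum((points - points[selected[i - 1]]) ** 2, axis=1)
            min_dist = np.minimum(min_dist, d)
            selected[i] = np.argmax(min_dist)
        return selected

    def knn_group(points, centers, k):
        # For each sampled center, gather its k nearest neighbors to form
        # a local patch for the neighbor-embedding stage.
        d = np.sum((centers[:, None, :] - points[None, :, :]) ** 2, axis=2)
        idx = np.argsort(d, axis=1)[:, :k]
        return points[idx]  # shape (n_centers, k, 3)

    def self_attention(x, wq, wk, wv):
        # Plain dot-product self-attention over a set of point features.
        q, k_, v = x @ wq, x @ wk, x @ wv
        a = q @ k_.T / np.sqrt(q.shape[1])
        a = np.exp(a - a.max(axis=1, keepdims=True))   # row-wise softmax
        a /= a.sum(axis=1, keepdims=True)
        return a @ v

    rng = np.random.default_rng(0)
    cloud = rng.standard_normal((1024, 3))
    centers = cloud[farthest_point_sampling(cloud, 256)]
    patches = knn_group(cloud, centers, k=32)  # (256, 32, 3)

Permuting the input points permutes the attention output in exactly the same way (equivariance), so a symmetric pooling over that output, such as a max over points, yields the permutation-invariant descriptor the abstract refers to.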


Publication history

Received: 04 March 2021
Accepted: 26 March 2021
Published: 10 April 2021
Issue date: June 2021

Copyright

© The Author(s) 2021

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Project Number 61521002) and the Joint NSFC-DFG Research Program (Project Number 61761136018).

Rights and permissions

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
