
Dynamic Scene Graph Generation of Point Clouds with Structural Representation Learning

Chao Qi 1,2, Jianqin Yin 1 (corresponding author), Zhicheng Zhang 1, and Jin Tang 1
1 School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
2 Standard and Metrology Research Institute, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China

Abstract

Scene graphs of point clouds help to understand object-level relationships in 3D space. Most graph generation methods work on 2D structured data and cannot be applied to unstructured 3D point clouds. Existing point-cloud-based methods generate the scene graph from an additional graph structure that requires labor-intensive manual annotation. To address these problems, we explore a method that converts point clouds into structured data and generates graphs without given structures. Specifically, we cluster points with similar augmented features into groups and establish their relationships, resulting in an initial structural representation of the point cloud. In addition, we propose a Dynamic Graph Generation Network (DGGN) to predict the semantic labels of targets at different granularities. It dynamically splits and merges point groups, resulting in a scene graph with high precision. Experiments show that our method outperforms baseline methods, outputting reliable graphs that describe object-level relationships without additional manually labeled data.

Keywords: point cloud, scene graph generation, structural representation
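
To make the pipeline described in the abstract concrete, the following is a minimal, illustrative sketch of its first stage: clustering points on augmented features and linking nearby groups to form an initial structural (group-level) graph. This is not the authors' implementation; the feature choice (raw coordinates concatenated with color), the use of k-means, the group count, and the linking radius are all assumptions made purely for illustration, and the learned dynamic split-and-merge performed by DGGN is not reproduced here.

```python
# Illustrative sketch only (not the paper's method): cluster points on
# augmented features, then connect nearby groups into an initial
# structural graph. Feature choice, group count, and link_radius are
# assumptions for demonstration.
import numpy as np
from sklearn.cluster import KMeans

def build_initial_structure(xyz, rgb, n_groups=16, link_radius=0.3):
    """Cluster points on augmented features and connect nearby groups.

    xyz : (N, 3) point coordinates
    rgb : (N, 3) per-point color, used here as a stand-in for the
          augmented features described in the paper
    """
    # Augmented feature: concatenate geometry and appearance channels.
    feats = np.concatenate([xyz, rgb], axis=1)
    labels = KMeans(n_clusters=n_groups, n_init=10).fit_predict(feats)

    # Group-level relationship: connect two groups if their closest
    # points lie within link_radius of each other.
    edges = set()
    for a in range(n_groups):
        for b in range(a + 1, n_groups):
            pa, pb = xyz[labels == a], xyz[labels == b]
            if len(pa) == 0 or len(pb) == 0:
                continue
            d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
            if d.min() < link_radius:
                edges.add((a, b))
    return labels, sorted(edges)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xyz = rng.uniform(0.0, 2.0, size=(1024, 3))
    rgb = rng.uniform(0.0, 1.0, size=(1024, 3))
    labels, edges = build_initial_structure(xyz, rgb)
    print(f"{len(set(labels))} groups, {len(edges)} relationships")
```

In the paper itself, the augmented features come from a learned backbone rather than raw color, and the resulting groups are refined by DGGN's dynamic splitting and merging; the sketch only shows the structural-graph construction that precedes those steps.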


Publication history

Received: 25 August 2022
Revised: 9 November 2022
Accepted: 6 January 2023
Published: 21 August 2023
Issue date: February 2024

Copyright

© The author(s) 2024.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 62173045 and 61673192), the Fundamental Research Funds for the Central Universities (No. 2020XD-A04-2), and the BUPT Excellent PhD Students Foundation (No. CX2021222).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
