Swin3D++: Effective multi-source pretraining for 3D indoor scene understanding

Yu-Qi Yang; Yu-Xiao Guo; Yang Liu

doi:10.26599/CVM.2025.9450437

Research Article | Open Access

Yu-Qi Yang¹, Yu-Xiao Guo², Yang Liu²

¹ Institute for Advanced Study, Tsinghua University, Beijing 100084, China

² Microsoft Research Asia, Beijing 100080, China

Abstract

Data diversity and abundance are essential for improving the performance and generalization of models in natural language processing and 2D vision. However, the 3D vision domain suffers from a lack of 3D data, and simply combining multiple 3D datasets for pretraining a 3D backbone does not yield significant improvement, due to the domain discrepancies among different 3D datasets that impede effective feature learning. In this work, we identify the main sources of the domain discrepancies between 3D indoor scene datasets, and propose Swin3D++, an enhanced architecture based on Swin3D for efficient pretraining on multi-source 3D point clouds. Swin3D++ introduces domain-specific mechanisms to Swin3D’s modules to address domain discrepancies and enhance the network capability on multi-source pretraining. Moreover, we devise a simple source-augmentation strategy to increase the pretraining data scale and facilitate supervised pretraining. We validate the effectiveness of our design, and demonstrate that Swin3D++ surpasses the state-of-the-art 3D pretraining methods on typical indoor scene understanding tasks.

Keywords

3D scenes; indoor; pretraining; multi-source data; data augmentation

Computational Visual Media

Volume 11, Issue 3, June 2025

Pages 465-481

DOI: 10.26599/CVM.2025.9450437

Cite this article:

Yang Y-Q, Guo Y-X, Liu Y. Swin3D++: Effective multi-source pretraining for 3D indoor scene understanding. Computational Visual Media, 2025, 11(3): 465-481. https://doi.org/10.26599/CVM.2025.9450437

Received: 21 February 2024

Accepted: 24 April 2024

Published: 04 June 2025

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

To submit a manuscript, please go to https://jcvm.org.