Open Access | Just Accepted

UMLN: Open-World Object Detection Empowered by Unsupervised Modeling and Location-Enhanced Network

Yangyang Huang 1, Jie Hu 2, Ronghua Luo 1

1 School of Computer Science and Engineering, South China University of Technology, Guangzhou 510641, China

2 College of Engineering, Jiangxi Agricultural University, Nanchang 330045, China

Abstract

Open-world object detection (OWOD) is a challenging task that requires models to detect both known and unknown objects while incrementally learning from new data. Current OWOD methods typically label regions with high objectness scores as unknown objects; because this relies heavily on known-object supervision, it leads to label bias. To address this, we propose object reconstruction error modeling, which uses object-level semantic information for unsupervised foreground and background modeling. Additionally, we introduce an unsupervised proposal generation method that leverages the zero-shot capability of the Segment Anything Model to generate pseudo-labels for unknown objects. However, classifiers trained on known categories tend to be biased toward them during inference. To resolve this, we propose a location-enhanced network that reframes classification as a location quality prediction task. Our method achieves a 37% improvement in unknown-category recall (52.1%) on the Microsoft Common Objects in Context (MS-COCO) dataset, outperforming previous state-of-the-art methods while maintaining competitive performance on known objects. Furthermore, it surpasses deformable detection transformer (DETR)-based models and, at 10.95 frames per second, holds a speed advantage over faster region-based convolutional neural network (Faster R-CNN)-based methods.
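
The abstract reports unknown-category recall but this page does not define how it is computed. As a rough illustration only, the sketch below assumes boxes in `(x1, y1, x2, y2)` format and counts a ground-truth unknown object as recalled when some predicted unknown box overlaps it with intersection over union (IoU) of at least 0.5. The function names `iou` and `unknown_recall`, the threshold, and the simplified many-to-one matching are all assumptions for illustration, not the authors' evaluation protocol. IoU is also one common way to express how well a predicted box is localized, which is the kind of quantity a location quality prediction head might regress.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def unknown_recall(gt_unknown: List[Box],
                   pred_unknown: List[Box],
                   thr: float = 0.5) -> float:
    """Fraction of ground-truth unknown boxes covered by at least one
    predicted unknown box with IoU >= thr (hypothetical, simplified
    matching: one prediction may cover several ground-truth boxes)."""
    if not gt_unknown:
        return 0.0
    hits = sum(
        1 for g in gt_unknown
        if any(iou(g, p) >= thr for p in pred_unknown)
    )
    return hits / len(gt_unknown)


if __name__ == "__main__":
    gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
    pred = [(0, 0, 10, 9), (50, 50, 60, 60)]
    print(unknown_recall(gt, pred))
```

In this toy case one of the two ground-truth boxes is matched, so the recall is one half. A benchmark evaluation would typically also handle confidence ranking and per-image aggregation, which this sketch leaves out.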

Tsinghua Science and Technology
Cite this article:
Huang Y, Hu J, Luo R. UMLN: Open-World Object Detection Empowered by Unsupervised Modeling and Location-Enhanced Network. Tsinghua Science and Technology, 2025, https://doi.org/10.26599/TST.2024.9010263

Received: 16 October 2024
Revised: 30 November 2024
Accepted: 26 December 2024
Available online: 25 June 2025

© The author(s) 2025

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
