Abstract
Open-world object detection (OWOD) is a challenging task that requires models to detect both known and unknown objects while incrementally learning from new data. Current OWOD methods typically label regions with high objectness scores as unknown objects; because these scores rely heavily on known-object supervision, the resulting pseudo-labels are biased toward known categories. To address this, we propose object reconstruction error modeling, which uses object-level semantic information for unsupervised foreground and background modeling. In addition, we introduce an unsupervised proposal generation method that leverages the zero-shot capability of the Segment Anything Model (SAM) to generate pseudo-labels for unknown objects. However, classifiers trained on known categories tend to be biased toward them during inference. To resolve this, we propose a location-enhanced network that reframes classification as a location quality prediction task. Our method achieves a significant 37% improvement in unknown-category recall (52.1%) on the Microsoft Common Objects in Context (MS-COCO) dataset, outperforming previous state-of-the-art methods while maintaining competitive performance on known objects. Furthermore, it runs at 10.95 frames per second, surpassing deformable detection transformer (DETR)-based models and retaining a speed advantage over faster region-based convolutional neural network (Faster R-CNN)-based methods.