Oropharyngeal swabbing is a pre-diagnostic procedure used to test for various respiratory diseases, including COVID-19 and influenza A (H1N1). To improve testing efficiency, robotic swabbing systems need a real-time, accurate, and robust algorithm for localizing the sampling point. However, current solutions rely heavily on visual input, which is not reliable enough for large-scale deployment. The transformer has significantly improved performance on image-related tasks and challenged the dominance of traditional convolutional neural networks (CNNs) in the image field. Inspired by its success, we propose a novel self-aligning multi-modal transformer (SAMMT) that dynamically attends to different parts of unaligned feature maps, preventing information loss caused by perspective disparity and simplifying the overall implementation. Unlike existing multi-modal transformers, our attention mechanism operates in image space instead of embedding space, eliminating the need for sensor registration. To support the multi-modal task, we collected an oropharynx localization/segmentation dataset annotated by trained medical personnel. This dataset is open-sourced and can be used for future multi-modal research. Our experiments show that our model improves localization performance by 4.2% compared with the pure visual model and reduces the pixel-wise error rate of the segmentation task by 16.7% compared with the CNN baseline.
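The abstract does not give SAMMT's equations, so the following is only a minimal, hypothetical sketch of the general idea it describes: letting every position of one modality's feature map attend over all positions of another, unregistered feature map, so that no pixel-to-pixel correspondence has to be computed in advance. The names `cross_modal_attention` and `flatten`, the identity query/key/value projections, and the residual-addition fusion are illustrative assumptions, not the authors' design; a real model would use learned projections and multiple heads.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W x C, nested lists for self-containment


def flatten(fmap: FeatureMap) -> List[Vector]:
    """Flatten an H x W x C feature map into H*W tokens of dimension C."""
    return [vec for row in fmap for vec in row]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(query_map: FeatureMap, key_map: FeatureMap) -> FeatureMap:
    """Hypothetical sketch: each query position attends over ALL key positions.

    query_map (e.g. visual features) and key_map (e.g. depth features) may
    have different spatial sizes and need not be registered; the attention
    weights decide which key positions are relevant to each query position.
    Identity Q/K/V projections are an assumption made for brevity.
    """
    keys = flatten(key_map)
    dim = len(keys[0])
    scale = 1.0 / math.sqrt(dim)  # scaled dot-product attention
    fused: FeatureMap = []
    for row in query_map:
        fused_row = []
        for q in row:
            scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
            weights = softmax(scores)
            attended = [sum(w * k[c] for w, k in zip(weights, keys)) for c in range(dim)]
            # Residual fusion (assumed): keep the query feature, add the attended one.
            fused_row.append([qc + ac for qc, ac in zip(q, attended)])
        fused.append(fused_row)
    return fused


if __name__ == "__main__":
    rgb = [[[0.0, 1.0], [1.0, 0.0]], [[0.5, 0.5], [2.0, -1.0]]]  # 2 x 2 x 2
    depth = [[[1.0, 2.0]], [[0.0, 1.0]], [[3.0, 0.5]]]           # 3 x 1 x 2, unaligned
    out = cross_modal_attention(rgb, depth)
    print(len(out), len(out[0]), len(out[0][0]))
```

Note that the output keeps the spatial layout of the query map (here 2 x 2) even though the other map has a different shape (3 x 1); this is what allows fusion without a prior registration step, at the cost of attention that scales with the product of the two maps' sizes.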
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).