Open Access

Multiscale Information Fusion Based on Large Model Inspired Bacterial Detection

College of Computer Science and Technology, Qingdao University, Qingdao 266070, China
Department of Software Engineering and Game Development, Kennesaw State University, Marietta, GA 30060, USA
Technology Center of Qingdao Customs District, Qingdao 266070, China

Abstract

Accurate and efficient bacterial detection is essential for public health and medical diagnostics. However, traditional detection methods are constrained by limited dataset sizes, complex bacterial morphology, and diverse detection environments, which hinder their effectiveness. In this study, we present EagleEyeNet, a novel multi-scale information fusion model designed to address these challenges. EagleEyeNet leverages large models as teacher networks in a knowledge distillation framework, significantly improving detection performance. In addition, a newly designed feature fusion architecture that integrates Transformer modules enables efficient fusion of global and multi-scale features, overcoming the bottlenecks of Feature Pyramid Network (FPN) structures and thereby reducing information loss between feature layers. To improve the model's adaptability to different scenarios, we construct the QingDao Bacteria Detection (QDBD) dataset as a comprehensive evaluation benchmark for bacterial detection. Experimental results demonstrate that EagleEyeNet achieves remarkable performance improvements, with mAP50 gains of 3.1% on the QDBD dataset and 4.9% on the AGAR dataset, outperforming state-of-the-art (SOTA) methods in detection accuracy. These findings underscore the transformative potential of integrating large models and deep learning for advancing bacterial detection technologies.
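To make the distillation idea in the abstract concrete, the sketch below implements the classic Hinton-style knowledge distillation loss (soft targets from a large frozen teacher combined with hard ground-truth labels) in PyTorch. It is a generic, minimal illustration under assumed hyperparameters (TEMPERATURE, ALPHA) and a hypothetical helper name (distillation_loss); it is not EagleEyeNet's actual training objective, which the abstract does not specify.

```python
# Minimal sketch of Hinton-style knowledge distillation, assuming PyTorch.
# All names and hyperparameter values here are illustrative assumptions,
# not taken from the paper.
import torch
import torch.nn.functional as F

TEMPERATURE = 4.0  # softens both distributions; assumed value
ALPHA = 0.7        # mix between soft (teacher) and hard (label) terms; assumed

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor) -> torch.Tensor:
    # Soft-target term: KL divergence between temperature-scaled student
    # log-probabilities and teacher probabilities (teacher kept frozen).
    soft = F.kl_div(
        F.log_softmax(student_logits / TEMPERATURE, dim=-1),
        F.softmax(teacher_logits.detach() / TEMPERATURE, dim=-1),
        reduction="batchmean",
    ) * (TEMPERATURE ** 2)  # standard rescaling of the soft-target gradients
    # Hard-target term: ordinary cross-entropy on ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return ALPHA * soft + (1.0 - ALPHA) * hard

# Toy usage: random tensors stand in for per-region class logits of a detector.
student = torch.randn(8, 20, requires_grad=True)  # 8 proposals, 20 classes
teacher = torch.randn(8, 20)                      # frozen large-model teacher
labels = torch.randint(0, 20, (8,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
```

In a detection setting, this soft-target idea is typically applied to the student detector's classification logits (and often to intermediate features as well), with the large teacher model kept frozen during training.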

Big Data Mining and Analytics, Pages 1-17
Cite this article:
Liu Z, Huang Y, Wang J, et al. Multiscale Information Fusion Based on Large Model Inspired Bacterial Detection. Big Data Mining and Analytics, 2025, 8(1): 1-17. https://doi.org/10.26599/BDMA.2024.9020078


Received: 07 March 2024
Revised: 06 October 2024
Accepted: 16 October 2024
Published: 19 December 2024
© The author(s) 2025.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
