Open Access

MFF-YOLO: An Improved YOLO Algorithm Based on Multi-Scale Semantic Feature Fusion

Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
School of Information Engineering, Huzhou University, Huzhou 313000, China
Hebei Key Laboratory of Machine Learning and Computational Intelligence, College of Mathematics and Information Science, Hebei University, Baoding 071002, China
Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China, and also with National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China

Abstract

The YOLOv5 algorithm is widely used in edge computing systems for object detection. However, the limited computing resources of embedded devices and the large model size of existing deep-learning-based methods make real-time object detection on edge devices difficult. To address this issue, we propose a smaller, less computationally intensive, and more accurate object detection algorithm. Multi-scale Feature Fusion-YOLO (MFF-YOLO) is built on the YOLOv5s framework but introduces substantial improvements. First, we design the MFF module to improve the feature propagation path in the feature pyramid, further integrating semantic information from feature layers on different paths. Then, a large-convolution-kernel module is used in the bottleneck. This structure enlarges the receptive field and preserves shallow semantic information, overcoming the performance limitation arising from uneven propagation in Feature Pyramid Networks (FPN). In addition, a multi-branch downsampling method based on depthwise separable convolutions and a bottleneck structure with deformable convolutions are designed to reduce the complexity of the backbone network and minimize the real-time performance loss caused by increased model complexity. Experimental results on the PASCAL VOC and MS COCO datasets show that, compared with YOLOv5s, MFF-YOLO reduces the number of parameters by 7% and the number of FLoating point Operations (FLOPs) by 11.8%, while mAP@0.5 improves by 3.7% and 5.5%, and mAP@0.5:0.95 improves by 6.5% and 6.2%, respectively. Furthermore, compared with YOLOv7-tiny, PP-YOLO-tiny, and other mainstream methods, MFF-YOLO achieves better results on multiple metrics.
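To see why building the downsampling branches on depthwise separable convolutions shrinks the backbone, one can compare the weight counts of a standard convolution and its depthwise-separable factorization (a per-channel k×k depthwise convolution followed by a 1×1 pointwise projection). The channel widths and kernel size below are illustrative assumptions for this sketch, not values taken from the paper:

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    # Standard convolution: one k*k*c_in filter per output channel (bias omitted).
    return k * k * c_in * c_out

def dw_separable_params(c_in: int, c_out: int, k: int) -> int:
    # Depthwise stage: one k*k filter per input channel,
    # then a 1x1 pointwise convolution mixing channels.
    return k * k * c_in + c_in * c_out

# Hypothetical layer: 128 input channels, 128 output channels, 3x3 kernel.
std = conv_params(128, 128, 3)           # 147456 weights
sep = dw_separable_params(128, 128, 3)   # 1152 + 16384 = 17536 weights
print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.3f}")
```

For a 3×3 kernel the factorization keeps roughly 1/9 + 1/c_out of the weights, which is why swapping it into the backbone's downsampling branches cuts both parameters and FLOPs at little cost in accuracy.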

Tsinghua Science and Technology
Pages 2097-2113
Cite this article:
Zhang J, Xu C, Shen S, et al. MFF-YOLO: An Improved YOLO Algorithm Based on Multi-Scale Semantic Feature Fusion. Tsinghua Science and Technology, 2025, 30(5): 2097-2113. https://doi.org/10.26599/TST.2024.9010097


Received: 25 May 2023
Revised: 21 August 2023
Accepted: 11 September 2023
Published: 29 April 2025
© The Author(s) 2025.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
