The YOLOv5 algorithm is widely used in edge computing systems for object detection. However, the limited computing resources of embedded devices and the large model size of existing deep-learning-based methods make real-time object detection on edge devices difficult. To address this issue, we propose a smaller, less computationally intensive, and more accurate object detection algorithm. Multi-scale Feature Fusion-YOLO (MFF-YOLO) is built on top of the YOLOv5s framework, with substantial improvements over YOLOv5s. First, we design the MFF module to improve the feature propagation path in the feature pyramid, which further integrates the semantic information from feature layers on different paths. Then, a large convolution-kernel module is used in the bottleneck. This structure enlarges the receptive field and preserves shallow semantic information, overcoming the performance limitation arising from uneven propagation in Feature Pyramid Networks (FPN). In addition, a multi-branch downsampling method based on depthwise separable convolutions and a bottleneck structure with deformable convolutions are designed to reduce the complexity of the backbone network and minimize the real-time performance loss caused by the increased model complexity. Experimental results on the PASCAL VOC and MS COCO datasets show that, compared with YOLOv5s, MFF-YOLO reduces the number of parameters by 7% and the number of floating-point operations (FLOPs) by 11.8%. On the two datasets, mAP@0.5 improves by 3.7% and 5.5%, and mAP@0.5:0.95 improves by 6.5% and 6.2%, respectively. Furthermore, compared with YOLOv7-tiny, PP-YOLO-tiny, and other mainstream methods, MFF-YOLO achieves better results on multiple metrics.
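The abstract does not give implementation details, but as a rough illustration of the multi-branch downsampling idea it mentions, the sketch below combines a strided depthwise separable convolution branch with a pooling branch and concatenates the results. This is a minimal PyTorch sketch under our own assumptions, not the authors' code; all module and parameter names are hypothetical.

```python
# Illustrative sketch (NOT the paper's implementation) of multi-branch
# downsampling built from depthwise separable convolutions, as described
# at a high level in the abstract. Branch layout is an assumption.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv followed by a 1x1 pointwise conv."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class MultiBranchDownsample(nn.Module):
    """Halve spatial resolution via two parallel branches:
    a strided depthwise separable conv and a max-pool + 1x1 conv,
    then concatenate the branch outputs along the channel axis."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv_branch = DepthwiseSeparableConv(in_ch, out_ch // 2, stride=2)
        self.pool_branch = nn.Sequential(
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_ch, out_ch // 2, 1, bias=False),
            nn.BatchNorm2d(out_ch // 2),
            nn.SiLU(),
        )

    def forward(self, x):
        return torch.cat([self.conv_branch(x), self.pool_branch(x)], dim=1)

# Usage: downsample a 64-channel feature map to 128 channels at half resolution.
x = torch.randn(1, 64, 80, 80)
print(MultiBranchDownsample(64, 128)(x).shape)  # torch.Size([1, 128, 40, 40])
```

Depthwise separable convolutions factor a standard convolution into a per-channel spatial filter and a 1x1 channel mixer, which is why such branches cut parameters and FLOPs relative to a plain strided convolution.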

Tsinghua Science and Technology 2025, 30(5): 2097-2113
Published: 29 April 2025