Enhancing YOLOv5 for Small Target Detection in UAV Aerial Images Via Multi-Scale Feature Fusion
Abstract
This paper briefly introduces unmanned aerial vehicle (UAV) aerial photography and the You Only Look Once version 5 (YOLOv5) model for image target detection. To enhance the performance of the YOLOv5 model, it was modified by adding one feature fusion layer to its neck network and a smaller-scale anchor box to its head network. Simulation experiments on images captured by UAVs then compared the improved model with the region-based convolutional neural network (R-CNN) and the traditional YOLOv5 model. The improved YOLOv5 model converged to a stable state faster during training: it needed only 80 iterations, whereas the traditional YOLOv5 model required 120 and the R-CNN model required 130. The improved model was also more accurate than both baselines at identifying and localizing small targets in UAV aerial images, achieving a precision of 0.989, a recall of 0.988, an F value of 0.988, and an intersection over union (IoU) of 0.987 for small target recognition.
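As a rough illustration of the reported metrics, the following Python sketch shows how an F value and an intersection over union are commonly computed. It assumes that the "F value" is the standard F1 score (the harmonic mean of precision and recall) and that boxes are axis-aligned in `(x1, y1, x2, y2)` form; the abstract states neither, and the function names `f1_score` and `box_iou` are hypothetical, not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed meaning of 'F value')."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def box_iou(box_a, box_b) -> float:
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    The box format is an assumption made for this sketch.
    """
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Precision and recall as reported in the abstract.
f_value = f1_score(0.989, 0.988)

# Two illustrative (made-up) boxes that overlap in a 1x1 square.
example_iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
```

Under the F1 assumption, the reported precision and recall give a value consistent with the reported F value of 0.988 to three decimal places.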
DOI: https://doi.org/10.31449/inf.v49i26.8252

This work is licensed under a Creative Commons Attribution 3.0 License.