Improved YOLOv8s with Swin Transformer and Depthwise Convolutions for Small-Target Pepper Detection and Localization in Agricultural Robotics
Abstract
A recognition and localization system for chili-picking robots was developed based on an improved YOLOv8s model and a RealSense depth camera. The proposed model integrates the Swin Transformer, depthwise convolution (DW Conv), and C2 modules into the YOLOv8s framework to enhance small-target detection and reduce computational complexity. A dataset of 2,000 field images of Chaotian pepper (Capsicum frutescens L.) was collected under varying lighting and occlusion conditions and split into training, validation, and test sets in a 7:2:1 ratio. To validate the approach, comparative experiments were conducted against YOLOv5, YOLOv6, YOLOv7, and the original YOLOv8s. Ablation studies showed that each added component improved performance, with the combined integration achieving the best results. The improved YOLOv8s model reached a mean Average Precision (mAP) of 82.7%, Recall (R) of 93.0%, and Precision (P) of 79.0%, increases of 3.4%, 3.0%, and 5.7%, respectively, over the baseline YOLOv8s. These results indicate that the improved YOLOv8s model achieves accurate and efficient chili recognition and localization suitable for robotic harvesting applications.
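As a rough illustration of the 7:2:1 split described in the abstract, the Python sketch below shuffles a list of image names and partitions it by ratio. The file names, the `split_dataset` helper, and the fixed seed are hypothetical; the abstract does not say how the authors performed the split.

```python
import random


def split_dataset(items, ratios=(7, 2, 1), seed=0):
    """Shuffle items and split them into train/val/test by integer ratios."""
    items = list(items)
    # A seeded generator keeps the split reproducible (the seed is an assumption).
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Hypothetical file names standing in for the 2,000 field images.
images = [f"pepper_{i:04d}.jpg" for i in range(2000)]
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))  # 1400 400 200
```

With 2,000 images, a 7:2:1 ratio gives 1,400 training, 400 validation, and 200 test images.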
DOI: https://doi.org/10.31449/inf.v50i5.11556
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







