A Dual-Engine Embedded Face Detection and Recognition Framework Using YOLO5Face and Attention-Enhanced Faster-RCNN for Surveillance Video

Qianqian Yuan; Yuping Quan; Hui Li

doi:10.31449/inf.v50i8.10835

Abstract

Embedded detection and recognition systems for surveillance video are in urgent demand in the security field. However, traditional methods suffer from poor real-time performance, high resource consumption, and limited generalization in complex scenarios. To address these limitations, this study proposes a dual-engine embedded face detection and recognition framework that integrates YOLO5Face with an attention-enhanced Faster Region-based Convolutional Neural Network (Faster R-CNN). The system adopts a dual-engine cascade architecture: YOLO5Face performs fast initial face screening, while Faster R-CNN, augmented with spatial and channel attention mechanisms, accurately recognizes key targets. Feature reuse and structural fusion jointly optimize speed and accuracy. The framework also incorporates the feature-extraction capability of a local binary pattern histogram (LBPH) algorithm based on hierarchical feature pyramids, and a dynamic background suppression module reduces false positives in complex scenes. Experimental results on the WIDER FACE and Face Detection Data Set and Benchmark (FDDB) datasets show that the system reaches an accuracy of 99.1% with a loss rate as low as 0.08, outperforming the comparison systems Visual Transformer Convolutional Neural Network Fusion (accuracy 98.16±0.23%) and Additive Margin Softmax Loss Convolutional Multi-scale Transformer (accuracy 97.42±0.34%). The system converges to a loss below 0.1 within 200 iterations and has a response time of only 28 ms, compared with 78-85 ms for Visual Transformer Convolutional Neural Network Fusion.
These results indicate that the proposed method effectively addresses poor real-time performance, resource constraints, and insufficient scene generalization, offering an efficient, lightweight approach to system development and promoting the intelligent and efficient development of security terminals.
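
The cascade idea described in the abstract (a fast detector screens the whole frame, and a slower, more accurate model runs only on the surviving candidates) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Detection` type, the `cascade` function, the two thresholds, and the stub detectors are all hypothetical stand-ins. In a real system the two callables would wrap YOLO5Face and the attention-enhanced Faster R-CNN.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    box: Box
    score: float          # confidence in [0, 1]
    label: str = "face"


# Hypothetical stage signatures (assumptions, not taken from the paper).
FastDetector = Callable[[object], List[Detection]]
Refiner = Callable[[object, Box], Detection]


def cascade(frame, fast: FastDetector, refine: Refiner,
            screen_thresh: float = 0.3,
            accept_thresh: float = 0.5) -> List[Detection]:
    """Run the cheap detector on the frame, then the costly model
    only on candidates that pass the screening threshold."""
    results = []
    for cand in fast(frame):
        if cand.score < screen_thresh:
            continue  # dropped by the fast screening stage
        refined = refine(frame, cand.box)
        if refined.score >= accept_thresh:
            results.append(refined)
    return results


# Stub engines so the sketch runs without any model weights.
def stub_fast(frame) -> List[Detection]:
    return [
        Detection((0, 0, 10, 10), 0.9),
        Detection((5, 5, 20, 20), 0.1),    # screened out (score < 0.3)
        Detection((30, 30, 50, 50), 0.4),
    ]


def stub_refine(frame, box: Box) -> Detection:
    # Pretend that only boxes of at least 400 px^2 are recognized confidently.
    area = (box[2] - box[0]) * (box[3] - box[1])
    return Detection(box, 0.8 if area >= 400 else 0.2, "person_A")


kept = cascade(None, stub_fast, stub_refine)
```

With these stubs, the second model is invoked on two of the three candidates and only the largest box is kept. The threshold values are illustrative; the abstract does not state how candidates are gated between the two engines.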

Authors

Qianqian Yuan, Jiaozuo Normal College, Jiaozuo 454000, China
Yuping Quan, Jiaozuo Normal College, Jiaozuo 454000, China
Hui Li, Jiaozuo Normal College, Jiaozuo 454000, China

DOI:

https://doi.org/10.31449/inf.v50i8.10835

Published

02/21/2026

How to Cite

Yuan, Q., Quan, Y., & Li, H. (2026). A Dual-Engine Embedded Face Detection and Recognition Framework Using YOLO5Face and Attention-Enhanced Faster-RCNN for Surveillance Video. Informatica, 50(8). https://doi.org/10.31449/inf.v50i8.10835

Issue

Vol. 50 No. 8 (2026): Online-only issue

Section

Online-only

License

Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.

All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.

Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.
