Adaptive Multi-Scale Image Stitching Using an Attention-Enhanced BiFPN With Contrast-Aware Optimization
Abstract
To address the challenges of low stitching accuracy and limited robustness in complex scenes, this study proposes an image stitching model based on an improved Bi-directional Feature Pyramid Network (BiFPN). The model enhances performance through three key optimizations. First, an adaptive weighting mechanism dynamically balances the global and local contributions of multi-scale features. Second, a Squeeze-and-Excitation (SE) attention mechanism strengthens feature extraction in critical stitching regions such as edges and textures. Third, a global contrast enhancement module mitigates the effects of illumination variation on feature matching through multi-scale histogram equalization and adaptive calibration. Experiments were conducted on two benchmark datasets: Microsoft Common Objects in Context (MS COCO) and the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset. From MS COCO, 1,500 image pairs were selected (500 with illumination variations and 500 with scale variations). From KITTI, 1,500 image pairs were selected (800 static scenes and 700 dynamic targets). Each dataset was split into training and validation sets at an 8:2 ratio. Training used a batch size of 16, 50 epochs, and an initial learning rate of 0.001 with a 50% decay every 10 epochs. Comparative methods included traditional algorithms such as Oriented FAST and Rotated BRIEF (ORB) and Scale-Invariant Feature Transform (SIFT), as well as deep learning approaches including Vision Transformer-Large/16 (ViT-L/16) and the Stitch Generative Adversarial Network. The proposed model outperformed all baselines in complex scenarios. On the MS COCO dataset with illumination variations, the mean squared error (MSE) reached 1.12×10⁻², which is 69.09% lower than ORB and 39.46% lower than ViT-L/16. The peak signal-to-noise ratio (PSNR) increased to 34.89 dB, improving by 5.11 dB over SIFT and 2.75 dB over other models. The structural similarity index (SSIM) reached 0.946, exceeding competing methods by 7.26%. On the KITTI dataset with dynamic targets, the feature matching accuracy reached 92.3%, a 17.95% improvement over SIFT, while the stitching time decreased to 1.78 s, 30.47% faster than other models. The model maintained high robustness under parallax and motion blur conditions, providing precise and efficient image stitching for vision-based control and automation tasks such as robotic navigation and industrial monitoring.
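The training configuration stated in the abstract can be made concrete with a small worked example. The following is a minimal sketch, not the authors' code: it assumes zero-based epoch indices, a step schedule that halves the learning rate at the start of epochs 10, 20, 30, and 40, and a split applied per dataset of 1,500 pairs. The function names `step_decay_lr` and `split_sizes` are hypothetical.

```python
import math

# Values taken from the abstract.
INITIAL_LR = 1e-3          # initial learning rate
DECAY_FACTOR = 0.5         # "50% decay"
DECAY_EVERY = 10           # epochs between decays
EPOCHS = 50
BATCH_SIZE = 16
PAIRS_PER_DATASET = 1500   # image pairs selected from each dataset
TRAIN_PARTS, TOTAL_PARTS = 8, 10   # 8:2 train/validation split


def step_decay_lr(epoch):
    """Learning rate for a zero-based epoch index (assumed convention)."""
    return INITIAL_LR * DECAY_FACTOR ** (epoch // DECAY_EVERY)


def split_sizes(n_pairs):
    """Return (train, validation) counts for an 8:2 split, using integer math."""
    n_train = n_pairs * TRAIN_PARTS // TOTAL_PARTS
    return n_train, n_pairs - n_train


if __name__ == "__main__":
    n_train, n_val = split_sizes(PAIRS_PER_DATASET)
    # Assumes the last partial batch, if any, is kept rather than dropped.
    steps_per_epoch = math.ceil(n_train / BATCH_SIZE)
    print(f"train={n_train} val={n_val} steps/epoch={steps_per_epoch}")
    for epoch in range(0, EPOCHS, DECAY_EVERY):
        print(f"epochs {epoch}-{epoch + DECAY_EVERY - 1}: lr={step_decay_lr(epoch):.6g}")
```

Under these assumptions, each dataset yields 1,200 training and 300 validation pairs, and the learning rate in the final ten epochs is 0.001 × 0.5⁴ = 6.25×10⁻⁵.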
DOI: https://doi.org/10.31449/inf.v50i5.11622
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







