Integration of Multiscale Fusion and Cross-scale Attention Refinement for Enhanced Target Detection Using MSFNet
Abstract
Object recognition across varying scales remains a persistent challenge in computer vision, especially in scenes with occlusion, low contrast, and diverse spatial resolutions. Conventional convolutional neural networks with fixed receptive fields often fail to capture both fine-grained details and high-level contextual cues. This study develops a scale-adaptive detection framework to overcome these limitations. The proposed MSFNet (Multiscale Fusion Network) employs a Dual-Stream Convolutional Backbone to extract low-level and high-level features in parallel. A Scale-Adaptive Feature Fusion Module (SAFFM) integrates the multiscale representations through dynamic, scale-aware weighting, and a Cross-Scale Attention Refinement (CSAR) module enhances discriminative features while suppressing irrelevant or redundant information. The architecture is trained end to end and is optimized for both detection accuracy and real-time inference speed. Experimental evaluation yields 47.3% AP on MS COCO 2017 and 81.5% mAP on PASCAL VOC 2012. On the COCO benchmark, MSFNet exceeds Faster R-CNN, YOLOv5, and RetinaNet by 3.8%, 4.5%, and 3.2% AP, respectively. MSFNet provides a scalable, accurate, and computationally efficient approach to multiscale object recognition, enabling deployment in real-time applications such as autonomous driving, intelligent surveillance, and remote sensing.
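The abstract does not spell out how the scale-aware weighting in SAFFM is computed. As a rough illustration of the general idea only, the sketch below blends per-scale feature vectors with softmax-normalized weights. The function names, the toy feature values, and the use of one scalar score per scale are assumptions made for this example; they are not taken from the MSFNet paper and should not be read as the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(features, scale_scores):
    """Blend same-length feature vectors, one per scale, with softmax weights.

    features     -- list of feature vectors, assumed already resized to a
                    common length (hypothetical toy stand-in for feature maps)
    scale_scores -- one raw score per scale (hypothetical; a real model
                    would predict these from the input)
    """
    if len(features) != len(scale_scores):
        raise ValueError("need exactly one score per scale")
    weights = softmax(scale_scores)
    length = len(features[0])
    fused = [0.0] * length
    for weight, vector in zip(weights, features):
        if len(vector) != length:
            raise ValueError("all feature vectors must share one length")
        for i, value in enumerate(vector):
            fused[i] += weight * value
    return weights, fused


# Toy input: three scales, four channels each.
features = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 2.0, 0.0, 4.0],
    [5.0, 4.0, 1.0, 4.0],
]

# Equal scores give equal weights, so the result is the plain average.
weights, fused = fuse_scales(features, scale_scores=[0.0, 0.0, 0.0])
```

Because the weights always sum to one, raising the score of a single scale shifts the fused vector toward that scale's features. In a detector, such scores would typically be predicted per spatial location rather than fixed by hand as they are here.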
DOI: https://doi.org/10.31449/inf.v49i37.9896
Copyright © Slovenian Society Informatika







