Multimodal 3D Fire and Smoke Localization in Complex Scenes via YOLOv7 and PointNet++ Integration

Abstract

This study presents an edge-deployable multimodal framework for 3D localization of fire and smoke that integrates YOLOv7 (You Only Look Once version 7) detection, camera–point-cloud registration, and PointNet++ (Deep Hierarchical Feature Learning on Point Sets in a Metric Space) refinement with cross-modal attention. The framework is evaluated on a hybrid dataset of simulated and real-world data covering diverse environmental conditions, including nighttime, occlusion, and high-density smoke. YOLOv7 detects fire and smoke regions in RGB images and generates high-confidence bounding boxes. A multi-view depth camera captures the scene point cloud, and a camera–point-cloud spatiotemporal registration algorithm maps the 2D detections to 3D coordinates. PointNet++ then performs multi-level feature extraction and geometric fitting on the localized point cloud. The fusion strategy combines cross-modal attention with a multi-task loss function to jointly optimize visual and geometric features. The end-to-end pipeline runs on an edge computing platform, balancing real-time performance and accuracy. Experiments include ablation studies, comparative evaluations against baselines (YOLOv7, PointNet++, Mask R-CNN + PointNet), and robustness tests under varying conditions. Results show that the 3D localization error is within 0.12 m, detection accuracy reaches 94.5%, recall is 92.3%, and the average processing delay is 38 ms/frame. The system was tested on an NVIDIA Jetson AGX Xavier platform. The robustness score is computed from performance under four perturbation conditions: low light, occlusion, smoke density, and sensor noise. Each condition is scored from 1 to 5 based on detection consistency and localization error, and the final score is the average across the four conditions.
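
The step that maps a 2D detection to 3D coordinates can be illustrated with a minimal sketch. It assumes a depth image already registered to the RGB frame and a pinhole camera model. The abstract does not specify the registration algorithm, so the function name `bbox_to_3d`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the use of a median over the box are illustrative assumptions, not the paper's method.

```python
import numpy as np

def bbox_to_3d(depth_m, bbox, fx, fy, cx, cy):
    """Estimate a camera-frame 3D point for a 2D detection box.

    Hypothetical helper: assumes depth_m (H x W, metres) is aligned
    pixel-for-pixel with the RGB image the box was detected in.
    bbox is (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = bbox
    patch = depth_m[y0:y1, x0:x1]

    # Keep only pixels with a usable depth reading.
    rows, cols = np.nonzero(np.isfinite(patch) & (patch > 0))
    if rows.size == 0:
        return None  # no valid depth inside the box

    z = patch[rows, cols]
    u = cols + x0  # pixel column in the full image
    v = rows + y0  # pixel row in the full image

    # Pinhole back-projection of each pixel to camera coordinates.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy

    # The per-axis median is an assumed, outlier-tolerant summary.
    return np.median(np.stack([x, y, z], axis=1), axis=0)
```

In a pipeline like the one described, a point obtained this way would only seed the region of the point cloud that is passed on for geometric refinement.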

Authors

  • Dongliang Wang, Safety and Security Office, Beijing Language and Culture University

DOI:

https://doi.org/10.31449/inf.v49i36.10614

Published

12/20/2025

How to Cite

Wang, D. (2025). Multimodal 3D Fire and Smoke Localization in Complex Scenes via YOLOv7 and PointNet++ Integration. Informatica, 49(36). https://doi.org/10.31449/inf.v49i36.10614