A CLIP-SAM-Based Multimodal Semantic Segmentation and Decision Framework for Intelligent Monitoring in Coal Preparation Plants

Peijun Zhang; Jianbo Li; Zhiliang Si; Meiling Huang

doi:10.31449/inf.v49i28.10288

Abstract

Coal preparation plants are a key link in clean coal processing, and the intelligent upgrade of their equipment status monitoring is of great significance for production safety and efficiency. Traditional monitoring systems rely on data from a single sensor, which leads to low fault identification rates and high response delays and makes it difficult to cope with multi-source interference under complex working conditions. To address this challenge, this study proposes an intelligent monitoring system based on a CLIP-SAM multimodal joint analysis architecture, which constructs a cross-modal feature alignment model by combining visible-light images, infrared thermal imaging, and vibration spectral data. Experimental results show that, in detecting typical faults such as belt deviation and drum fouling, the system's comprehensive recognition accuracy reaches 94.2%, which is 19.8% higher than that of the traditional single-modality method, and the average response time to abnormal events is shortened to 2.3 seconds, a 98% improvement over manual inspection. In addition, the high-precision image segmentation ability of the SAM model reduces the positioning error of the coal-powder coverage area on the equipment surface to 3.5 pixels, which effectively addresses the false detections caused by target occlusion in industrial scenarios. The cross-modal correlation analysis of the CLIP model enables the system to keep detecting faults in environments with sudden lighting changes, which verifies the architecture's environmental adaptability.

Authors

Peijun Zhang
Jianbo Li
Zhiliang Si
Meiling Huang, Anhui Hengtai Electric Technology Co., Ltd.

DOI:

https://doi.org/10.31449/inf.v49i28.10288

Published

12/21/2025

How to Cite

Zhang, P., Li, J., Si, Z., & Huang, M. (2025). A CLIP-SAM-Based Multimodal Semantic Segmentation and Decision Framework for Intelligent Monitoring in Coal Preparation Plants. Informatica, 49(28). https://doi.org/10.31449/inf.v49i28.10288

Issue

Vol. 49 No. 28 (2025): Online-only issue

Section

Online-only

License

Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.

All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.

Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.
