ImpGPT: Adaptive Patch-Based GPT-2 Transformer for Multivariate Time Series Imputation
Abstract
Missing value imputation for multivariate time series is a critical problem in time series analysis, with wide applications in industrial monitoring, sensor data recovery, and intelligent transportation systems. Traditional imputation methods often perform poorly under high missing rates and struggle to recover lost data trends. This paper proposes ImpGPT, a multivariate time series imputation model based on adaptive Patch partitioning. The model combines an adaptive Patch partitioning mechanism with a Box encoding module. The adaptive Patch partitioning mechanism dynamically divides the time series into multiple Patches to capture local feature changes in the sequence. The size and position of each Patch are generated and adjusted in real time by a lightweight network, which avoids the loss of temporal information that fixed partitioning can cause. The Box encoding module encodes the geometric information and missing features of each Patch into structural vectors, explicitly associating missing patterns with temporal structure and enhancing the model's sensitivity to local structural changes. ImpGPT uses a pretrained GPT-2 generative Transformer, partly frozen and partly fine-tuned, to encode and model Patch sequences. This retains the strong sequence representation capability of the original GPT-2 while adapting the model to the imputation task.
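As a rough illustration of what encoding a Patch's geometry and missing features into a structural vector could look like, the sketch below maps each Patch to a three-element vector of normalized center, normalized width, and missing ratio. The abstract does not specify the vector layout, so this layout, the function name `box_encode`, and the hand-picked Patch boundaries (standing in for the output of the lightweight partitioning network) are all assumptions made for illustration.

```python
import numpy as np

def box_encode(mask, starts, lengths):
    """Encode each patch as [center, width, missing_ratio] (assumed layout).

    mask:    (T, C) array, 1 = observed, 0 = missing.
    starts:  patch start indices (hypothetical partitioner output).
    lengths: patch lengths in time steps (hypothetical partitioner output).
    """
    T = mask.shape[0]
    boxes = []
    for s, n in zip(starts, lengths):
        e = min(s + n, T)                       # clip the patch to the sequence
        center = (s + e) / 2 / T                # normalized patch center
        width = (e - s) / T                     # normalized patch width
        missing_ratio = 1.0 - mask[s:e].mean()  # share of missing entries
        boxes.append([center, width, missing_ratio])
    return np.array(boxes)

# Toy example: 8 time steps, 2 channels, two missing readings in channel 0.
mask = np.ones((8, 2))
mask[2:4, 0] = 0
boxes = box_encode(mask, starts=[0, 4], lengths=[4, 4])
print(boxes)
```

In this toy case the first Patch carries a nonzero missing ratio and the second does not, which is the kind of per-Patch signal that would let a downstream model tie missing patterns to temporal position.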
Combined with a masked normalization mechanism, ImpGPT is designed to impute missing data accurately. To verify the effectiveness of the model, we selected two datasets: a real flight sensor record from an aviation system, which includes pitch angle, roll angle, heading angle, and their corresponding angular velocities, and the public Electricity dataset. Experimental results show that under various missing rates, ImpGPT outperforms existing baseline methods on Mean Squared Error (MSE), Mean Absolute Error (MAE), Symmetric Mean Absolute Percentage Error (SMAPE), and Normalized Root Mean Squared Error (NRMSE). At the high missing rate of 75%, the MSE of ImpGPT on the aviation sensor dataset is 0.0346, which is 1.7% lower than that of PatchTST (0.0352) and 23.1% lower than that of GPT4TS (0.045). On the Electricity dataset, the MSE of ImpGPT is 0.1253, which is 16.3% lower than that of PatchTST (0.1498) and 7.6% lower than that of GPT4TS (0.1356). These results indicate that ImpGPT maintains good recovery accuracy even in extreme missing scenarios. Ablation experiments further verify that the adaptive Patch partitioning mechanism and the model structure play key roles in improving imputation accuracy. ImpGPT performs well on high-missing-rate multivariate time series imputation tasks, with strong robustness and broad application potential.
DOI:
https://doi.org/10.31449/inf.v49i24.11157
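The four metrics named in the abstract, and the relative MSE reductions it quotes, can be sketched as follows. This is a minimal sketch under stated assumptions: metrics are computed only at artificially masked positions, SMAPE uses the symmetric form with the sum of absolute values in the denominator, and NRMSE is normalized by the range of the true values. The abstract does not say which SMAPE and NRMSE variants the authors use, and the helper names `masked_metrics` and `relative_reduction` are hypothetical.

```python
import numpy as np

def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline

def masked_metrics(y_true, y_pred, eval_mask):
    """Error metrics restricted to positions where eval_mask is nonzero."""
    m = eval_mask.astype(bool)
    t, p = y_true[m], y_pred[m]
    err = p - t
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    # Assumed SMAPE variant; the small constant guards against division by zero.
    smape = 100.0 * np.mean(2.0 * np.abs(err) / (np.abs(t) + np.abs(p) + 1e-8))
    # Assumed NRMSE variant: RMSE divided by the range of the true values.
    nrmse = np.sqrt(mse) / (t.max() - t.min())
    return {"MSE": mse, "MAE": mae, "SMAPE": smape, "NRMSE": nrmse}

# Relative MSE reductions at the 75% missing rate, from the reported values.
aviation_vs_patchtst = relative_reduction(0.0352, 0.0346)
aviation_vs_gpt4ts = relative_reduction(0.045, 0.0346)
electricity_vs_patchtst = relative_reduction(0.1498, 0.1253)
electricity_vs_gpt4ts = relative_reduction(0.1356, 0.1253)

# Toy check of the masked metrics: only the second column is evaluated.
y_true = np.array([[1.0, 2.0], [3.0, 4.0]])
y_pred = np.array([[1.0, 2.5], [0.0, 4.0]])
eval_mask = np.array([[0, 1], [0, 1]])
scores = masked_metrics(y_true, y_pred, eval_mask)
```

Recomputing the reductions from the rounded MSE values agrees with the percentages quoted in the abstract to within about 0.1 percentage point. Any remaining difference is plausibly due to rounding of the reported MSE values.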
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika







