Dynamic Evaluation Architecture for Long Text Generation Using Enhanced Transformer-XL and Adversarial Feedback Mechanisms

Xiping Liu, Leyang Zhang

Abstract


As natural language generation extends to long-text scenarios, existing models face significant challenges in semantic coherence and contextual consistency. Traditional generative models are limited by fixed-length attention mechanisms and struggle to capture long-distance dependencies, leading to logical breaks or redundant repetition in the generated text. To address this, this study proposes a dynamic evaluation framework that integrates Transformer-XL with adversarial training, constructing a closed-loop system for collaborative generation-evaluation optimization through a recurrent memory mechanism and real-time discriminative feedback. At the architecture level, Transformer-XL's segment-level recurrence extends the context window to 4,096 characters; compared with the 512-character limit of the traditional Transformer, long-range dependency modeling improves by a factor of 3.8. The recurrent memory mechanism and relative positional encoding transfer and reuse hidden states across text segments, alleviating the problem of context fragmentation. In parallel, a dual-channel adversarial training strategy is designed: the generator produces text paragraph by paragraph based on dynamic memory units, while the discriminator computes semantic consistency scores and logical-conflict probabilities in real time through a multi-granularity evaluation module, triggering adversarial loss backpropagation for every 256 characters generated. For method details, preprocessing is applied to a dataset of 50,000 novel chapters and 24,000 academic abstracts: novel chapters are sentence-segmented with NLTK and entity-annotated with spaCy, then split by paragraph into sequences of length 512 while preserving contextual associations; academic abstracts are tokenized with BPE to extract structured elements such as research questions.
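The segment-level recurrence described above can be sketched in a few lines of NumPy. This is a toy single-head attention with illustrative dimensions (16-token segments, 32-dimensional states), not the paper's 12-layer, 768-dimensional model, and it omits relative positional encoding and training entirely; the point is only how a fixed-size cache of hidden states lets each segment attend past its own boundary.

```python
import numpy as np

def attend(segment, memory, d_model):
    """Toy single-head self-attention: queries come from the current
    segment, while keys/values span the cached memory plus the segment,
    so tokens can attend beyond their own segment boundary."""
    context = segment if memory is None else np.concatenate([memory, segment])
    scores = segment @ context.T / np.sqrt(d_model)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ context

def process_segments(segments, mem_len, d_model):
    """Run segments in order, carrying a fixed-size hidden-state cache
    forward. (Transformer-XL detaches the cache from the gradient graph;
    with plain NumPy there are no gradients to detach.)"""
    memory, outputs = None, []
    for seg in segments:
        outputs.append(attend(seg, memory, d_model))
        joined = seg if memory is None else np.concatenate([memory, seg])
        memory = joined[-mem_len:]  # keep only the most recent states
    return outputs

rng = np.random.default_rng(0)
segs = [rng.standard_normal((16, 32)) for _ in range(3)]  # 3 segments of 16 tokens
outs = process_segments(segs, mem_len=24, d_model=32)
```

By the third segment, the attention context covers 24 cached states plus the 16 current tokens, which is the same reuse pattern that lets the full model reach its extended context window.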
Training uses mixed batches (novel-chapter to abstract data in a 7:3 ratio), with the training, validation, and test sets split 8:1:1. Key hyperparameters include a 12-layer Transformer-XL encoder, a model dimension of 768, 12 attention heads, and a memory cache size of 1,024; the adversarial training perturbation amplitude is 0.001, and the discriminator is a 3-layer CNN (convolution kernel sizes 3/5/7). On this dataset, the framework's text coherence index (BERT-based semantic similarity) reaches 89.7%, 12.3 percentage points higher than the baseline model. Logical coherence, assessed by a manually evaluated logical error rate, drops from 17.6% to 6.9%, with 89.5% plot consistency when generating text longer than 2,000 characters. In addition, dynamic intervention by the adversarial discriminator suppresses 38.4% of semantic-drift events during generation, and terminology accuracy reaches 82.9% on scientific-literature generation tasks, 28.7% higher than traditional adversarial training methods. Ablation experiments show that removing the dynamic evaluation module lowers the local-global consistency score of generated long text by 19.3%, validating the necessity of the real-time evaluation mechanism for long-text generation.
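The multi-granularity CNN discriminator can likewise be illustrated with a minimal NumPy sketch. The kernel widths 3/5/7 follow the abstract; the random weights, embedding dimension, and sequence length are placeholder assumptions standing in for trained parameters, and the single sigmoid output corresponds to the semantic-consistency score the discriminator feeds back to the generator.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Valid 1-D convolution over token embeddings x of shape (seq_len, dim)
    with a kernel of shape (k, dim); returns one activation per window."""
    k = kernel.shape[0]
    return np.array([np.sum(x[i:i + k] * kernel) for i in range(len(x) - k + 1)])

def consistency_score(embeddings, kernels):
    """Multi-granularity discriminator sketch: one convolution branch per
    kernel width (3/5/7 as in the paper), tanh-activated, max-pooled over
    time, averaged across branches, and squashed with a sigmoid into a
    (0, 1) consistency score."""
    pooled = [np.tanh(conv1d_valid(embeddings, k)).max() for k in kernels]
    logit = float(np.mean(pooled))
    return 1.0 / (1.0 + np.exp(-logit))

rng = np.random.default_rng(1)
dim = 32
kernels = [rng.standard_normal((w, dim)) * 0.1 for w in (3, 5, 7)]
x = rng.standard_normal((64, dim))  # embeddings for a 64-token span
score = consistency_score(x, kernels)
```

The three kernel widths give the discriminator views at different granularities of the same span, which is what allows it to flag both local wording problems and broader logical conflicts in one pass.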



DOI: https://doi.org/10.31449/inf.v49i27.10059

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.