Application of Multimodal Generation Model in Short Video Content Personalized Generation
Abstract
The rise of short video platforms has created demand for rapidly generated personalized content. Existing systems either struggle to achieve high levels of customization or require large amounts of data, limiting real-time production. This study focuses on a multimodal generation model that generates customized short video content adapted to user preferences and behavioral patterns. The objective is an integrative model that uses text, image, and audio data to produce context-specific short video content for personalized entertainment. The model first analyzes user preferences from interaction data and then synthesizes corresponding video content using a novel method, the stochastic paint optimizer with an intelligent convolutional neural network (SPO-IntelliConvNet). The SPO component ensures optimal representation of multimodal content by improving feature selection and parameter tuning through stochastic search algorithms modeled after the dynamics of abstract paintings. The IntelliConvNet combines and interprets the modalities, enabling efficient personalization consistent with user preferences. To develop personalized content, user preference data is collected, including interactions such as video views and comments. The model employs natural language processing (NLP), audio processing, and computer vision to merge the text, image, and audio modalities. Pre-processing includes tokenization for text, Canny edge detection for images, and Wiener filtering for audio, preparing each modality for analysis; feature extraction then applies principal component analysis (PCA) to project the features of all three modalities into a lower-dimensional space while preserving essential information. The proposed approach achieved superior personalized content generation, leading to increased user satisfaction and engagement. Its performance was evaluated using BLEU-4 (0.55), ROUGE-L (0.79), METEOR (0.72), and CIDEr (0.80). The system's ability to incorporate multimodal data resulted in more precise video customization, as demonstrated by interaction metrics and user comments. This multimodal generation model provides an advanced solution for creating personalized short video content, improving the user experience through highly tailored content.
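The abstract describes SPO as tuning parameters via stochastic search but does not give its update rules, so the loop below is only a generic random-search stand-in that illustrates the structure of such a component; the hyperparameter names, ranges, and the score_fn callback are all hypothetical.

```python
# Generic stochastic-search stand-in for the SPO component: the abstract
# names stochastic search for feature selection and parameter tuning but
# gives no update rules, so this random search is a structural sketch only.
import random

def stochastic_search(score_fn, n_iters: int = 200, seed: int = 42):
    """Sample candidate hyperparameters at random and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iters):
        candidate = {                                     # hypothetical search space
            "learning_rate": 10 ** rng.uniform(-5, -2),
            "n_components": rng.choice([32, 64, 128]),    # PCA output dimensions
            "dropout": rng.uniform(0.0, 0.5),
        }
        score = score_fn(candidate)  # e.g. validation score of the trained model
        if score > best_score:
            best_params, best_score = candidate, score
    return best_params, best_score
```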
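The abstract names the three pre-processing steps but not their parameters. The sketch below implements them with standard libraries (OpenCV for Canny edge detection, SciPy for Wiener filtering); the thresholds, the filter window size, and the plain whitespace tokenizer are illustrative assumptions, not values from the paper.

```python
# Illustrative pre-processing for the three modalities. Canny thresholds,
# Wiener window size, and the tokenizer are assumptions; the paper does
# not specify them.
import cv2                       # pip install opencv-python
import numpy as np
from scipy.signal import wiener  # pip install scipy

def preprocess_text(caption: str) -> list[str]:
    """Lowercase whitespace tokenization (stand-in for the paper's tokenizer)."""
    return caption.lower().split()

def preprocess_image(frame: np.ndarray) -> np.ndarray:
    """Canny edge detection on a grayscale copy of a BGR video frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.Canny(gray, threshold1=100, threshold2=200)

def preprocess_audio(samples: np.ndarray) -> np.ndarray:
    """Wiener filtering to suppress noise in a 1-D audio signal."""
    return wiener(samples, mysize=29)  # window size must be odd
```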
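For the dimensionality-reduction step, a minimal scikit-learn sketch follows. The per-modality feature sizes, the number of retained components, and the choice to concatenate the reduced vectors are assumptions, since the abstract states only that PCA lowers the dimensionality of all three modalities.

```python
# Minimal PCA reduction-and-fusion sketch; feature sizes and n_components
# are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA  # pip install scikit-learn

rng = np.random.default_rng(0)
n_samples = 500
text_feats  = rng.standard_normal((n_samples, 300))   # e.g. text embeddings
image_feats = rng.standard_normal((n_samples, 2048))  # e.g. CNN image features
audio_feats = rng.standard_normal((n_samples, 128))   # e.g. spectral features

def reduce_modality(features: np.ndarray, n_components: int = 64) -> np.ndarray:
    """Project one modality onto its top principal components."""
    return PCA(n_components=n_components).fit_transform(features)

# Reduce each modality separately, then concatenate into one fused vector.
fused = np.hstack([reduce_modality(text_feats),
                   reduce_modality(image_feats),
                   reduce_modality(audio_feats)])
print(fused.shape)  # (500, 192)
```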
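The reported scores are BLEU-4 0.55, ROUGE-L 0.79, METEOR 0.72, and CIDEr 0.80. The sketch below shows how BLEU-4 and ROUGE-L can be computed with common libraries; nltk and rouge-score are assumed tooling (the paper does not name its evaluation code), and CIDEr typically requires a corpus-level scorer such as pycocoevalcap.

```python
# Caption-metric sketch using common libraries (assumed tooling, not the
# paper's own evaluation code).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction  # pip install nltk
from rouge_score import rouge_scorer  # pip install rouge-score

reference = "a dog surfing on a wave at the beach".split()
candidate = "a dog is surfing a wave on the beach".split()

# BLEU-4: geometric mean of 1- to 4-gram precisions with a brevity
# penalty; smoothing avoids zero scores on short sentences.
bleu4 = sentence_bleu([reference], candidate,
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=SmoothingFunction().method1)

# ROUGE-L: longest-common-subsequence F-measure over the raw strings.
scorer = rouge_scorer.RougeScorer(["rougeL"])
rougeL = scorer.score(" ".join(reference), " ".join(candidate))["rougeL"].fmeasure

print(f"BLEU-4: {bleu4:.2f}  ROUGE-L: {rougeL:.2f}")
```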
DOI: https://doi.org/10.31449/inf.v49i21.9838
This work is licensed under a Creative Commons Attribution 3.0 License.