DCNN-FLAME: A Dual-Supervised Style Transfer-Based Method for 3D Animated Character Expression Reconstruction
Abstract
This study explores a method for generating highly natural expressions for animated characters, thereby enhancing the audience's emotional resonance. It proposes a 3D face reconstruction system named Deep Convolutional Neural Network-Faces Learned with an Articulated Model and Expressions (DCNN-FLAME). DCNN-FLAME consists of an identity encoder, a mapping network, a FLAME geometric decoding module, and a detail reconstruction network, forming an end-to-end pipeline from input images to 3D meshes and appearance textures. A style transfer module built on a deep convolutional neural network (DCNN) uses a pre-trained convolutional network to separate content from style, providing high-level semantic constraints for modeling the texture details of animated character expressions. On this basis, a dual-branch feature supervision mechanism is designed, combining expression classification features with facial Action Unit (AU) detection features: the classification branch imposes global emotional-semantic constraints that keep the macro expression consistent, while the AU branch guides local muscle movements from an anatomical perspective to enhance the realism of expression details. Experiments are conducted on the large-scale face dataset VGGFace2, with systematic comparisons against four 3D face reconstruction baselines: 3D Morphable Model fitting (3DMM-Fitting), RingNet, Detailed Expression Capture and Animation (DECA), and Fourier Analysis Networks-3D (FAN-3D). The proposed DCNN-FLAME model achieves a mean error of 1.29 in the non-metric evaluation and 1.72 in the metric evaluation, both lower than those of all baseline methods, demonstrating higher geometric reconstruction accuracy and facial alignment quality. In the overall expression restoration evaluation, the proposed method reaches an F1 score of 0.564, reflecting a comprehensive advantage in complex expression modeling. With both the expression classification branch and the facial AU detection branch enabled, expression classification accuracy reaches 0.571 and the F1 score 0.563, significantly better than configurations supervised by either feature alone. This verifies the key role of the dual-branch feature supervision mechanism in improving the naturalness and controllability of animated character expressions. The study provides an effective technical path that integrates geometric reconstruction with texture enhancement for 3D animated character expression generation, and offers new ideas and a practical basis for unsupervised 3D face reconstruction.
DOI: https://doi.org/10.31449/inf.v50i5.12444
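
The abstract names the supervision signals but gives no implementation details. The PyTorch sketch below is one plausible reading of how a Gram-matrix style loss on pre-trained CNN features (the classic Gatys-style content/style separation) could be combined with the two supervision branches, expression classification features and AU detection features, into a single training objective. Every name here (`backbone`, `expr_classifier`, `au_detector`, `renderer`, `flame_params`) is a hypothetical stand-in, not part of the published method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Channel-wise Gram matrix of a (B, C, H, W) feature map,
    the standard style representation from Gatys et al."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

class DualSupervisionLoss(nn.Module):
    """One possible combination of the three semantic constraints
    described in the abstract: a Gram-matrix style loss on pre-trained
    CNN features (texture detail), an expression-classification feature
    loss (global emotional semantics), and an AU-detection feature loss
    (local muscle movement)."""

    def __init__(self, backbone, expr_classifier, au_detector,
                 w_style=1.0, w_expr=1.0, w_au=1.0):
        super().__init__()
        # All three networks act as frozen feature extractors; only the
        # reconstruction pipeline upstream of `rendered` is trained.
        self.backbone = backbone.eval()          # e.g. a truncated VGG
        self.expr_classifier = expr_classifier.eval()
        self.au_detector = au_detector.eval()
        for net in (self.backbone, self.expr_classifier, self.au_detector):
            for p in net.parameters():
                p.requires_grad_(False)
        self.w_style, self.w_expr, self.w_au = w_style, w_expr, w_au

    def forward(self, rendered: torch.Tensor,
                target: torch.Tensor) -> torch.Tensor:
        # Style term: match Gram matrices of mid-level CNN features.
        # Assumes `backbone` returns a single (B, C, H, W) feature map.
        style = F.mse_loss(gram_matrix(self.backbone(rendered)),
                           gram_matrix(self.backbone(target)))
        # Expression branch: global emotional semantics must agree.
        expr = F.mse_loss(self.expr_classifier(rendered),
                          self.expr_classifier(target))
        # AU branch: local, anatomically grounded muscle activations.
        au = F.mse_loss(self.au_detector(rendered),
                        self.au_detector(target))
        return self.w_style * style + self.w_expr * expr + self.w_au * au

# Hypothetical usage: `renderer` maps FLAME parameters and a texture to
# an image, which is compared against the input photograph.
# loss_fn = DualSupervisionLoss(backbone, expr_classifier, au_detector)
# loss = loss_fn(renderer(flame_params, texture), input_image)
# loss.backward()
```

In this reading, all three feature extractors stay frozen and only the reconstruction pipeline receives gradients, which matches the abstract's framing of the pre-trained networks as sources of semantic constraints rather than trainable components; ablating `w_expr` or `w_au` to zero would reproduce the single-branch configurations the paper compares against.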







