Facial Expression Recognition and Generation for Virtual Characters Using an Enhanced MTCNN with HR-PCN and GCN

Fangzhou Zhou

Abstract


Facial expression recognition and virtual animation character generation are crucial for animation production and human-computer interaction, but traditional models often perform poorly in complex scenes. This paper proposes a novel expression recognition and generation framework based on an improved Multi-Task Convolutional Neural Network (MTCNN), augmented with a High-Resolution Parallel Convolutional Network (HR-PCN) and Octave Convolution (OctConv). Specifically, the HR-PCN enhances multi-scale feature extraction for facial keypoint detection, while OctConv improves frequency-aware representation learning. For facial expression generation, Graph Convolutional Networks (GCNs) are adopted to model the semantic relationships between facial Action Units (AUs), and are further combined with SE-ResNet50 for attention-enhanced feature extraction. The enhanced MTCNN was evaluated on the AFEW and CK+ datasets, achieving accuracies of 89.70% and 93.50%, respectively, surpassing the baseline MTCNN (78.90% and 85.30%) and SSD (85.40% and 90.10%). Its RMSE dropped to 0.1 after 30 training iterations, and inference time remained within 40 ms per frame. For expression generation, the SE-ResNet50-GCN model attained an accuracy of up to 93.5%, significantly outperforming ResNet50-GCN (90.8%) and a plain GCN (80.2%). These results validate the proposed framework's effectiveness in improving both recognition accuracy and expression realism under complex conditions.
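The sketch below illustrates the kind of GCN component the abstract describes for expression generation: facial Action Units (AUs) are treated as graph nodes, an adjacency matrix encodes their semantic relationships, and per-AU features are propagated over the graph. It is not the authors' implementation; the PyTorch framing, the 17-node/512-dimensional sizes, the random placeholder adjacency, the two-layer depth, and the use of random tensors standing in for pooled SE-ResNet50 features are all illustrative assumptions.

```python
# Minimal GCN-over-AUs sketch (assumptions noted above), not the paper's code.
import torch
import torch.nn as nn


def normalize_adj(adj: torch.Tensor) -> torch.Tensor:
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    a = adj + torch.eye(adj.size(0))
    d_inv_sqrt = a.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)


class GCNLayer(nn.Module):
    """One graph convolution: H' = act(A_hat H W)."""

    def __init__(self, in_dim: int, out_dim: int, activate: bool = True):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.activate = activate

    def forward(self, x: torch.Tensor, adj_norm: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_aus, in_dim); adj_norm: (num_aus, num_aus)
        h = torch.matmul(adj_norm, self.linear(x))
        return torch.relu(h) if self.activate else h


class AUGraphHead(nn.Module):
    """Two-layer GCN head mapping per-AU backbone features to AU activations."""

    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.g1 = GCNLayer(in_dim, hid_dim)
        self.g2 = GCNLayer(hid_dim, 1, activate=False)

    def forward(self, x: torch.Tensor, adj_norm: torch.Tensor) -> torch.Tensor:
        return self.g2(self.g1(x, adj_norm), adj_norm).squeeze(-1)


if __name__ == "__main__":
    num_aus, feat_dim = 17, 512                           # assumed sizes
    adj = (torch.rand(num_aus, num_aus) > 0.7).float()    # placeholder AU relationship graph
    adj = ((adj + adj.t()) > 0).float()                   # make it symmetric
    adj_norm = normalize_adj(adj)

    feats = torch.randn(4, num_aus, feat_dim)  # stand-in for pooled SE-ResNet50 AU features
    head = AUGraphHead(feat_dim, 256)
    au_scores = head(feats, adj_norm)          # (4, 17) predicted AU activations
    print(au_scores.shape)
```

In a full pipeline, the adjacency matrix would come from AU co-occurrence or prior knowledge rather than random values, and the node features would be region-pooled outputs of the SE-ResNet50 backbone.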


Full Text: PDF


DOI: https://doi.org/10.31449/inf.v49i8.8318

This work is licensed under a Creative Commons Attribution 3.0 License.