MTCNN-UGAN: A Self-Attention Enhanced Face Replacement Pipeline for Film and Television Video Frames
Abstract
At present, face replacement technology for film and television video faces problems such as low accuracy and high resource consumption. This study proposes an automated face replacement technique that integrates an improved multi-task cascaded convolutional neural network (MTCNN) with a generative adversarial network (GAN). In the face detection stage, MTCNN is used, with median-filter preprocessing and depthwise separable convolutions introduced into the model. In the face replacement stage, a U-Net-based generative adversarial network (UGAN) is constructed: its generator consists of an encoder and a decoder and embeds a dual-skip-connection residual module, while its discriminator adopts a self-attention mechanism and a video-stabilization module. In the experiments, the WIDER FACE and CelebFaces Attributes (CelebA) datasets were used for the face detection task, and the high-resolution CelebAMask-HQ dataset and the Deepfake Model Attribution Dataset (FDM) were used for the face replacement task. FaceSwap and the attribute-preserving generative adversarial network (AP-GAN) served as comparison baselines. In the face detection experiments, the proposed model performed best in both accuracy and training loss across different face detection scenes; for example, in complex scenes its accuracy was 93.25% and its training loss was 0.221. In the face replacement experiments, the model replaced faces in four image sets, preserving color and facial contour structure well and producing more natural replacements. In the similarity comparison, the proposed model achieved the highest face replacement similarity index across different frame counts, with an average value of 0.994, and it also performed best in the peak signal-to-noise ratio test, with an average value of 35.65 dB. Finally, in the composite face replacement test, the model performed best in both structural similarity and state error. In conclusion, the technique shows good application results, and this study can provide technical support for the improvement of face replacement technology as well as face characterization.
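The abstract names several architectural components without code, and the paper publishes none. As a minimal illustrative sketch, assuming a PyTorch implementation, the block below shows the two most concrete ideas: a depthwise separable convolution (the lightweight substitute introduced into the MTCNN stages) and one plausible reading of the generator's dual-skip-connection residual module. All class names, channel sizes, and the exact skip placement are assumptions, not the authors' published design.

```python
# Illustrative sketch only: the paper does not release code, so the module
# names, channel sizes, and layer ordering below are assumptions based on
# the abstract's description of the MTCNN-UGAN components.
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution, as introduced to lighten the MTCNN
    stages: a per-channel (depthwise) convolution followed by a 1x1
    (pointwise) convolution that mixes channels."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


class DualSkipResidualBlock(nn.Module):
    """One plausible reading of the 'dual skip connection residual module'
    in the UGAN generator: an inner residual skip around two convolutions
    plus an outer skip that also bypasses the final activation."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        inner = self.act(self.conv1(x))
        inner = self.conv2(inner) + x   # inner residual skip
        return self.act(inner) + x      # outer skip around the activation


if __name__ == "__main__":
    x = torch.randn(1, 32, 64, 64)
    print(DepthwiseSeparableConv(32, 64)(x).shape)  # torch.Size([1, 64, 64, 64])
    print(DualSkipResidualBlock(32)(x).shape)       # torch.Size([1, 32, 64, 64])
```

The usual motivation for the depthwise separable variant is parameter count: a standard k x k convolution costs k^2 * C_in * C_out weights, while the separable form costs k^2 * C_in + C_in * C_out, which matters in a cascaded detector that runs at multiple scales.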
DOI: https://doi.org/10.31449/inf.v49i5.8927