Open Access

Local Region Frequency Guided Dynamic Inconsistency Network for Deepfake Video Detection

Engineering Research Center of Digital Forensics, Ministry of Education; School of Computer Science; and Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science and Technology, Nanjing 210044, China

Abstract

In recent years, the rapid development of deepfake technology has flooded the Internet with deepfake videos, posing a serious threat to national politics, social stability, and personal privacy. Although many existing deepfake detection methods perform excellently on known manipulations, their detection capability drops sharply when faced with unknown manipulations. To obtain better generalization, this paper analyzes global and local inter-frame dynamic inconsistencies from both the spatial and frequency domains, and proposes a Local Region Frequency Guided Dynamic Inconsistency Network (LFGDIN). The network comprises two parts: a Global SpatioTemporal Network (GSTN) and a Local Region Frequency Guided Module (LRFGM). The GSTN captures the dynamic information of the entire face, while the LRFGM extracts the frequency dynamic information of the eyes and mouth. Through local region alignment, the LRFGM guides the GSTN to concentrate on dynamic inconsistencies in these salient local regions, thereby improving detection performance. Experiments on three public datasets (FF++, DFDC, and Celeb-DF) show that, compared with many recent advanced methods, the proposed method achieves better results when detecting deepfake videos of unknown manipulation types.
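To make the two-branch design concrete, below is a minimal PyTorch-style sketch of the idea described in the abstract: a global spatiotemporal branch over full-face clips, a local branch that builds frequency features from eye/mouth crops, and an alignment term that pulls the global features toward the local frequency features. All module names, tensor shapes, the log-amplitude FFT features, and the cosine form of the alignment loss are illustrative assumptions; this is not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalSpatioTemporalNet(nn.Module):
    # Global branch (GSTN-like): 3D convolutions over full-face clips.
    def __init__(self, dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(3, dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(dim, dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),
        )

    def forward(self, clip):                      # clip: (B, 3, T, H, W)
        return self.backbone(clip).flatten(1)     # -> (B, dim)

class LocalRegionFrequencyModule(nn.Module):
    # Local branch (LRFGM-like): frequency features of eye/mouth crops.
    def __init__(self, dim=64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv3d(3, dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),
        )

    def forward(self, region_clip):               # (B, 3, T, h, w) crop
        # Per-frame log-amplitude spectrum as a simple stand-in for the
        # paper's frequency representation (an assumption).
        amp = torch.log1p(torch.fft.fft2(region_clip, dim=(-2, -1)).abs())
        return self.head(amp).flatten(1)          # -> (B, dim)

class LFGDIN(nn.Module):
    # Two-branch detector with a local-region alignment term.
    def __init__(self, dim=64, num_classes=2):
        super().__init__()
        self.gstn = GlobalSpatioTemporalNet(dim)
        self.lrfgm = LocalRegionFrequencyModule(dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, face_clip, region_clip):
        g = self.gstn(face_clip)
        l = self.lrfgm(region_clip)
        # "Local region alignment": pull the global features toward the
        # local frequency features so the global branch attends to
        # eye/mouth dynamics (the cosine form here is an assumption).
        align_loss = 1.0 - F.cosine_similarity(g, l.detach(), dim=1).mean()
        return self.classifier(g), align_loss

# Toy usage with random tensors in place of real video clips.
model = LFGDIN()
face = torch.randn(2, 3, 8, 112, 112)            # full-face clip
eyes = torch.randn(2, 3, 8, 32, 64)              # eye-region crop
logits, align_loss = model(face, eyes)
print(logits.shape, align_loss.item())           # torch.Size([2, 2]), scalar

In training, the alignment loss would be added to the usual classification loss so that the global branch is steered toward the local regions, which is the guidance role the abstract assigns to the LRFGM.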

Big Data Mining and Analytics
Pages 889-904
Cite this article:
Yue P, Chen B, Fu Z. Local Region Frequency Guided Dynamic Inconsistency Network for Deepfake Video Detection. Big Data Mining and Analytics, 2024, 7(3): 889-904. https://doi.org/10.26599/BDMA.2024.9020030


Received: 10 January 2024
Revised: 05 April 2024
Accepted: 07 May 2024
Published: 28 August 2024
© The author(s) 2024.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
